DevPub

How Google Works

A look at the myths and facts about Google, and a collection of all general information that any webmaster should know before attempting SEO techniques.

On Tuesday, March 18th 2008 at 10:23 AM
By Louis Fernandez
Google currently powers around 50% of US search (Google, AOL, Earthlink, Go, Netscape, and many others).
Google shows up to 10 AdWords ads on its search results pages, but keeps them separate from the regular (or organic) listings. There is no direct way to pay Google to be listed in its organic search results.

So how does Google work?

Google takes an empirical view of the linking structure of the web and rates pages on a logarithmic scale based on what pages link to them. The Google Toolbar provides a 0 to 10 scale to mimic the link popularity of listed sites. This helps you determine how important Google thinks a site is.

Many webmasters hear a statement like this and want to run and exchange as many links with as many people as they can. This is a form of spam which search engines know all too well.

Sometimes these link exchange rings gain popularity, but once they are found out they may get penalized and delisted. When you link into the wrong circles you run the risk of being associated with them.

It is important to note that this PageRank value is only one component of the Google search engine algorithm. Many times a PR 4 site will rank above a PR 6 site because it was optimized better and has a well defined, descriptive inbound link profile (which means better keyword rich links from more sites and more related sites).
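The link-based rating described above can be sketched in a few lines of Python. This is a minimal illustration of the published PageRank idea, not Google's production algorithm: the damping factor of 0.85, the toy link graph, and the `toolbar_style_score` mapping onto a 0 to 10 log scale are all assumptions made for the example (the toolbar's real mapping is not public).

```python
import math


def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over {page: [pages it links to]}."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with the small "random jump" share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its current score evenly among its outbound links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outbound links spreads its score over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


def toolbar_style_score(score, all_scores, top=10):
    """Hypothetical 0..top log-scale display value (NOT Google's real mapping)."""
    low, high = min(all_scores), max(all_scores)
    if high == low:
        return 0
    position = (math.log(score) - math.log(low)) / (math.log(high) - math.log(low))
    return round(top * position)


# Toy link graph (made up): three pages link to "c", and "c" links back to "a".
toy_links = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(toy_links)
display = {page: toolbar_style_score(r, ranks.values()) for page, r in ranks.items()}
```

In this toy graph "c" collects the most inbound links and ends up with the highest score, while "a" scores well only because it is linked from "c". That is the same effect that makes one strong link worth more than many weak ones.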

Many Myths about Google:

There are many myths about Google that are represented as fact by marketers trying to make money. Misinformation spreads like wildfire because everyone wants to sound like the smart person with all the answers. One example of the many myths about Google is that you are limited to 100 links per page.

Google offered that figure as guidance based upon usability ideas. On pages with no link popularity Google will not want to follow many links. On pages with a large amount of link popularity Google will scour thousands of links. Google’s page indexing limit is 101 KB, though most pages should be smaller than that from a usability standpoint.

If you ever have questions, SearchGuild and V7N.com are two of the most straightforward SEO forums on the web.

What Pages of My Site are Indexed by Google?

You can check to see what pages of your site are indexed by searching Google for site:www.mysite.com mysite.

How do I Submit My Site to Google?

While Google also offers a free site submit option, the best way to submit your site is by having Google’s spider follow links from other web pages.

Where do I Rank in Google for My Keywords?

I use the free Digital Point keyword ranking tool to determine where I rank in Google. Tracking various sites helps me determine some of the ways Google may be changing their algorithm. If you sign up for the Google API service and are doing lots of sketchy stuff, that makes it easy for Google to cross connect your websites. The Digital Point keyword ranking tool also supports Yahoo! and MSN.

Google Backlink Check:

Backlinks is another way of saying “links into a page.”

When you check backlinks in Google (link:www.whateversite.com) it only shows a small sampling of your total backlinks. Many links that do not show up when you use the link: function in Google still count for your relevancy scoring. In addition there is a time delay between when links are made and when they will show up in search results.

To get a more accurate picture of links you will also want to check backlinks in Yahoo!. Yahoo! often shows many backlinks that the Google search will not show. The code to check Yahoo! backlinks is linkdomain:www.site.com.
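The search operators mentioned so far (site:, link:, and linkdomain:) are just text typed into a search box, so they are easy to generate for a list of sites. The helper below is a small sketch. The function names are my own, and the Google search URL format is an assumption that is not part of the original article.

```python
from urllib.parse import urlencode


def indexed_pages_query(host, keyword=""):
    """Query listing indexed pages of a site, e.g. 'site:www.mysite.com mysite'."""
    return f"site:{host} {keyword}".strip()


def google_backlink_query(host):
    """Query for the sample of backlinks Google is willing to show."""
    return f"link:{host}"


def yahoo_backlink_query(host):
    """Query in the Yahoo! backlink syntax given in the article."""
    return f"linkdomain:{host}"


def google_search_url(query):
    """Assumed URL format for running a query on Google (not from the article)."""
    return "https://www.google.com/search?" + urlencode({"q": query})


# Hypothetical host names, reused from the examples in the text.
queries = [
    indexed_pages_query("www.mysite.com", "mysite"),
    google_backlink_query("www.whateversite.com"),
    yahoo_backlink_query("www.site.com"),
]
urls = [google_search_url(q) for q in queries[:2]]
```

Building the queries in one place makes it easy to check the same set of sites on a regular schedule and compare what each engine reports.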

Digital Point has a free tool which will track your Google position by keyword, PageRank, and number of backlinks. Rusty Brick also recently created a free tool which does Google backlink analysis.

How do I know what sites are good?

First off, common sense usually goes pretty far in this category. Secondly, Google has a toolbar which shows how it currently views a web page or website.

The Google toolbar is one of the top search engine optimization tools for a person new to search engine marketing. It works on Windows and is downloadable at http://toolbar.google.com/. Finally, if you have doubts about a site you probably do not want to link to it. You can also feel free to ask me or ask in various SEO forums.

PageRank is a measure of link popularity which can come and go. It’s not hard for a successful business to rent a few high PageRank links into their site and then leverage that link popularity for link exchanges. A site with decent PageRank can get penalized just the same as a site with low PageRank. Usually you will want to err on the side of caution at the start.

If you are using techniques that fall far outside of Google’s recommended guidelines, I would not recommend using their toolbar, since the feedback the toolbar provides may make it easy for them to link you to all of your websites.

Google Toolbar Broken?

1. Sometimes the Google Toolbar gets stuck at 0 when searching the web. If you are unsure of the PageRank of a page, go to a high PageRank site (like http://www.w3c.org) and then type the address of the page you were just on into the address bar of Internet Explorer. Usually this technique will unstick the PageRank. Keep in mind that Google has only been updating toolbar display PageRank quarterly, so if a site is only a few months old it is not uncommon for it to show a PageRank of 0 in the toolbar. The last toolbar PageRank update occurred at 11:30 PM on the last day of 2004.

2. Using this toolbar you can see the PageRank of your top competitors and who links to them. You may be able to get links from people who are linking to your competitors. You need to enable the site information button from the options menu. When it is turned on you should see a big blue circle with a white letter I inside of it.

3. To find out who is linking to your competitors you can type link:www.evilcompetitor.com in the Google search box. This will show most of the inbound links to the given page from pages with a PageRank of 4 or greater.

4. The toolbar is just an aid though, and should be combined with common sense. If you see sites linking into awful websites or a site looks bad then do not exchange links with them. If their site is highly unrelated to yours then it might not be a good idea to link from a user experience angle.

5. If you use the Safari browser you can use the PageRank Toolbar Widget For Macintosh to view PageRank. For other browsers try the free extensions offered by Mozdev.

Recent Algorithm Shifts:

In November of 2003 Google performed a major algorithm change. The goal of the change was to make it harder to manipulate their search results. It is believed that Google may have significantly incorporated Hilltop, Topic Sensitive PageRank, and latent semantic indexing into their algorithm. It seems as though they have since tuned down much of the change that occurred back then.

Over time it will become increasingly important to get links from the right community and not just to perform random link exchange. For example, to a search engine marketer a link from SearchEngineWatch (a search engine information resource hub) may be worth much more than many random off topic links.

I have still seen significant evidence that off topic inbound links can improve your Google rankings, but this will likely change eventually.

In early 2004 Google also began to block the ability of certain sites to sell PageRank. In addition Google seems to have set up a portion of their algorithm to delay the effects of some links or to only allow them to parse partial link credit off the start. These are all moves which are aimed at making manipulating the Google index through link buying a much more expensive and much more unpredictable process.

It may take up to three or so months for the full effect of link rentals or new links to kick in.

Google Sandbox:

Many new sites or sites which have not been significantly developed have a hard time ranking right away on Google. Many well known SEOs have stated that a good way to get around this problem is to place a site on a subdomain of a developed site and after the site is developed and well indexed 301 redirect the site to the new location.
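The 301 redirect step mentioned above is simply an HTTP response with status 301 and a Location header pointing at the new address. Most sites would set this up in their web server configuration. The Python sketch below only shows what the response looks like. The new domain `www.newsite.example`, the port, and the function names are hypothetical.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical new home of the site that used to live on the subdomain.
NEW_BASE = "http://www.newsite.example"


def redirect_target(path, new_base=NEW_BASE):
    """Map a path requested on the old location to the same path on the new one."""
    return new_base.rstrip("/") + "/" + path.lstrip("/")


class PermanentRedirectHandler(BaseHTTPRequestHandler):
    """Answers every GET or HEAD request with a 301 to the new location."""

    def do_GET(self):
        self.send_response(301)  # 301 = Moved Permanently
        self.send_header("Location", redirect_target(self.path))
        self.end_headers()

    do_HEAD = do_GET


def serve(port=8000):
    """Run the redirecting server until interrupted (call this manually)."""
    HTTPServer(("", port), PermanentRedirectHandler).serve_forever()
```

Redirecting each old path to the matching new path, rather than sending everything to the home page, keeps the old page addresses pointing at their equivalents on the new site.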

About PageRank:

PageRank is not everything. PageRank is as much a Google marketing item as it is anything else. Because Google made the concept easy to see and understand, more people talk about Google, and it is easier for people to explain how search engines work using Google and PageRank as the vocabulary. Google’s technology is not necessarily better or more effective than the technologies owned by Yahoo!, MSN, or Ask Jeeves / Teoma.

Hilltop:

Hilltop is an algorithm which reorganizes search results based on an expert rating system.

In the Hilltop white paper they talk about how they can use expert documents to help compute relevancy. An expert document is a non affiliated page which links to many related resources. If page A is related to page B and page B is related to page C, then a connection between A and C is assumed.

Additionally Hilltop states that it strongly considers page title and page headings in relevancy scores (in fact these elements can be considered more important than link text). Likely Hilltop also considers the links pointing into the page and site which your links come from.

The benefit of Hilltop over raw PageRank (Google) is that it is topic sensitive - and is thus generally harder to manipulate than buying some random high power off topic link would be. The benefits of Hilltop over topic distillation (Teoma) are that Hilltop is quicker & cheaper to calculate, and that it tends to have more broad coverage.

When Hilltop does not have enough expert sites the feature can be turned off. It is believed that Google might be using Hilltop to help sort the relevancy for some of their search results.

Topic Sensitive PageRank:

Topic Sensitive PageRank biases both the query and the relevancy of returned documents based upon the perceived topical context of the query.

The query context can be determined based on search history, user defined input (such as search personalization; try Google Labs Search Personalization if you are interested), or related information in the document from which the query came.

Topic Sensitive PageRank for each page can be calculated offline. Using an exceptionally coarse topic set (for example, the top level Open Directory Project categories) still allows Topic Sensitive PageRank to significantly enhance relevancy over using PageRank alone. However, TSPR can be applied more specifically as well.

Since much of it is calculated offline, Topic Sensitive PageRank can also be rolled into other relevancy algorithms which are calculated in near real time.
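The offline and query time split described above can be sketched as follows. This is an illustration of the general topic-biased PageRank idea under my own assumptions, not Google's implementation: the toy link graph, the two topics, the seed pages, and the query weights are all made up. The only change from plain PageRank is that the "random jump" lands on pages known to belong to the topic instead of on any page.

```python
def topic_sensitive_pagerank(links, topic_pages, damping=0.85, iterations=50):
    """PageRank whose random jump lands only on the given topic's seed pages."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    jump = {p: (1.0 / len(topic_pages) if p in topic_pages else 0.0) for p in pages}
    rank = dict(jump)
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) * jump[p] for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Pages without outbound links hand their score back to the topic seeds.
                for p in pages:
                    new_rank[p] += damping * rank[page] * jump[p]
        rank = new_rank
    return rank


def blended_score(page, topic_ranks, query_topic_weights):
    """Query-time step: weight each precomputed topic score by the query's topics."""
    return sum(w * topic_ranks[topic][page] for topic, w in query_topic_weights.items())


# Toy graph (made up): two small topical clusters that share one hub page.
toy_links = {
    "cars1": ["cars2"], "cars2": ["cars1", "hub"],
    "cook1": ["cook2"], "cook2": ["cook1", "hub"],
    "hub": ["cars1", "cook1"],
}
topics = {"autos": {"cars1", "cars2"}, "cooking": {"cook1", "cook2"}}

# Offline step: one PageRank vector per coarse topic.
topic_ranks = {t: topic_sensitive_pagerank(toy_links, seeds) for t, seeds in topics.items()}

# Query-time step: assume the query context looks 90% automotive.
weights = {"autos": 0.9, "cooking": 0.1}
cars_score = blended_score("cars1", topic_ranks, weights)
cook_score = blended_score("cook1", topic_ranks, weights)
```

Because the expensive per-topic vectors are computed ahead of time, the work left at query time is a small weighted sum per page.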

I do not think it is exceptionally important for most webmasters to deeply understand TSPR and Hilltop, other than to understand the intent of these algorithms, which is to move away from grading the web as a whole and toward evaluating it based upon local topical communities.

Latent Semantic Indexing:

Latent semantic indexing allows machines to understand language by looking at it from a purely mathematical viewpoint.

Latent semantic indexing adds an important step to the document indexing process. In addition to recording which keywords a document contains, the method examines the document collection as a whole, to see which other documents contain some of those same words. LSI considers documents that have many words in common to be semantically close, and ones with few words in common to be semantically distant. This simple method correlates surprisingly well with how a human being, looking at content, might classify a document collection. Although the LSI algorithm doesn't understand anything about what the words mean, the patterns it notices can make it seem astonishingly intelligent.
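The "many words in common" idea can be shown with a plain word count comparison. Note that this is a simplification and not LSI itself: full latent semantic indexing goes further and applies a singular value decomposition to the term-document matrix, so that documents can be judged close even when they share few exact words. The sample documents below are made up.

```python
import math
from collections import Counter


def term_vector(text):
    """Count how often each lowercase word appears in a document."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """1.0 for identical word distributions, 0.0 when no words are shared."""
    shared = set(a) & set(b)
    dot = sum(a[term] * b[term] for term in shared)
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Made-up sample documents.
docs = [
    "used car dealer prices",
    "car dealer used vehicle prices",
    "chocolate cake recipe",
]
vectors = [term_vector(d) for d in docs]
close = cosine_similarity(vectors[0], vectors[1])    # many shared words
distant = cosine_similarity(vectors[0], vectors[2])  # no shared words
```

The first two documents score as close because they share most of their words, and the third scores as distant, even though the code knows nothing about what the words mean.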

Latent semantic indexing is a rather expensive process, and many SEO experts debate to what extent major search engines may be using the technology. If they are not using it much yet, in time they surely will.

Most webmasters do not need to know much about LSI other than that using a variety of inbound anchor text is important, and that LSI will inherently rank natural writing better than clumsy content written with keyword density in mind.

Temporal Analysis:

Search engines can track how long things (sites, pages, links) have been in existence and how quickly they change. They can track how long a domain has been in existence, how often page copy changes, how page copy changes, how large a site is, how quickly link popularity builds, how long any particular link exists, how similar the link text is, how a site changes in rank over time, how related linking sites are, and how natural linkage data looks.

In some cases it makes sense for resources to acquire a bunch of linkage data in a burst. When news coverage of a topic and search volumes on a particular term are high, it also makes sense that some sites may acquire a large amount of linkage data. Most of the time, if links build naturally, they build more slowly and evenly.

If links build in huge spikes, search engines may discount the links, or even apply a penalty to the domain receiving them, if those links do not build in a somewhat regular pattern.
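To make the idea of a link spike concrete, here is a purely illustrative check you could run on your own link data. Nothing here reflects how any search engine actually scores link growth: the monthly counts are invented, and the rule of flagging any month above five times the median is an arbitrary assumption.

```python
from statistics import median


def link_spikes(new_links_per_month, factor=5.0):
    """Return indexes of months whose new-link count exceeds factor x the median."""
    if not new_links_per_month:
        return []
    # Use at least 1 as the baseline so a site with almost no links still works.
    baseline = max(median(new_links_per_month), 1)
    return [i for i, count in enumerate(new_links_per_month) if count > factor * baseline]


# Invented data: steady growth with one sudden burst in the fifth month.
steady = [10, 11, 12]
bursty = [10, 12, 9, 11, 300, 10]
flagged = link_spikes(bursty)
```

A month flagged this way is not necessarily a problem, since a burst that matches real news coverage is natural. The check only helps you notice growth patterns that stand out.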

Stale pages may also be lowered in relevancy. A page may be considered fresh if it changes somewhat frequently or if it continues to acquire linkage data as time passes.

Google may also look at how often your site is bookmarked, who your advertisers are, and other various feedback they can get from their toolbar.

Google was awarded a patent on March 31, 2005 covering these types of topics (but in much more detail).

While I do not think they are necessarily already doing all the things they mention in the patent, I think they eventually may. The patent is interesting and worth reading if you are deeply interested in SEO and information retrieval.

