US Patent 6,285,999

There are several fundamental inventions that have shaped the formation of the Internet business models as we know them today. There was the selling of banner ads (credited to HotWired back in 1994), the keyword auction for displaying ads (invented by Goto.com in 1999), and there is Google’s algorithm for ranking search results, also known as PageRank (described in US Patent 6,285,999).

This patent was the foundation for Google, and enabled them to differentiate themselves from other well-established search engines at that time. So, given its significance, I thought it was time that I got around to reading it, which I did this weekend.

Search engines, such as Google’s, “crawl” the web, grabbing copies of all the web pages that they can find, and following the links within them to find more web pages. Then they create an enormous index of all the information within the web pages. So, when you type in some keywords to search for, they look them up in the index, to find all possible matches, and then rank and order those matches such that the most likely ones appear in the first page of results. The PageRank algorithm supplies this ranking.

Essentially, their algorithm produces a scaled version of the estimated probability of a web surfer ending up on a given page. If one page is better linked-to than another (based on the number of links from other well-linked-to pages), it will gain a higher ranking. They describe how this can be estimated by iteratively multiplying a probability matrix with itself.
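To make that a bit more concrete, here is a minimal sketch of the idea in Python. It is my own toy illustration, not the code or the exact formulation from the patent: the `pagerank` function, the four-page `toy_web`, the damping factor of 0.85 (the chance the surfer follows a link rather than jumping to a random page) and the fixed 50 passes are all assumptions. It also takes the common shortcut of repeatedly pushing a vector of page probabilities through the link structure, rather than multiplying a matrix with itself.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Estimate the chance of a random surfer being on each page.

    links maps each page name to the list of pages it links to.
    damping and iterations are illustrative defaults (assumptions).
    """
    # Collect every page that appears as a source or a target.
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)

    # Start with the surfer equally likely to be anywhere.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Baseline: the surfer gets bored and jumps to a random page.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's probability evenly across its links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no links: treat it as linking to every page.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# A made-up four-page web: C is linked to by A, B and D.
toy_web = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}

ranks = pagerank(toy_web)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(page, round(score, 3))
```

In this toy web, C comes out on top because three pages link to it, and A does nearly as well purely because C links to it. D, which nothing links to, ends up with only the baseline share from random jumps.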

As I was reading this, I recalled a discussion that I had back in the late 90s with my then-housemate Brendan. We were discussing a reputation database, where people would recommend others who they respected, based I think on a concept in David Brin’s book Earth. The solution to calculating these reputations was pretty much the same as Google’s method for PageRank. I’m not saying this to big-note myself, just to point out that as neither Brendan nor I had a PhD in database algorithms and since it took us 5 minutes to think up the solution, the algorithm is hardly rocket science.

Since then, Google’s gone on to greatness, and to produce many other patents. Today, PageRank is considered to be just one of hundreds of factors that go into ranking their results. However, it’s interesting to see how a simple invention (and a lot of hard work from talented people!) has been the basis for one of the most respected global companies.

Google AdSense update

It has been a few months since I last wrote about the Google ads on this blog. After a couple of false starts, I’ve been tracking the main pages that have ads people click on. The idea is to see if the most popular pages are also the most profitable pages.

The most popular pages since 1st June have been, in order:

This is very similar to last time, with the investment book reviews replaced with a recipe. Apparently there are more cooks reading, than books cooking. They don’t come here for the puns, that’s for sure.

The analysis of top search terms (courtesy of Google Analytics) is also similar to last time. The top four terms are:

  • “best man speech”
  • “cheesecake recipe”
  • “baklava recipe”
  • “positive gearing”

Variants of these are repeated until position 12 which is “auction strategy”, and it is position 27 before we get to “hreview”. Unfortunately, microformats are still not particularly popular.

But the list you’ve all been waiting for is the most profitable pages (i.e. those where people click on the ads the most). These are:

So, it’s still the popular pages (and, oddly, my About page), and the four most-searched-for topics are also profitable, but the order is different. It seems that the message from my advertisers is that I should write more about real estate and baking.

Be careful what you wish for.

Google sends me cheques

Okay, they’re not very big cheques, but the ads on this site apparently get clicked on by enough people that Google sends me cheques. Okay, only one cheque so far. It was for $120.47 – that’s enough to pay for my hosting fees.

Although, I hear you asking, how can an obscure personal blog get enough visits – let alone clicks – for Google ads to work? This is an obvious question, with a pretty interesting answer, that if I have it right, suggests that web ads are a special case of search ads.

A quick tutorial. Google has two advertising programs: AdSense and AdWords. AdWords enables advertisers to submit ads that are displayed against particular words or search terms (e.g. “negative gearing”). AdSense enables web page authors/publishers to give up space on their pages for ads to be shown. Google matches the AdWords advertisers with AdSense publishers in order to maximise the chance of visitors clicking on the ads.

I am a member of the AdSense program, but it’s really token involvement. You’ll see only a single text-based ad block, capable of showing two ads, on any page. I’m not exactly running an advertising honey-pot, here.

However, apparently certain pages on this site are popular enough to attract a significant amount of traffic. The top pages are (in order):

Now, I haven’t run the stats (yet), but of the above, the only pages with ads that seem likely to generate clicks are the Best Man Speech, Cheesecake Recipe, Positive Gearing Analysis and Investment Book Reviews. These aren’t pages that change at all often, so it’s not my regular readers (!) who are clicking on those ads. I think this is the norm for blogs – it’s not the regular readers generating advertising revenue.

The ad revenue comes from those who arrive on the site via web searches. Nearly 80% of all visits come from search engines. The top search terms of visitors are:

  • “best man speech”, “best man speeches”, and “bestman speeches”
  • “cheesecake recipe” and “cheesecake recipes”
  • “positive gearing”

Those terms account for a third of all search visitors, and there are several other variants of those. What it strongly suggests is that there are people searching for particular terms, and instead of clicking on the paid ads in the search results they click on a link to one of my pages. And then, once they’ve arrived on one of my pages, they click on a paid ad.

One explanation is that the pages on my blog provide a type of advertising filter. Perhaps the search engine, say Google, is not able to fine-tune the paid ads when all it has to go on is the term “best man speech”, but when Google can utilise all of the words in one of my pages, it does much better. So much so that people click first on a real page, not as a way of getting everything they want, but as a way of giving Google more information on what they’re after.

And a diversity of topics on a blog lends itself to being used by Google in this way. Typically a blog will focus on a single topic and build up a readership that is strongly interested in that topic. But those readers aren’t likely to be clicking on ads anyway. So my atypical approach of rambling wildly about many things doesn’t build up much of a readership, but it does enable Google to use my content to optimise its ads and hence pay me a small commission when people click on them.