Blogless: Blog of Design Less Better.

BrowseRank

Microsoft's new search algorithm returns more relevant search results by focusing on a page's "stickiness" as opposed to its incoming links.

Microsoft Research just published a paper describing a new type of web search ranking, BrowseRank [pdf], which was presented at last week’s SIGIR (Special Interest Group on Information Retrieval) conference. (Thanks for the heads-up, James.)

The gist of the proposal is that search results are ranked by how long users tend to stay on a page, rather than by the number of incoming links the page has (i.e. PageRank).
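To make the contrast concrete, here is a toy sketch in Python. It is not the model from the paper; it simply ranks the same pages two ways, once by average time on page and once by raw inlink count. The URLs, visit log, and inlink numbers are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical visit log: (url, seconds spent on the page). Invented data.
visits = [
    ("a.example/home", 120),
    ("a.example/home", 90),
    ("b.example/download", 5),
    ("b.example/download", 8),
    ("b.example/download", 4),
]

# Hypothetical inlink counts for the same two pages. Invented data.
inlinks = {"a.example/home": 10, "b.example/download": 1000}


def mean_dwell(visit_log):
    """Average seconds per visit for each URL in the log."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for url, seconds in visit_log:
        totals[url] += seconds
        counts[url] += 1
    return {url: totals[url] / counts[url] for url in totals}


def rank_by(scores):
    """URLs ordered from highest score to lowest."""
    return sorted(scores, key=scores.get, reverse=True)


by_dwell = rank_by(mean_dwell(visits))  # "sticky" pages first
by_links = rank_by(inlinks)             # heavily linked pages first
```

With this invented data the download page wins on inlinks but loses on time spent, which is the adobe.com situation the paper describes further down.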

The top-20 websites ranked by using different algorithms.

From the results shown in Table 3 (above), Microsoft infers the following:

“First, BrowseRank tends to give high ranks to Web 2.0 websites (marked in bold)… The reason is that web users visit the websites with high frequencies and often spend much time on them, even if the websites do not have as many inlinks as Web 1.0 websites like adobe.com and apple.com do. Note this reflects users’ real information needs.

Second, some websites like adobe.com are ranked very high (sic) by PageRank. One reason is that adobe.com has millions of inlinks for Acrobat Reader and Flash Player downloads. However, web users do not really visit such websites very frequently and they should not be regarded more important than the websites on which users spend much more time…

Third, the ranking results produced by TrustRank are similar to PageRank…

In summary, the ranking results given by BrowseRank seem to better represent users’ preferences than PageRank and TrustRank.”

Personally, I find the results of this study incredibly interesting, and I feel like the Microsoft researchers (a) are on the right track and (b) have addressed some of the real, significant blind spots of the PageRank model. Also, as the purveyor of what at least attempts to be a blog of substantive content, I appreciate that the relevance of my website might be measured in a way that’s not so easy to spam.

GlueRank

However, it immediately occurred to me that the Microsoft proposal doesn’t go far enough. I think it’s very clear that in addition to the length of time spent on the current page, the length of time spent on the current page’s domain (e.g. the total amount of time you spend reading some captivating blog as opposed to just a single blog post) should also be considered.

It could probably get even more complicated from there, but the point is this: The lesson we learn from BrowseRank is that a page’s "stickiness" is a more reliable indicator of quality content than how many incoming links it has. From there, it’s just a matter of extrapolation.
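Here is a rough sketch of what I mean, in Python. Everything in it is an assumption on my part: the function name glue_scores, the domain_weight parameter, the simple weighted sum, and the sample URLs and times are all invented for illustration. None of it comes from the BrowseRank paper.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def domain_of(url):
    """Host part of a URL, e.g. 'blog.example'."""
    return urlsplit(url).netloc


def glue_scores(visit_log, domain_weight=0.5):
    """Blend time spent on a page with time spent on its whole domain.

    domain_weight is a hypothetical knob: 0 means page time only,
    1 means domain time only.
    """
    page_time = defaultdict(float)
    domain_time = defaultdict(float)
    for url, seconds in visit_log:
        page_time[url] += seconds
        domain_time[domain_of(url)] += seconds
    return {
        url: (1 - domain_weight) * seconds
        + domain_weight * domain_time[domain_of(url)]
        for url, seconds in page_time.items()
    }


# Invented visit log: (url, seconds spent on the page).
visits = [
    ("http://blog.example/post-1", 60),
    ("http://blog.example/post-2", 240),
    ("http://solo.example/page", 100),
]

scores = glue_scores(visits)
```

In this made-up log, post-1 gets less time than the standalone page (60 seconds vs. 100), but it still scores higher because readers spend 300 seconds on its domain overall. How much weight the domain should actually get is an open question.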

Paul, Aug 4, 2008
 

Comments on this post

1.

Very interesting, although this has been tried before. DirectHit (acquired by Ask.com in 2000) had a search engine built entirely on clickstream data. They got the data from ISPs in those days. The end result is really not that much better than PageRank.

Me.dium, on the other hand (http://me.dium.com/search), is processing users’ clickstream data in real time to create a different lens based on what’s going on now. For example, do a search for John Edwards on Google or Live, and you get johnedwards.com and wiki/johnedwards. Do the same search on Me.dium and you learn that today people care about his love child, pictures of his mistress, etc.

The difference is real-time (what people are browsing now) vs. historical (what they browsed in the past). Social vs. Old School. Check it out. http://me.dium.com/search.

Chris at 11:21am on Mon, Aug 4th.
