aku-aku: v. To move a tall, flat-bottomed object (such as a bookshelf) by swiveling it alternately on its corners in a "walking" fashion. [After the book by Thor Heyerdahl theorising that the statues of Easter Island were moved in this fashion.] Source: LangMaker.com. Aku Aku also has another meaning to the islanders: a spiritual guide.
More Data vs More Clever Algorithm
Posted by dav at 2008 April 3 04:24 AM
File under: Geek

An interesting post titled More Data Usually Beats Better Algorithms shows how two teams using different approaches fared in the Netflix Challenge. Here is the gist, with a corroborating analysis of Google's success:

But the bigger point is, adding more, independent data usually beats out designing ever-better algorithms to analyze an existing data set. I'm often surprised that many people in the business, and even in academia, don't realize this.
Another fine illustration of this principle comes from Google. Most people think Google's success is due to their brilliant algorithms, especially PageRank. In reality, the two big innovations ... were:
1. The recognition that hyperlinks were an important measure of popularity -- a link to a webpage counts as a vote for it.
2. The use of anchortext (the text of hyperlinks) in the web index, giving it a weight close to the page title.
First generation search engines had used only the text of the web pages themselves. The addition of these two additional data sets -- hyperlinks and anchortext -- took Google's search to the next level. The PageRank algorithm itself is a minor detail -- any halfway decent algorithm that exploited this additional data would have produced roughly comparable results.
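
To make the quoted point concrete, here is a rough Python sketch of my own (not from the quoted post, and not how Google actually works). It compares a text-only score with one that also uses the two extra data sets: inbound links counted as votes, and anchor text. The toy pages, the anchor_weight value, and all function names are made up for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus: page -> (body text, outgoing links as (target, anchor text)).
PAGES = {
    "a": ("welcome to my site", [("c", "cheap flights"), ("b", "blog")]),
    "b": ("cheap flights cheap flights cheap flights", [("c", "cheap flights")]),
    "c": ("search and compare fares", []),
}


def text_only_score(page, query):
    """First-generation style: count query terms in the page body only."""
    body, _ = PAGES[page]
    words = Counter(body.split())
    return sum(words[term] for term in query.split())


def build_link_data():
    """Collect the two extra data sets: inbound link votes and inbound anchor text."""
    votes = Counter()
    anchors = defaultdict(list)
    for _, links in PAGES.values():
        for target, anchor in links:
            votes[target] += 1                      # each inbound link is one vote
            anchors[target].extend(anchor.split())  # anchor text describes the target
    return votes, anchors


def link_aware_score(page, query, votes, anchors, anchor_weight=3):
    """Text score plus weighted anchor-text hits plus raw vote count.

    anchor_weight is an arbitrary assumption, not a tuned value.
    """
    anchor_words = Counter(anchors[page])
    anchor_hits = sum(anchor_words[term] for term in query.split())
    return text_only_score(page, query) + anchor_weight * anchor_hits + votes[page]


def rank(score_fn):
    """Return page ids ordered from highest to lowest score."""
    return sorted(PAGES, key=score_fn, reverse=True)


QUERY = "cheap flights"
VOTES, ANCHORS = build_link_data()
text_ranking = rank(lambda p: text_only_score(p, QUERY))
link_ranking = rank(lambda p: link_aware_score(p, QUERY, VOTES, ANCHORS))
```

With body text alone, the keyword-stuffed page "b" wins for the query. Once links and anchor text are counted, page "c" comes out on top even though its own text never mentions the query, and it does so with a deliberately crude scoring formula.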

This is interesting to me, as I tend to get seduced by the desire to tweak algorithms.

