aku-aku: v. To move a tall, flat-bottomed object (such as a bookshelf) by swiveling it alternately on its corners in a "walking" fashion. [After the book by Thor Heyerdahl theorising that the statues of Easter Island were moved in this fashion.] source: LangMaker.com. Aku Aku also has another meaning to the islanders: a spiritual guide.
contextual network graphing movable type
Posted by dav at 2003 July 12 07:06 AM
File under: Geek

When I was at the Emerging Technology conference back in April, one of the sessions that got me all hot and bothered was Maciej Ceglowski's presentation on semantic mapping, specifically the Contextual Network Graph (CNG) system, which he described and released into the public domain. The CNG system is able to handle a keyword search and bring up documents that are similar even though they don't contain the keyword. One example Maciej gave was that you could do a search on 'photosynthesis' and have documents returned that contained only the phrase 'plants get their energy from sunlight.' It does this by noting which words are shared between documents that contain the keyword and documents that don't, and weighing their relative importance.
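
To make the idea concrete, here is a minimal sketch of that kind of search. It is not Maciej's or Anselm's code: the class name `CngSketch`, the `DECAY` constant, the equal split of energy among neighbours, and the crude tokenizer are all assumptions made for illustration, and it uses modern Java rather than 1.4. Terms and documents form a two-sided graph, and "energy" starting at the keyword flows to the documents containing it, then out to their other words, then on to further documents.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy contextual network graph: a two-sided graph of terms and documents. */
public class CngSketch {
    // Fraction of energy passed on at each document-to-term step (arbitrary assumption).
    private static final double DECAY = 0.5;

    private final Map<String, Set<Integer>> termToDocs = new HashMap<>();
    private final Map<Integer, Set<String>> docToTerms = new HashMap<>();

    public void addDocument(int id, String text) {
        Set<String> terms = new HashSet<>();
        for (String w : text.toLowerCase().split("[^a-z]+")) {
            if (w.length() > 2) terms.add(w); // skips empty strings and very short words
        }
        docToTerms.put(id, terms);
        for (String t : terms) {
            termToDocs.computeIfAbsent(t, k -> new HashSet<>()).add(id);
        }
    }

    /** Spreads energy from the keyword for the given number of hops; returns total energy per document. */
    public Map<Integer, Double> search(String keyword, int hops) {
        Map<String, Double> termEnergy = new HashMap<>();
        termEnergy.put(keyword.toLowerCase(), 1.0);
        Map<Integer, Double> total = new HashMap<>();
        for (int h = 0; h < hops; h++) {
            // Term -> document: each term splits its energy equally among its documents.
            Map<Integer, Double> received = new HashMap<>();
            for (Map.Entry<String, Double> e : termEnergy.entrySet()) {
                Set<Integer> docs = termToDocs.get(e.getKey());
                if (docs == null) continue; // term not in the index
                double share = e.getValue() / docs.size();
                for (int d : docs) received.merge(d, share, Double::sum);
            }
            // Document -> term: pass on a decayed, equally split share of what just arrived.
            Map<String, Double> next = new HashMap<>();
            for (Map.Entry<Integer, Double> e : received.entrySet()) {
                total.merge(e.getKey(), e.getValue(), Double::sum);
                Set<String> terms = docToTerms.get(e.getKey());
                double share = DECAY * e.getValue() / terms.size();
                for (String t : terms) next.merge(t, share, Double::sum);
            }
            termEnergy = next;
        }
        return total;
    }

    public static void main(String[] args) {
        CngSketch g = new CngSketch();
        g.addDocument(0, "Photosynthesis lets plants turn sunlight into chemical energy");
        g.addDocument(1, "Plants get their energy from sunlight");
        g.addDocument(2, "The stock market fell sharply today");
        // Document 1 never mentions the keyword but is reached through shared words.
        System.out.println(g.search("photosynthesis", 2));
    }
}
```

With two hops, document 1 picks up a small score through 'plants', 'sunlight' and 'energy' even though it never mentions the keyword, while the unrelated document 2 gets nothing. A real implementation would want better tokenizing and a smarter weighting than the equal split used here.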

I could see many uses for a generalized version of this system, including using it in my day job for doing searches in the chemistry space. Unfortunately, Maciej didn't release any code for the CNG at the time, and the code he did release (for a patented system called LSI, latent semantic indexing) was written in C++, and I got over C++ a long time ago. So I had been meaning to implement CNG in Java ever since, but just hadn't gotten around to it.

Last night I discovered that Anselm, my colleague in the Headmap Collective, has been working with CNG for a couple of weeks now and just completed a working alpha-level implementation in C# (he also has a detailed explanation of CNG at that previous link). Since C# is so close to Java, porting it was trivial, and the Java port is now available in the headmap cvs tree.

After porting it over I wanted to be able to pull in some external documents to play around with it. I wrote a quick system that reads an export file from a Movable Type blog, stores it as a CNG where each blog posting is a separate document, and then lets you interactively specify a post and get back a list of the other posts sorted by similarity.
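
For anyone curious about the first step, here is a rough sketch of splitting an export file into one document per post. It is not the code in headmap-cng.jar. The class name `MtExportSketch` is made up, and the layout it expects (entries ending with a line of eight hyphens, sections within an entry separated by a line of five hyphens, a `TITLE:` line in the header and a `BODY:` section) is an assumption based on the usual Movable Type export format, so check it against your own export file.

```java
import java.util.ArrayList;
import java.util.List;

/** Splits a Movable Type style export into posts (layout assumed, see text above). */
public class MtExportSketch {

    public static final class Post {
        public final String title;
        public final String body;

        Post(String title, String body) {
            this.title = title;
            this.body = body;
        }
    }

    public static List<Post> parse(String export) {
        List<Post> posts = new ArrayList<>();
        // Assumed: a line of exactly eight hyphens ends each entry.
        for (String entry : export.split("(?m)^--------[ \\t]*$")) {
            if (entry.trim().isEmpty()) continue;
            // Assumed: a line of exactly five hyphens separates sections within an entry.
            String[] sections = entry.split("(?m)^-----[ \\t]*$");
            String title = "";
            String body = "";
            // The first section is the metadata header (AUTHOR:, TITLE:, DATE:, ...).
            for (String line : sections[0].trim().split("\\R")) {
                if (line.startsWith("TITLE:")) {
                    title = line.substring("TITLE:".length()).trim();
                }
            }
            // Later sections are multi-line fields; only BODY: is kept in this sketch.
            for (int i = 1; i < sections.length; i++) {
                String s = sections[i].trim();
                if (s.startsWith("BODY:")) {
                    body = s.substring("BODY:".length()).trim();
                }
            }
            posts.add(new Post(title, body));
        }
        return posts;
    }

    public static void main(String[] args) {
        String export =
              "AUTHOR: dav\nTITLE: First post\n-----\nBODY:\nHello world\n-----\n--------\n"
            + "AUTHOR: dav\nTITLE: Second post\n-----\nBODY:\nPlants and sunlight\n-----\n--------\n";
        for (Post p : parse(export)) {
            System.out.println(p.title + " => " + p.body);
        }
    }
}
```

Each `Post` body can then be indexed as a separate document. Comments and extended entry text are ignored here to keep the sketch short.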

If you want to play around with it yourself, you can download the headmap-cng.jar file and an export file (which you can generate from your MT blog's menu), then run the utility from the command line like this:

java -classpath headmap-cng.jar org.headmap.cng.util.SemanticMT myblog.txt

You'll need Java 1.4 installed on your system.

It's not very useful yet; the algorithm needs to be tweaked and a better utility needs to be written. But it has some promise and interesting applications. Download the source code (headmap-cng-src.zip) and explore it!

Comments:

Came here via your link in the boingboing comments. Interesting stuff! In a quick look over the document at headmap you linked, two things stand out:

1) Just as the propagation of significance through a word is scaled down by how common (document-diverse) the word is, propagation through a document should be scaled down by how word-diverse the document is. This would help maintain specificity.

2) Instead of dividing by the square root of word occurrences as a scaling factor, I'm recklessly guessing based on information theory that it should probably be something related to -log(probability of word occurrence). Same-but-reversed for documents: divide by -log(number of indexed words in document / total indexed words). I know, I know, math first, then post. Sorry.

Posted by: Dan on July 14, 2003 05:27 PM

Did I say divide by -log(P)? I meant multiply.

Posted by: Dan on July 14, 2003 05:45 PM
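
The two word-scaling factors discussed in the comments above can be compared side by side. This is only a sketch under assumptions: "word occurrences" is taken to mean the number of documents containing the word, "probability of word occurrence" is taken to mean that count divided by the total number of documents, and the class and method names are made up. Either commenter may have meant something slightly different.

```java
/** Compares two ways of scaling how much significance flows through a word. */
public class TermWeightSketch {

    /** Divide-by-square-root scaling: 1 / sqrt(number of documents containing the word). */
    static double sqrtScale(int docsWithTerm) {
        return 1.0 / Math.sqrt(docsWithTerm);
    }

    /** Multiply-by -log(P) scaling, with P assumed to be docsWithTerm / totalDocs. */
    static double logScale(int docsWithTerm, int totalDocs) {
        return -Math.log((double) docsWithTerm / totalDocs);
    }

    public static void main(String[] args) {
        int totalDocs = 1000;
        int[] counts = {1, 10, 100, 1000};
        for (int n : counts) {
            System.out.printf("docs with word = %4d   1/sqrt = %.4f   -log(P) = %.4f%n",
                    n, sqrtScale(n), logScale(n, totalDocs));
        }
    }
}
```

The main difference is at the common end: a word that appears in every document still passes some significance under the square-root scaling, but exactly none under -log(P), which has the same shape as the familiar inverse document frequency weight.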

You can check out my Search::ContextGraph module (http://www.cpan.org) for an example of how to add local and global term weighting into the model. It's Perl, but the weighting code will be analogous in Java or C#.

Posted by: Maciej Ceglowski on July 23, 2003 07:45 AM

Dan, thanks, I'll try this out...

And Maciej, I'll also look at your Perl module, thanks!

Posted by: Dav on July 23, 2003 11:29 PM
