Dav Yaginuma;
Husband, Father, Hacker, Thinker, Maker;
San Francisco.

Upcoming


Comments

Came here via your link in the boingboing comments. Interesting stuff! From a quick look over the headmap document you linked, two things stand out:

1) Just as the propagation of significance through a word is scaled down by how common (document-diverse) the word is, propagation through a document should be scaled down by how word-diverse the document is. This would help maintain specificity.

2) Instead of dividing by the square root of word occurrences as a scaling factor, I'm recklessly guessing based on information theory that it should probably be something related to -log (probability of word occurrence). Same-but-reversed for documents: divide by -log (number of indexed words in document / total indexed words). I know, I know, math first, then post. Sorry.

Did I say divide by -log(P)? I meant multiply.
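To make the corrected suggestion concrete, here is a minimal sketch (in Python rather than Perl, with illustrative numbers that are not from the thread) of weighting a word by -log of its document probability, so rare words are amplified and ubiquitous words are damped toward zero:

```python
import math

def global_weight(doc_freq, total_docs):
    """-log of the probability that a random document contains the
    word; rare words get large weights, ubiquitous words get ~0."""
    return -math.log(doc_freq / total_docs)

# A word found in 10 of 1000 documents carries far more weight
# than one found in 500 of 1000.
rare = global_weight(10, 1000)     # ~4.61
common = global_weight(500, 1000)  # ~0.69
```

Multiplying a word's raw score by this weight is the "multiply, not divide" version of the scaling proposed above.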

You can check out my Search::ContextGraph module (http://www.cpan.org) for an example of how to add local and global term weighting into the model. It's Perl, but the weighting code will be analogous in Java or C#.
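For readers without Perl handy, a rough Python sketch of what combining a local and a global weight over a term-document count matrix looks like; the log-based formulas here are common choices for illustration, not necessarily the exact ones Search::ContextGraph uses:

```python
import math

def weight_matrix(counts):
    """Apply local and global term weighting to a term-document
    count matrix, given as {term: {doc_id: raw_count}}.
    Illustrative formulas: local = log(1 + count),
    global = -log(fraction of documents containing the term)."""
    num_docs = len({d for docs in counts.values() for d in docs})
    weighted = {}
    for term, docs in counts.items():
        # Global weight: 0 for a term in every document,
        # large for a term in few documents.
        gw = -math.log(len(docs) / num_docs)
        weighted[term] = {
            # Local weight damps raw counts logarithmically.
            d: math.log(1 + c) * gw for d, c in docs.items()
        }
    return weighted

counts = {"rare": {"d1": 2}, "common": {"d1": 1, "d2": 1}}
w = weight_matrix(counts)
```

A term appearing in every document ends up with weight zero everywhere, which is exactly the specificity-preserving behavior point 1 above argues for.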

Dan, thanks, I'll try this out...

And Maciej, I'll also take a look at your Perl module, thanks!