Friday, October 01, 2004

Talk, monkey-boy, talk!

Well, I watched the Presidential debate last night. It went pretty much as I expected, although I have to say Bush didn't stumble nearly as often as I was expecting (or as I would have liked). Then again, he didn't really manage to say anything. I spent a good amount of time this morning at work discussing how Bush managed to get through the whole thing using only four sentences, when my boss had a cool idea: perform a unique word count on each participant's comments and see who had more to say.

This sounded cool. I did a quick Google and didn't see anything else like this (at least not yet), so I decided to do one myself.

First I had to find a transcript of the debate. (I would have linked to it, but I don't want to get busted for copyright infringement, terrorism, or anti-patriot activism. Google is your friend.) Then I pasted it into an OpenOffice document, removed all comments and questions by the moderator, cut all of Kerry's comments and pasted them into a new text document, and did the same for Bush (into a second text document). The plan then called for me to use OpenOffice (or MS Word) to perform the unique word count.

Turns out neither MS Word nor OpenOffice has a "unique word count" feature. So after some digging, I found a little DOS command line program called "Textinfo" that did what I needed. Grabbed a copy and ran both of my documents through it.
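For anyone who wants to try this without hunting down a DOS utility, here is a minimal Python sketch of a unique word count. It is not how the program I used works internally (I have no idea what it does under the hood); the function name `word_counts` and the sample sentence are made up for illustration, and the rule for what counts as a "word" (runs of letters, with apostrophes allowed inside) is my own assumption.

```python
import re
from collections import Counter

def word_counts(text):
    """Return a Counter mapping each lowercased word to how often it appears.

    Assumption: a "word" is a run of letters, optionally with apostrophes
    inside it, so "didn't" stays a single word.
    """
    words = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())
    return Counter(words)

# Hypothetical sample text; in practice you would read each speaker's file.
sample = "We didn't win. We did not lose."
counts = word_counts(sample)

print(len(counts))            # number of unique words
print(counts.most_common(3))  # the most frequent words and their counts
```

To run it on a real transcript, read each speaker's text file into a string and pass it to `word_counts`; the length of the result is the unique word count.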

The results were anticlimactic:

Unique words used by John Kerry: 1,261
Unique words used by George Bush: 1,178

Not nearly the blowout I had expected: a gap of only 83 words. The TextInfo program isn't perfect, as it seems to count "t" as a separate word when it appears in words like "didn't". So while it's not a perfect representation, it's evenly flawed for both speakers. I may go through and remove all the common stuff (a, the, and, i, to...) and the incorrect stuff (t, s...) and see what that leaves us.
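The cleanup described above could be sketched in Python roughly like this. The tokenizer here deliberately splits on anything that isn't a letter, which is only my guess at why "t" and "s" show up as separate words; the two word lists contain just the examples mentioned above and would need to be filled out, and all the names (`COMMON`, `FRAGMENTS`, `naive_tokens`, `filtered_unique`) are made up for illustration.

```python
import re

# Assumed lists, seeded only with the examples from the post; extend as needed.
COMMON = {"a", "the", "and", "i", "to"}
FRAGMENTS = {"t", "s"}

def naive_tokens(text):
    """Split on anything that is not a letter, so "didn't" becomes "didn" and "t"."""
    return re.findall(r"[a-z]+", text.lower())

def filtered_unique(text):
    """Return the set of unique words, minus common words and stray fragments."""
    return {w for w in naive_tokens(text)
            if w not in COMMON and w not in FRAGMENTS}

# Hypothetical sample sentence, not taken from the debate transcript.
sample = "I didn't go to the store, and that's a fact."
print(sorted(filtered_unique(sample)))
# ['didn', 'fact', 'go', 'store', 'that']
```

Note that dropping the fragments still leaves stubs like "didn" behind, so a tokenizer that keeps apostrophes inside words would probably give a cleaner count.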

For the full breakdown on all words used, click on the speaker below.

GW Bush
John Kerry

Special thanks to Ben of the paved earth radio network for hosting the files.