CouchDB MapReduce example: word count
Note: Updated for CouchDB trunk as of 1/15/09
One of the classic Hadoop MapReduce tutorials counts words in a text corpus. Word counts are a great way to teach the fundamentals of MapReduce, and there are a lot of free books on Project Gutenberg.
To follow along at home, check out CouchRest from GitHub:
git clone git://github.com/jchris/couchrest.git
Also install the gem:
sudo gem install couchrest
I've included the example code as well as 3 books in the "examples" directory.
The short version is:
cd couchrest
ruby examples/word_count/word_count.rb #loads the books into couchdb
ruby examples/word_count/word_count_views.rb #creates the design document
ruby examples/word_count/word_count_query.rb #runs the query
The last step could take a few minutes (and you may have to rerun it if Ruby times out). But eventually you'll get some happy output.
To rerun the queries (it's also fun to edit the script and play with the params):
ruby examples/word_count/word_count_query.rb
The initial reduction can take about 5 minutes to run on the average MacBook, so this Ruby script will probably time out and fail the first time. Go get some coffee. When you come back, run it again. Once the reduce has run, queries should be nearly instantaneous.
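If you'd rather not rerun the script by hand, a retry loop is one way to wait out the first reduce. This is only a sketch: @with_retries@ is a hypothetical helper, not part of CouchRest, and I'm assuming the timeout surfaces as Ruby's @Timeout::Error@. Your HTTP client may raise a different class, so pass whatever it actually raises in @errors@.

```ruby
require 'timeout'

# Hypothetical helper (not part of CouchRest): run the block, and if it
# raises one of the given timeout-style errors, wait and try again.
# Assumption: the timeout shows up as Timeout::Error; adjust `errors`
# to match whatever your HTTP client really raises.
def with_retries(attempts: 3, wait: 0, errors: [Timeout::Error])
  tries = 0
  begin
    tries += 1
    yield tries
  rescue *errors
    # Give up and re-raise once we've used all attempts.
    raise if tries >= attempts
    sleep wait
    retry
  end
end
```

You would wrap the view query in the block, for example @with_retries(attempts: 5, wait: 60) { ... }@, so the script keeps asking until the index is built.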
The code teaches the fundamentals of CouchDB view functions, collation order, and reduce query params, and provides some helpful output while doing so.
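To get a feel for what a word-count view does without touching CouchDB at all, here is a small in-memory imitation in plain Ruby. It is a sketch under assumptions, not the code from the examples directory: I'm assuming the map step emits keys of the form @[word, title]@ with a value of 1 and the reduce step sums them. The documents, titles, and function names are made up, and Ruby's array sort is only a rough stand-in for CouchDB's collation.

```ruby
# Stand-in documents. The real example loads whole books into CouchDB;
# these titles and texts are invented for illustration.
DOCS = [
  { "title" => "book-a", "text" => "the cat and the hat" },
  { "title" => "book-b", "text" => "The dog and the cat" }
]

# Map step (assumed shape): one ([word, title], 1) row per word occurrence.
def map_doc(doc)
  doc["text"].downcase.scan(/\w+/).map { |word| [[word, doc["title"]], 1] }
end

# Reduce step: sum the values of a group of rows.
def reduce_counts(values)
  values.sum
end

# Map every doc, sort the rows by key, then reduce the rows that share
# the first `group_level` elements of their key.
def run_view(docs, group_level)
  rows = docs.flat_map { |doc| map_doc(doc) }.sort_by { |key, _| key }
  rows.group_by { |key, _| key.first(group_level) }
      .map { |key, group| [key, reduce_counts(group.map { |_, value| value })] }
end
```

Calling @run_view(DOCS, 1)@ groups on the word alone, so you get one total per word across both books. @run_view(DOCS, 2)@ groups on the full @[word, title]@ key, giving a count per word per book.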
The upshot is that you can now query for the count of any word, in one of the three indexed books, or in all three. And those queries are fast!
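Here is roughly how the two kinds of query can be expressed as view parameters. Again a sketch: it assumes the view keys look like @[word, title]@, and @word_count_params@ and @to_query_string@ are hypothetical helpers of mine, not CouchRest methods. The parameter names (@key@, @startkey@, @endkey@, @group_level@) are CouchDB's own. The range trick relies on collation order: an empty object @{}@ sorts after any string, so @[word, {}]@ comes after every @[word, title]@ key.

```ruby
require 'json'
require 'uri'

# Hypothetical helper: build view query parameters for counting one word.
# Assumes keys of the form [word, title].
def word_count_params(word, title = nil)
  if title
    # Exact key: the reduce covers only this word in this one book.
    { "key" => [word, title].to_json }
  else
    # Key range: every title for this word, collapsed into one row
    # by grouping on the first key element.
    { "startkey" => [word].to_json,
      "endkey" => [word, {}].to_json,
      "group_level" => "1" }
  end
end

# URL-encode the parameters so they can be appended to a view URL.
def to_query_string(params)
  URI.encode_www_form(params)
end
```

For example, @word_count_params("the")@ asks for the count across all books, while @word_count_params("the", "book-a")@ (with a made-up title) narrows it to one.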
by beppu, 2008/05/25 23:17:11 +0000
h2. whoops
You need to use the public clone URL:
git clone git://github.com/jchris/couchrest.git
h2. a lolcat for your troubles
!http://icanhascheezburger.files.wordpress.com/2007/03/doinitwrong.jpg!
(someone has to use the textile functionality)
by Chris Anderson, 2008/05/26 02:27:59 +0000
thnks, fixing!
!http://mine.icanhascheezburger.com/completestore/okaitrynow128404542381120000.jpg!
by beppu, 2008/05/31 17:58:51 +0000
!http://www.webmercial.dk/wp-content/uploads/2007/07/lolcat.jpg!
..oh.. I do has them.
h2. Thanks for making this example.
by Shawn, 2008/06/02 17:10:27 +0000
by Shawn, 2008/06/02 17:39:45 +0000
Stupid firewall (see previous post). Every time we get a port open, it lasts about a week, then the firewall guys close it again.
by Matt Brubeck, 2008/07/05 07:47:38 +0000
Using couchdb from subversion, I can load the "words" view, but when I try to load the "count" view CouchDB errors out with:
<pre>
[debug] [<0.51.0>] Spawning new update process for view group design/wordcount in database word-count-example.
[info] [<0.55.0>] Spawning new javascript instance.
[info] [<0.55.0>] HTTP Error (code 500): {'EXIT', {noproc, {genserver, call, [<0.62.0>,{preadbin,9073960}]}}}
[info] [<0.55.0>] 127.0.0.1 - - "GET /word-count-example/view/wordcount/count" 500
</pre>
Any ideas? I tried to get more information about the error, but I'm not even sure where the best place to start is.
by Brandon Zylstra, 2008/12/03 04:44:02 +0000
The example you give is not working for me:
$ ruby examples/wordcount/wordcount.rb
examples/wordcount/wordcount.rb:1:in `require': no such file to load -- examples/wordcount/../../couchrest (LoadError)
from examples/wordcount/word_count.rb:1
Any ideas?