Wednesday, March 6, 2013

Naive MapReduce

After downloading the text of The Hound of the Baskervilles from Gutenberg.org and saving it as houndofthebaskervilles.txt, enter the following code into the file mapper.py:
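#!/usr/bin/env python
# Count how often each whitespace-separated word appears on standard input.

import sys

counts = {}

for line in sys.stdin:
    # split() with no arguments splits on any run of whitespace
    words = line.split()
    for word in words:
        counts[word] = counts.get(word, 0) + 1

# Emit one tab-separated "word<TAB>count" line per distinct word.
# The parenthesized form of print works under both Python 2 and Python 3.
for el in counts:
    print('%s\t%i' % (el, counts[el]))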
Then run the program with the following commands:
chmod +x mapper.py
./mapper.py < houndofthebaskervilles.txt | sort
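This code runs quickly over small datasets, but it does not scale well. It runs as a single process and keeps an entry for every distinct word in one in-memory dictionary, so larger datasets take longer to process and need more memory.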
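To make the memory behaviour concrete, here is a minimal sketch of the same counting logic wrapped in a function. The name count_words and the sample lines are assumptions made for illustration and are not part of mapper.py. The point is only that the dictionary ends up with one entry per distinct word, so it grows with the vocabulary of the input.

```python
def count_words(lines):
    """Count whitespace-separated words, using the same logic as mapper.py."""
    counts = {}
    for line in lines:
        for word in line.split():
            counts[word] = counts.get(word, 0) + 1
    return counts


# Hypothetical sample input standing in for the lines of a text file.
sample = ["the hound of the baskervilles", "the hound"]
counts = count_words(sample)

# Print in the same tab-separated format as mapper.py, sorted by word.
for word in sorted(counts):
    print('%s\t%i' % (word, counts[word]))

# One dictionary entry per distinct word seen so far.
print(len(counts))
```

Note that split() only breaks on whitespace, so differently capitalized or punctuated forms such as "Hound" and "hound," are counted as separate words.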