Sunday, November 8, 2009

Lecture on Cluster Computing and MapReduce from Google

For a while I was curious about what exactly MapReduce is, and if you are curious too, these lectures from Google will help:

http://code.google.com/edu/submissions/mapreduce-minilecture/listing.html

So far I have watched the first and the second one. Each is approximately an hour long.

The first one gives an overview of distributed computing and its history.
  • Difference between parallel computing and distributed computing
  • Synchronization primitives and Semaphores
  • Condition variables
  • Fundamentals of Networking (what is a port, TCP, IP etc)
The second lecture goes into the details of MapReduce
  • Overview of Functional Programming
  • What is Map and Fold in the context of Functional Programming
  • Overview of MapReduce, using the example of a word count algorithm run over a set of files
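To make the Map and Fold idea concrete, here is a minimal sketch in Java. It is not taken from the lecture; the class name MapFoldDemo and the helper methods wordLengths and sum are hypothetical names chosen for illustration. Java's Stream.reduce stands in for fold here, which assumes the combining function is associative (true for addition).

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class MapFoldDemo {
    // map: apply the same function to every element independently,
    // producing a new list of the same length.
    static List<Integer> wordLengths(List<String> words) {
        return words.stream().map(String::length).collect(Collectors.toList());
    }

    // fold: walk the list and combine the elements into a single value,
    // starting from an initial accumulator (0 here).
    static int sum(List<Integer> numbers) {
        return numbers.stream().reduce(0, Integer::sum);
    }

    public static void main(String[] args) {
        List<Integer> lengths = wordLengths(Arrays.asList("map", "reduce", "fold"));
        System.out.println(lengths);      // prints [3, 6, 4]
        System.out.println(sum(lengths)); // prints 13
    }
}
```

Because each map call looks at only one element, the calls can run in any order or in parallel, which is the property MapReduce builds on.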
I started writing a very basic MapReduce implementation in Java (using regular Threads, of course, to parallelize the Mappers and Reducers) for the word count algorithm. It is about 50% complete, and hopefully I will finish it tomorrow and post the code here.
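Until then, here is a rough sketch of the general shape such a thread-based word count could take. It is only an illustration under my own assumptions, not the implementation mentioned above: the class name WordCountSketch, the methods wordCount and joinAll, the in-memory strings standing in for files, and the choice of one thread per document and per word are all hypothetical simplifications.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;

class WordCountSketch {
    // Wait for every thread in the list to finish.
    static void joinAll(List<Thread> threads) {
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
    }

    static Map<String, Integer> wordCount(List<String> documents) {
        // Intermediate data: word -> list of 1s, filled concurrently by mappers.
        Map<String, List<Integer>> intermediate = new ConcurrentHashMap<>();

        // Map phase: one thread per document emits (word, 1) for each word.
        List<Thread> mappers = new ArrayList<>();
        for (String doc : documents) {
            Thread t = new Thread(() -> {
                for (String word : doc.toLowerCase().split("\\W+")) {
                    if (word.isEmpty()) continue;
                    intermediate
                        .computeIfAbsent(word,
                            k -> Collections.synchronizedList(new ArrayList<Integer>()))
                        .add(1);
                }
            });
            mappers.add(t);
            t.start();
        }
        joinAll(mappers); // reducers must not start before all mappers finish

        // Reduce phase: one thread per word sums that word's list of 1s.
        // (A real implementation would use a small pool of reducers instead.)
        Map<String, Integer> counts = new ConcurrentHashMap<>();
        List<Thread> reducers = new ArrayList<>();
        for (Map.Entry<String, List<Integer>> e : intermediate.entrySet()) {
            Thread t = new Thread(() -> {
                int sum = 0;
                for (int one : e.getValue()) sum += one;
                counts.put(e.getKey(), sum);
            });
            reducers.add(t);
            t.start();
        }
        joinAll(reducers);

        return new TreeMap<>(counts); // sorted by word for stable output
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("the cat sat", "the dog sat", "The end");
        System.out.println(wordCount(docs)); // prints {cat=1, dog=1, end=1, sat=2, the=3}
    }
}
```

The join between the two phases plays the role of the barrier in MapReduce: no reducer sees a word's values until every mapper has finished emitting.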
