The third one, on distributed file systems, talked about the Google File System and was really good. It was not at all hard to understand, and it was kind of amazing that with ideas that are not even that hard to follow, you can actually manage petabytes of data. GFS isn't your general-purpose file system, though; it is a file system optimized to store a few million very large files (the block size is 64 MB!!), optimized for reading large chunks of data in one shot and for appending data at the end of a file.
The other two lectures, on Clustering Algorithms and Graph Algorithms, were a little hard for me to appreciate because I had no background in them whatsoever.
So I started off by hunting for videos on YouTube that start with algorithm basics (I do have a book on the same subject, but watching online lectures seems to be a much better option for me). Guess what, I found a gold mine of lectures from a professor at the Indian Institute of Technology in Delhi, here is the link
I'll try to listen to one lecture every day and post the class notes here. Today's lecture was on "Introduction to Algorithms" and here are the class notes I took.
Data Structures and Algorithms
- What is an Algorithm?
- What is a good Algorithm? - Small running time and low memory usage
First Sorting Algorithm - Insertion Sort
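The lecture just names insertion sort at this point; as a reminder to myself, here is a minimal sketch of it in Python (the function name and the sample list are my own, not from the lecture):

```python
def insertion_sort(items):
    """Sort a list in place by growing a sorted prefix one element at a time."""
    for i in range(1, len(items)):
        key = items[i]          # the element to insert into the sorted prefix items[0..i-1]
        j = i - 1
        # Shift larger elements one slot to the right to make room for key
        while j >= 0 and items[j] > key:
            items[j + 1] = items[j]
            j -= 1
        items[j + 1] = key
    return items

print(insertion_sort([5, 2, 4, 6, 1, 3]))  # [1, 2, 3, 4, 5, 6]
```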
Analysis of running time - At the very basic level, an algorithm is built out of a few fundamental operations,
for example comparison operations (>, <, ==), arithmetic operations (+, -, *, /), and logical operations (&&, ||). So the running time is just the number of times each fundamental operation gets executed for a given algorithm, multiplied by the time each of those operations takes. There are best-case, worst-case, and average-case times. Typically you want to consider the worst case because it is an upper bound; besides, the average case is often about as bad as the worst case, and it is harder to compute as well.
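To make the "count the fundamental operations" idea concrete (and to see best vs. worst case in action), here is a tiny sketch of my own, not from the lecture, that counts the comparisons a linear search performs:

```python
def linear_search(items, target):
    """Return the index of target (or -1), counting comparison operations along the way."""
    comparisons = 0
    for i, value in enumerate(items):
        comparisons += 1            # one == comparison per element examined
        if value == target:
            return i, comparisons   # best case: found at the front, 1 comparison
    return -1, comparisons          # worst case: not found, n comparisons

print(linear_search([7, 3, 9, 1], 9))   # (2, 3)
print(linear_search([7, 3, 9, 1], 5))   # (-1, 4) <- worst case looks at all n elements
```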
Asymptotic Analysis: The obvious way to measure running time is to implement the algorithm and time the program, but that is hard and not really feasible (input sizes can vary all over the place, and the measured time will vary with OS load, hardware, etc.). Asymptotic analysis simplifies things by getting rid of implementation-specific details like what hardware and what software, and just captures the essence of how the algorithm's running time grows as the input size grows.

Big-Oh notation: f(n) = O(g(n)) if there exist constants c > 0 and n0 such that f(n) <= c*g(n) for all n >= n0.
For example, take f(n) = 2n + 6 and g(n) = n; then c = 4 and n0 = 3 work, since 2n + 6 <= 4n for every n >= 3.
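Just to convince myself that those constants work, a quick spot-check (my own snippet, not from the lecture):

```python
# Verify f(n) = 2n + 6 <= c * g(n) = 4n for a range of n >= n0 = 3
f = lambda n: 2 * n + 6
g = lambda n: n
c, n0 = 4, 3
print(all(f(n) <= c * g(n) for n in range(n0, 10_000)))  # True
```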
Big Oh notation is used for worst case analysis
But really, outside the fancy definition, the simple rule is: just drop the lower-order terms and the constants from the function.
For example, if the running time is the function 50*n*log(n), then in Big-Oh notation you would simply write it as O(n log(n)).
So if you are running a loop within a loop over an input of size n, your algorithmic efficiency is O(n (outer loop) * n (inner loop)) = O(n^2); if you have a loop within a loop within a loop, it is O(n^3), and so on.
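A small sketch of what such a nested loop looks like (my own example, not from the lecture): the inner body runs on the order of n * n times, so the whole thing is O(n^2).

```python
def count_duplicate_pairs(items):
    """Count pairs (i, j) with i < j whose values are equal -- a classic O(n^2) nested loop."""
    n = len(items)
    pairs = 0
    for i in range(n):              # outer loop: n iterations
        for j in range(i + 1, n):   # inner loop: up to n iterations each time
            if items[i] == items[j]:
                pairs += 1
    return pairs

print(count_duplicate_pairs([1, 2, 1, 3, 2, 1]))  # 4
```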
O(log(n)) is better than O(n), which is better than O(n^2), which is better than O(n^k) for larger k, which is better than O(a^n) for any a > 1.
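To get a feel for how differently those functions grow, a quick throwaway table (my own snippet, not from the lecture):

```python
import math

# Rough sense of growth rates as n increases (2^n printed in scientific notation)
print(f"{'n':>6} {'log2(n)':>10} {'n^2':>10} {'2^n':>12}")
for n in (10, 20, 40, 80):
    print(f"{n:>6} {math.log2(n):>10.1f} {n**2:>10} {2.0**n:>12.2e}")
```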
There is also big Omega (a lower bound: f(n) = Omega(g(n)) if f(n) >= c*g(n) for all n >= n0) and big Theta (a tight bound, i.e. the same g(n) works as both an upper and a lower bound) - but these notations aren't as widely used as Big Oh.