Tuesday, November 24, 2009

MIT OCW - Lecture 7 (Hashing and Hash Functions)

This morning I watched the 7th lecture, which covered Hashing and Hash Functions. This was not as math intensive as the previous few lectures and was more intuition based (if there is such a thing - but basically what I mean is you could kinda see why things were good or bad without even having to do the math). Here is what was covered:

  • Symbol table problem that is encountered in compilers, and operations on the symbol table, i.e. insertion, deletion and search
  • Direct Access table - Just an array where you use the key to index into the array to find the value
  • What is "Hashing"?
  • How to resolve collisions by chaining
  • Runtime analysis for hashing with collisions resolved by chaining - worst case is Theta(n) and average case is Theta(1 + alpha), where alpha is the load factor (i.e. n/m, where n = number of elements in the set and m = number of slots in the hash table)
  • How to choose a hash function? - It should distribute keys uniformly, and regularity in the keys (for example, all keys being even numbers) should not affect the uniformity
  • Hash function using the Division Method => h(k) = k mod m. Don't pick an m that has a small divisor (for example 2), and don't pick m as a power of 2; typically pick a prime which is not too close to a power of 2 or 10
  • Hash function using the Multiplication Method => h(k) = (A*k mod 2**w) >> (w - r), where A is an odd integer between 2**(w-1) and 2**w and not too close to those bounds, m = 2**r, and w is the word length in bits of the computer.
  • The above method is fast since multiplication and right shifting are typically faster on a computer than division
  • Resolving collisions through open addressing, i.e. when you don't have storage for the pointers needed for chaining - you basically keep probing a sequence of slots (the slot depends on the probe number i as well as the key) until you find either an empty slot or the search key you are looking for
  • Linear Probing Strategy => h(k,i) = (h(k,0) + i) mod m. This method suffers from primary clustering, i.e. we basically search the hash table linearly one slot after another, so if we have a block of 10 occupied slots in one stretch, every time we have to check those 10 slots before moving on to the next empty slot
  • Double Hashing Strategy => h(k,i) = (h1(k) + i*h2(k)) mod m. Typically you pick m as a power of 2 and make h2(k) odd, so that h2(k) is relatively prime to m and the probe sequence can reach every slot
  • Analysis of runtime for open addressing => Expected number of probes <= 1/(1 - alpha), where alpha is the load factor (assuming uniform hashing). The lecture proves this theorem
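
To make the chaining idea concrete, here is a minimal sketch of a chained hash table in Python (the class and method names are mine, not from the lecture):

```python
class ChainedHashTable:
    """Hash table resolving collisions by chaining (minimal sketch)."""

    def __init__(self, m=8):
        self.m = m                           # number of slots
        self.slots = [[] for _ in range(m)]  # one chain (list) per slot

    def _h(self, key):
        return hash(key) % self.m            # any hash function would do here

    def insert(self, key, value):
        chain = self.slots[self._h(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:                     # key already present: overwrite
                chain[i] = (key, value)
                return
        chain.append((key, value))

    def search(self, key):
        for k, v in self.slots[self._h(key)]:
            if k == key:
                return v
        return None

    def delete(self, key):
        j = self._h(key)
        self.slots[j] = [(k, v) for (k, v) in self.slots[j] if k != key]

t = ChainedHashTable(m=4)
t.insert("x", 1)
t.insert("y", 2)
print(t.search("x"))   # 1
```

Search and delete walk the whole chain, which is why the worst case is Theta(n) when every key lands in one slot, but only Theta(1 + alpha) on average.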
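
The division method, and the warning about picking a bad m, fits in a couple of lines of Python (m = 701 below is just one example of a suitable prime, not a value from the lecture):

```python
def h_division(k, m):
    # Division method: h(k) = k mod m
    return k % m

# Bad choice: m = 8 is a power of 2, so only the low 3 bits of k matter,
# and every even key lands in an even slot.
print(all(h_division(k, 8) % 2 == 0 for k in range(0, 100, 2)))   # True

# Better: a prime not too close to a power of 2 or 10, e.g. m = 701.
print(h_division(1000003, 701))   # 377
```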
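
The multiplication method translates almost directly into code. In this Python sketch I assume a 64-bit word and r = 10 (so m = 2**10 = 1024 slots); the constant A is just one odd 64-bit value between 2**63 and 2**64, not something from the lecture:

```python
w = 64                         # word length in bits (assumption)
r = 10                         # m = 2**r = 1024 slots
A = 0x9E3779B97F4A7C15         # an odd integer with 2**63 < A < 2**64

def h_mult(k):
    # Multiply, keep only the low w bits (the "mod 2**w" part),
    # then right-shift to keep the top r of those bits.
    return ((A * k) & ((1 << w) - 1)) >> (w - r)

print(h_mult(123456789))       # some slot in [0, 1024)
```

On real hardware the "mod 2**w" is free (the multiply just wraps around in a w-bit register), which is why this beats a division.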
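
And here is a small open addressing sketch showing both probe sequences (again, the names and constants are mine; with m a power of 2, forcing h2(k) to be odd keeps it relatively prime to m):

```python
m = 16  # power of 2, so an odd step size reaches every slot

def h1(k):
    return k % m

def h2(k):
    return (k % (m - 1)) | 1         # force an odd step size

def probe_linear(k, i):
    return (h1(k) + i) % m

def probe_double(k, i):
    return (h1(k) + i * h2(k)) % m

def insert(table, k, probe):
    for i in range(m):               # try slots in the probe order
        j = probe(k, i)
        if table[j] is None:
            table[j] = k
            return j
    raise RuntimeError("table is full")

table = [None] * m
# 5, 21 and 37 all collide under h1 (each is 5 mod 16)
slots = [insert(table, k, probe_double) for k in (5, 21, 37)]
print(slots)   # [5, 12, 3]
```

Because the step size h2(k) depends on the key, colliding keys scatter to different slots instead of piling up next to each other, which is exactly the primary clustering problem that linear probing has.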
