Thursday, June 11, 2015


12:28 PM - By ajay desai

There are three phases in MapReduce:

1) Mapper

2) Sort and Shuffle

3) Reducer

1) Mapper

The input data is first divided into a number of splits, and each split is assigned to a mapper. In the example used here, the input is a file that consists of three lines:

  (i) Deer Bear River
  (ii) Car Car River
  (iii) Deer Car Bear

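Now, each line of the file is assigned a byte offset value, i.e. the position in the file at which the line starts, stored as a long (written here with an illustrative value such as 101010L). This byte offset is the key and the line is the value, for example: <101010L, Deer Bear River>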
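As a rough illustration of how such byte-offset keys could be derived, here is a minimal sketch in plain Python. It is not the Hadoop API; the function name `read_records` is hypothetical, and the offsets start at 0 only because this sketch reads the three-line example from the beginning of the file.

```python
def read_records(data: bytes):
    """Return (byte_offset, line) pairs, one per line of the input."""
    records = []
    offset = 0
    for raw_line in data.splitlines(keepends=True):
        # Strip the line terminator from the value, but count it in the offset.
        text = raw_line.rstrip(b"\r\n").decode("utf-8")
        records.append((offset, text))
        offset += len(raw_line)
    return records


# The three-line example file from the text, as raw bytes.
data = b"Deer Bear River\nCar Car River\nDeer Car Bear\n"
records = read_records(data)
for key, value in records:
    print(key, value)
```

Each printed pair is one <key,value> record of the kind handed to a mapper: the key is where the line begins, and the value is the line itself.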

So the input is given to the mapper in the form of <key,value> pairs. The mapper then divides the line into tokens, i.e. the line "Deer Bear River" is divided into three words. Each word becomes a key, and the value '1' is assigned to every key. So the output given by the mapper is:
 (i)   <Deer,1>
 (ii)  <Bear,1>
 (iii) <River,1>

The same process is repeated for the other two lines, giving:

 (iv)  <Car,1>
 (v)   <Car,1>
 (vi)  <River,1>

 (vii)  <Deer,1>
 (viii) <Car,1>
 (ix)   <Bear,1>
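The map step described above can be sketched in a few lines of plain Python. This is only an illustration, not Hadoop's Mapper class; the name `map_line` and the offsets in `records` are assumptions made for the example.

```python
def map_line(offset, line):
    """Emit a (word, 1) pair for every token in the line."""
    # The byte-offset key is not needed for counting, so it is ignored.
    return [(word, 1) for word in line.split()]


# (byte offset, line) records; the offsets are illustrative.
records = [
    (0, "Deer Bear River"),
    (16, "Car Car River"),
    (30, "Deer Car Bear"),
]

# One list of pairs per input line, mirroring one mapper call per record.
mapper_outputs = [map_line(offset, line) for offset, line in records]
for pairs in mapper_outputs:
    print(pairs)
```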

2) Sort and Shuffle

In this phase all the <key,value> pairs are sorted by key. In the example, the three pairs produced from each input line are arranged in alphabetical order of their keys as follows:

(i)     <Bear,1>
(ii)    <Deer,1>
(iii)   <River,1>
(iv)   <Car,1>
(v)    <Car,1>
(vi)   <River,1>
(vii)  <Bear,1>
(viii) <Car,1>
(ix)   <Deer,1>
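The ordering in this list can be reproduced with a small Python sketch that sorts each mapper's output by key. The variable names are hypothetical, and sorting per mapper is an assumption that matches the list above; it does not describe Hadoop's internal sort.

```python
# Mapper outputs for the three input lines, in emission order.
mapper_outputs = [
    [("Deer", 1), ("Bear", 1), ("River", 1)],
    [("Car", 1), ("Car", 1), ("River", 1)],
    [("Deer", 1), ("Car", 1), ("Bear", 1)],
]

# Sort each mapper's pairs by key (the word), leaving the values untouched.
sorted_outputs = [
    sorted(pairs, key=lambda pair: pair[0]) for pairs in mapper_outputs
]

for pairs in sorted_outputs:
    print(pairs)
```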

Now these <key,value> pairs are grouped (shuffled) by key, so that all pairs with the same key are brought together. This gives one group per distinct key:

(i)   <Bear,1>
(ii)  <Car,1>
(iii) <Deer,1>
(iv)  <River,1>

The Sort and Shuffle phase then generates <key,list of values> pairs as its output:

(i)   <Bear,1,1>
(ii)  <Car,1,1,1>
(iii) <Deer,1,1>
(iv)  <River,1,1>
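Grouping the sorted pairs into <key,list of values> can be sketched as follows. This is a plain Python illustration under the assumption that all pairs fit in memory; the function name `shuffle` is hypothetical and is not a Hadoop call.

```python
from collections import defaultdict


def shuffle(sorted_outputs):
    """Merge the mappers' pairs into one list of values per key."""
    grouped = defaultdict(list)
    for pairs in sorted_outputs:
        for key, value in pairs:
            grouped[key].append(value)
    # Return the groups ordered by key, as in the listing above.
    return dict(sorted(grouped.items()))


sorted_outputs = [
    [("Bear", 1), ("Deer", 1), ("River", 1)],
    [("Car", 1), ("Car", 1), ("River", 1)],
    [("Bear", 1), ("Car", 1), ("Deer", 1)],
]

grouped = shuffle(sorted_outputs)
for key, values in grouped.items():
    print(key, values)
```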

3) Reducer

Now these <key,list of values> pairs are given as input to the reducer, which sums the values for each key and gives the final output in the form of <key,value> pairs as follows:

(i)   <Bear,2>
(ii)  <Car,3>
(iii) <Deer,2>
(iv)  <River,2>
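The summation performed by the reducer is small enough to sketch directly. The function name `reduce_counts` is hypothetical; this is plain Python, not Hadoop's Reducer class.

```python
def reduce_counts(key, values):
    """Collapse a key's list of values into a single total."""
    return (key, sum(values))


# <key, list of values> pairs as produced by the Sort and Shuffle phase.
grouped = {
    "Bear": [1, 1],
    "Car": [1, 1, 1],
    "Deer": [1, 1],
    "River": [1, 1],
}

# The reducer is called once per key.
final_output = [reduce_counts(key, values) for key, values in grouped.items()]
for key, total in final_output:
    print(key, total)
```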

In this way, the word count job submitted by the client is completed by the MapReduce component.
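For reference, the whole word count job can be condensed into one small, single-process Python sketch. It only mimics the three phases in memory, assuming the input fits on one machine; `word_count` is a hypothetical name, and `itertools.groupby` stands in for the grouping that a real cluster performs across nodes.

```python
from itertools import groupby
from operator import itemgetter


def word_count(lines):
    """Run map, sort/shuffle and reduce over the lines in one process."""
    # Map: emit (word, 1) for every token of every line.
    mapped = [(word, 1) for line in lines for word in line.split()]
    # Sort: order the pairs by key so that equal keys become adjacent.
    mapped.sort(key=itemgetter(0))
    # Shuffle and reduce: group adjacent equal keys and sum their values.
    return [
        (key, sum(value for _, value in group))
        for key, group in groupby(mapped, key=itemgetter(0))
    ]


lines = ["Deer Bear River", "Car Car River", "Deer Car Bear"]
result = word_count(lines)
print(result)
```

Note that `groupby` only merges adjacent items, which is why the sort has to come before the grouping, just as sorting comes before the reducer in the phases above.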

About the Author

I am Azeheruddin Khan, with more than 6 years of experience in C# and MS SQL. My work comprises medium and enterprise-level projects using Microsoft .NET technologies. Please feel free to contact me with any queries by posting comments on my blog; I will try to reply as early as possible. Follow me @fresher2programmer