Thursday, May 28, 2015


12:53 PM - By ajay desai

HDFS ARCHITECTURE

The Hadoop Distributed File System (HDFS) stores and retrieves big data distributed across the nodes of a Hadoop cluster.

A Hadoop cluster consists of a set of data nodes (also called slave nodes) along with a master node called the Name node. In HDFS, data is stored in the form of blocks: a file is divided into a number of blocks, and each block is stored on one data node and replicated on some other data nodes. A block is the smallest unit of storage in HDFS; its size is 64 MB by default, and it is configurable (commonly raised to 128 MB or more). The default replication factor for a block is 3, i.e. each block has 3 copies spread across the cluster.
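The block-and-replica scheme described above can be sketched in a few lines of Python. This is a toy model, not Hadoop code: it assumes a simple round-robin placement, whereas real HDFS uses a rack-aware placement policy.

```python
# Toy sketch (not Hadoop code): round-robin placement of block
# replicas across data nodes, assuming a replication factor of 3.
from itertools import cycle


def place_replicas(num_blocks, data_nodes, replication=3):
    """Assign each block to `replication` distinct data nodes."""
    starts = cycle(range(len(data_nodes)))
    placement = {}
    for block_id in range(num_blocks):
        start = next(starts)
        placement[block_id] = [data_nodes[(start + i) % len(data_nodes)]
                               for i in range(replication)]
    return placement


# Three blocks spread over four data nodes, three copies each.
print(place_replicas(3, ["dn1", "dn2", "dn3", "dn4"]))
# → {0: ['dn1', 'dn2', 'dn3'], 1: ['dn2', 'dn3', 'dn4'], 2: ['dn3', 'dn4', 'dn1']}
```

Note how every block ends up on three distinct nodes, so the loss of any single data node still leaves two live copies of each block.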

In the figure above, assume there is a file named Accounts.txt in the local file system of Data node 1. This 140 MB file is divided into three blocks, B1, B2 and B3, of at most 64 MB each. B1 and B2 are full 64 MB blocks, while B3 holds only the remaining 12 MB. Unlike a disk block in a traditional file system, the unfilled 52 MB is not wasted: HDFS stores only the actual data, so the last block simply occupies less space. When the file is read back, all of its blocks are fetched and combined to reconstruct the complete file.

The HDFS architecture consists of the following components:

1) Name Node: - This is the master node, which manages the HDFS namespace, i.e. it maintains all the metadata. Here, metadata means the mapping from each file to its data blocks and the locations of those blocks on the data nodes.

The responsibilities of the name node are: -
(i) Allocating data nodes to store the blocks of a file.
(ii) Recording which data nodes hold each block of a file, i.e. the block locations (the metadata).
(iii) Using this metadata to locate all the blocks of a file so that its complete data can be retrieved. The name node persists this metadata in two files on its local file system: the FsImage and the edit log.
(iv) Re-replicating blocks when a data node fails, so that the replication factor is maintained.
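The metadata the name node keeps can be pictured as two maps: file name to block IDs, and block ID to replica locations. The class below is a hypothetical in-memory model for illustration only; the names are invented, not Hadoop internals.

```python
# Hypothetical in-memory model of name node metadata (illustrative,
# not Hadoop's actual data structures).
class NameNodeMetadata:
    def __init__(self):
        self.file_blocks = {}      # file name -> ordered list of block IDs
        self.block_locations = {}  # block ID -> data nodes holding replicas

    def add_file(self, name, blocks):
        """Register a file as a sequence of (block ID, replica nodes) pairs."""
        self.file_blocks[name] = [block_id for block_id, _ in blocks]
        for block_id, nodes in blocks:
            self.block_locations[block_id] = nodes

    def locate(self, name):
        """Return, block by block, the data nodes a client should read from."""
        return [self.block_locations[b] for b in self.file_blocks[name]]


meta = NameNodeMetadata()
meta.add_file("Accounts.txt", [("B1", ["dn1", "dn2", "dn3"]),
                               ("B2", ["dn2", "dn3", "dn4"]),
                               ("B3", ["dn1", "dn3", "dn4"])])
print(meta.locate("Accounts.txt"))
```

A client reading Accounts.txt would ask the name node for these locations, then fetch each block directly from one of its data nodes.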

2) Data node: - These are the workhorses of HDFS, which store the actual data in the form of blocks and serve the clients' read and write requests. Data processing runs close to the data on these nodes.

3) Secondary Name Node: - Despite its name, this node is not a standby that takes over when the name node fails. Its job is checkpointing: it periodically fetches the name node's FsImage and edit log, merges the logged edits into a fresh FsImage, and ships the compacted image back to the name node. This keeps the edit log from growing without bound, and it means that if the name node does fail, it can be restarted from the most recent checkpoint with only the newest edits left to replay.
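The checkpoint merge can be sketched as replaying edit-log entries on top of the last namespace image. The log-entry format below is invented for illustration; Hadoop's real edit log is a binary transaction log.

```python
# Sketch of a Secondary Name Node checkpoint: replay the edit log on
# top of the last FsImage to produce a fresh, compacted image.
# The (op, path, value) entry format is invented for illustration.
def apply_checkpoint(fsimage, edit_log):
    """Merge edit-log entries into a copy of the namespace image."""
    image = dict(fsimage)  # work on a copy; the old image stays intact
    for op, path, value in edit_log:
        if op == "create":
            image[path] = value          # value: the file's block IDs
        elif op == "delete":
            image.pop(path, None)
    return image


fsimage = {"/data/a.txt": ["B1", "B2"]}
edit_log = [("create", "/data/b.txt", ["B3"]),
            ("delete", "/data/a.txt", None)]
print(apply_checkpoint(fsimage, edit_log))  # → {'/data/b.txt': ['B3']}
```

After the merge, the new image already reflects every logged operation, so the edit log can be truncated and recovery only needs to replay edits made since this checkpoint.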

About the Author

I am Azeheruddin Khan, with more than 6 years of experience in C# and MS SQL. My work comprises medium and enterprise-level projects using these and other Microsoft .NET technologies. Please feel free to contact me with any queries by posting comments on my blog; I will try to reply as early as possible. Follow me @fresher2programmer


© 2014 Fresher2Programmer