Wednesday, May 13, 2015

IMPORTANCE OF DISTRIBUTED SYSTEM IN DATA ANALYSIS

11:40 PM - By ajay desai

                                       
                                   IMPORTANCE OF DATA ANALYSIS

Nowadays, huge volumes of data are generated every day by individuals and organizations.

2.7 Zettabytes of data exist in the digital universe today.

Facebook stores 30+ Petabytes of user-generated data.

Twitter generates around 1 TB of data every day from mobile phones, PCs and other devices.

Apart from websites, huge amounts of data also come from electronic devices such as sensors.

All this data needs to be properly analyzed to extract useful information that helps organizations and individuals make decisions. The useful information extracted from raw data is called knowledge.

For example, suppose HP wants to decide whether to offer a discount on HP printers in order to increase their sales. For this, the sales manager needs past historical data, such as the last 10 years of sales data for HP printers. The manager first collects all this historical data and then might use an analysis tool to conclude that customers are more likely to buy an HP printer along with an HP desktop PC than with an HP laptop. The manager can then decide on a discount offer: reducing the price of an HP printer from Rs. 5000 to Rs. 3000 and increasing the price of an HP desktop PC from Rs. 34000 to Rs. 35000.
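The kind of conclusion the sales manager reaches can be sketched as a simple co-purchase (market-basket) calculation: of all purchases that include a desktop, what fraction also include a printer, and likewise for laptops. This is only one possible approach, assuming the sales records can be grouped into individual purchases. The `TRANSACTIONS` list and the `attach_rate` helper are hypothetical, and the data is made up for illustration.

```python
# Hypothetical purchases: each set lists the HP products bought together.
# The data is made up for illustration only.
TRANSACTIONS = [
    {"desktop", "printer"},
    {"desktop", "printer"},
    {"desktop"},
    {"laptop"},
    {"laptop", "printer"},
    {"laptop"},
    {"laptop"},
    {"desktop", "printer"},
]


def attach_rate(transactions, base, addon):
    """Fraction of purchases containing `base` that also contain `addon`.

    In market-basket terms this is the confidence of the rule base -> addon.
    """
    with_base = [t for t in transactions if base in t]
    if not with_base:
        return 0.0
    return sum(1 for t in with_base if addon in t) / len(with_base)


if __name__ == "__main__":
    for base in ("desktop", "laptop"):
        rate = attach_rate(TRANSACTIONS, base, "printer")
        print(f"P(printer | {base}) = {rate:.2f}")
```

With this made-up data the script prints 0.75 for desktops and 0.25 for laptops, the kind of result that would point the manager towards pairing the printer offer with desktop PCs.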

If the past sales data of HP printers is only 2-3 GB and comes from a single source, all the analysis can be done on a single computational node. But suppose the data comes from different sources that store it in different formats. Then collecting the data from all those sources and converting it into a format that the sales manager's analysis tool understands will itself take a lot of time, leaving very little time for the actual analysis.
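The conversion step can be pictured as a small normalization routine: each source gets its own converter that maps its format and field names onto one common record layout. The sketch below assumes just two hypothetical sources (a CSV export and a JSON export); the field names, the sample data and the `from_csv`, `from_json` and `collect` helpers are all made up for illustration. With many large sources, this step alone is what eats into the time available for analysis.

```python
import csv
import io
import json

# Hypothetical raw exports from two sources that store the same kind of
# sales record in different formats and with different field names.
CSV_SOURCE = "product,units,price_inr\nprinter,10,5000\ndesktop,4,34000\n"
JSON_SOURCE = '[{"item": "printer", "qty": 3, "unit_price": 5000}]'


def from_csv(text):
    """Convert CSV rows into the common record format."""
    return [
        {"product": row["product"], "units": int(row["units"]), "price": int(row["price_inr"])}
        for row in csv.DictReader(io.StringIO(text))
    ]


def from_json(text):
    """Convert JSON objects into the common record format."""
    return [
        {"product": obj["item"], "units": int(obj["qty"]), "price": int(obj["unit_price"])}
        for obj in json.loads(text)
    ]


def collect(sources):
    """Run each source through its own converter and merge the results."""
    records = []
    for converter, text in sources:
        records.extend(converter(text))
    return records


if __name__ == "__main__":
    for record in collect([(from_csv, CSV_SOURCE), (from_json, JSON_SOURCE)]):
        print(record)
```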

This is where a distributed system comes into the picture.


                               WHAT IS A DISTRIBUTED SYSTEM?

A distributed system is a group of computing nodes, possibly placed at different geographical locations, that are connected to each other through a common, dedicated network. Each node performs its own set of tasks, and together the nodes complete a common computational job. Each node stores and accesses data in its own storage unit, which is often maintained by a SAN (Storage Area Network).


Now suppose HP, based in California, USA, wants to fix the price of its newly manufactured HP laptops. To do so, it needs to analyze 5 PB of HP laptop sales data from the past 5 years, distributed across India, China, Japan and Dubai. This data is stored on different servers in different formats. Let us see how a distributed system solves this problem.

Since a distributed system is a network of computational nodes, the 4 servers in India, Japan, China and Dubai act as 4 computational nodes (the slave nodes). The server in California is the master node: it divides the analysis job into 4 analysis tasks and assigns one task to each slave node. The slave node in India completes its analysis task, forwards its result to the master node and notifies the next slave node to start its analysis task, and so on. So each slave node waits for a notification from the previous slave node before starting its own task.
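The master/slave flow described above can be simulated on one machine. In this sketch, which is an illustration rather than a real distributed setup, threads stand in for the remote servers and `threading.Event` objects stand in for the notification each slave sends to the next. The regional sales figures, the order India, China, Japan, Dubai, and the choice of "total and count" as each slave's analysis are assumptions made up for the example.

```python
import threading

# Hypothetical laptop sales prices stored at each regional server (made up).
REGIONAL_DATA = {
    "India": [52000, 48000, 50000],
    "China": [45000, 47000],
    "Japan": [61000, 59000, 60000, 60000],
    "Dubai": [55000],
}
ORDER = ["India", "China", "Japan", "Dubai"]  # assumed order of the chain


def slave(region, wait_for, notify_next, results, finished, lock):
    """Run one analysis task, but only after the previous slave has signalled."""
    wait_for.wait()                       # wait for notification from the previous slave
    prices = REGIONAL_DATA[region]
    partial = (sum(prices), len(prices))  # local analysis: total value and count
    with lock:
        results[region] = partial         # forward the result to the master
        finished.append(region)           # record completion order
    notify_next.set()                     # notify the next slave to start


def master():
    """Split the job into one task per region and gather every partial result."""
    results, finished, lock = {}, [], threading.Lock()
    events = [threading.Event() for _ in range(len(ORDER) + 1)]
    threads = [
        threading.Thread(
            target=slave,
            args=(region, events[i], events[i + 1], results, finished, lock),
        )
        for i, region in enumerate(ORDER)
    ]
    for t in threads:
        t.start()
    events[0].set()                       # the master lets the first slave begin
    for t in threads:
        t.join()
    return results, finished


if __name__ == "__main__":
    results, finished = master()
    print("completion order:", finished)
    print("partial results:", results)
```

Because every slave waits on the previous one's event, the four tasks run strictly one after another, exactly as in the scheme described; if the slaves did not wait for each other, all four could run in parallel.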

When the master node in California has received the output from all the slave nodes, it correlates (combines) these outputs to produce the final result.
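How the master combines the outputs depends on the analysis. As one hypothetical case, assume each slave sends back the total sales value and the number of laptops sold in its region, and the master wants the overall average selling price. The `PARTIALS` figures and the `combine` helper are made up for illustration.

```python
# Hypothetical partial results received by the master: for each region,
# (total sales value in Rs., number of laptops sold). Made-up numbers.
PARTIALS = {
    "India": (150000, 3),
    "China": (92000, 2),
    "Japan": (240000, 4),
    "Dubai": (55000, 1),
}


def combine(partials):
    """Merge per-region (total, count) pairs into one overall average price."""
    grand_total = sum(total for total, _ in partials.values())
    grand_count = sum(count for _, count in partials.values())
    return grand_total / grand_count


def naive_average(partials):
    """Average of the regional averages (ignores how many sales each region had)."""
    averages = [total / count for total, count in partials.values()]
    return sum(averages) / len(averages)


if __name__ == "__main__":
    print("overall average:", combine(PARTIALS))
    print("average of regional averages:", naive_average(PARTIALS))
```

Sending totals and counts rather than ready-made averages is a deliberate choice here: with these numbers the correct overall average is Rs. 53700, while simply averaging the four regional averages gives Rs. 52750, because the regions sold different numbers of laptops.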

 

             




About the Author

I am Azeheruddin Khan, with more than 6 years of experience in C#, ASP.NET and MS SQL. My work comprises medium and enterprise-level projects using ASP.NET and other Microsoft .NET technologies. Please feel free to contact me with any queries by posting comments on my blog; I will try to reply as early as possible. Follow me @fresher2programmer

© 2014 Fresher2Programmer.