Google MapReduce:
MapReduce is a framework for processing huge datasets on certain kinds of distributable problems using a large number of computers (nodes), collectively referred to as a cluster. Computational processing can occur on data stored either in a filesystem (unstructured) or within a database (structured).
"Map" step: The master node takes the input, chops it up into smaller sub-problems, and distributes those to worker nodes. A worker node may do this again in turn, leading to a multi-level tree structure.
The worker node processes that smaller problem, and passes the answer back to its master node.
"Reduce" step: The master node then takes the answers to all the sub-problems and combines them in a way to get the output – the answer to the problem it was originally trying to solve.
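The map and reduce steps above can be sketched as a single-process word count in plain Python. This is a minimal illustration of the idea, not the Hadoop API; the function names (`map_fn`, `shuffle`, `reduce_fn`) are made up for this sketch.

```python
from collections import defaultdict

def map_fn(document):
    # "Map" step: split the input into sub-problems and emit
    # a (key, value) pair for every word found.
    return [(word, 1) for word in document.split()]

def shuffle(pairs):
    # Between the steps, the framework groups all intermediate
    # values by key before handing them to the reducers.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_fn(key, values):
    # "Reduce" step: combine the answers for one key into the
    # final answer for that key.
    return key, sum(values)

def map_reduce(documents):
    pairs = [pair for doc in documents for pair in map_fn(doc)]
    return dict(reduce_fn(k, v) for k, v in shuffle(pairs).items())

counts = map_reduce(["to be or not to be", "to do"])
print(counts)  # {'to': 3, 'be': 2, 'or': 1, 'not': 1, 'do': 1}
```

In a real cluster the mappers and reducers run on different worker nodes and the shuffle happens over the network, but the data flow is exactly this: map, group by key, reduce.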
Parallel Computing:
To learn why we are moving from serial to parallel computing, and how MapReduce builds on parallel computing concepts, refer to the following link, which starts from the basics.
For those who want to learn from the basics, there is a wonderful, step-by-step tutorial from Google Developer Aaron (an awesome guy). The tutorial covers Distributed Computing, MapReduce, and Cloud Computing.
This link is the first of five parts. Watch all five, and by the end you will have an idea of how distributed computing works.
It's an awesome video series that I watched back in my college days, and it got me interested in Hadoop.
Oops! Let me introduce what Hadoop is.
Apache Hadoop is a Java software framework that supports data-intensive distributed applications under a free license. It enables applications to work with thousands of nodes and petabytes of data. Hadoop was inspired by Google's MapReduce and Google File System (GFS) papers.
Hadoop is now used all over the world for distributed and cloud computing, from academic institutes to companies like Yahoo and Microsoft.
To learn about Hadoop, refer to the following link.
For tutorial links and examples, look into
Yahoo is the major supporter of Hadoop.
Cloudera provides a well-supported platform to work on Hadoop. They offer a VMware image for working with Hadoop, and a lot more.
Got your fill yet? This is just the start; there is a lot more out there.
Hive is a data warehouse infrastructure built on top of Hadoop that provides tools for easy data summarization, ad hoc querying, and analysis of large datasets stored in Hadoop files.
Just start to learn and start your own cluster today.
Ubuntu has good support for working with Hadoop.
Still wondering why you need to go forward? OK, let's check who is using Hadoop.
Adobe, AOL, Facebook, PSG Tech, Google, IBM, IIT, Yahoo, and the list goes on…
If you really want to learn this cutting-edge technology, you are already running behind.
Let's start learning it from today onwards and rock on with distributed and cloud computing.