Tuesday, 26 February 2013

Introduction to Hadoop Core / File Systems

The Apache Hadoop framework forms the kernel of an operating system for big data: it lets users share cluster resources while it manages permissions and allocations.

Map Reduce Layer :

  • The Task Tracker on each node spawns off a separate Java Virtual Machine process, so that the Task Tracker itself does not fail if the running job crashes its JVM.
  • The Job Tracker pushes work out to available Task Tracker nodes in the cluster, striving to keep the work as close to the data as possible.
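The "keep the work close to the data" idea can be illustrated with a small sketch. This is not the Job Tracker's actual scheduler: the function names, node names, and the greedy strategy are assumptions made for illustration, and rack awareness is not modeled. Each task prefers a node that already holds its input block and falls back to any node with a free slot.

```python
from typing import Dict, Optional, Set


def pick_tracker(block_hosts: Set[str], free_slots: Dict[str, int]) -> Optional[str]:
    """Pick a node with a free slot, preferring nodes that hold the input block."""
    # First choice: a node that already stores the block (data-local work).
    for node in sorted(block_hosts):
        if free_slots.get(node, 0) > 0:
            return node
    # Fallback: any node with a free slot; the block must cross the network.
    for node in sorted(free_slots):
        if free_slots[node] > 0:
            return node
    return None  # no capacity anywhere, the task has to wait


def assign(tasks: Dict[str, Set[str]], free_slots: Dict[str, int]) -> Dict[str, Optional[str]]:
    """Greedily place each task, consuming one slot per placement."""
    slots = dict(free_slots)  # work on a copy so the caller's dict is untouched
    placement: Dict[str, Optional[str]] = {}
    for task in sorted(tasks):
        node = pick_tracker(tasks[task], slots)
        if node is not None:
            slots[node] -= 1
        placement[task] = node
    return placement


if __name__ == "__main__":
    # Hypothetical cluster: task name -> nodes holding a replica of its input block.
    tasks = {"t1": {"node-a", "node-b"}, "t2": {"node-a"}, "t3": {"node-c"}}
    slots = {"node-a": 1, "node-b": 1, "node-c": 0, "node-d": 1}
    print(assign(tasks, slots))
```

In this example t1 runs data-local on node-a, while t2 and t3 fall back to remote nodes because their preferred nodes have no free slot. A greedy pass like this is easy to follow but not optimal (placing t1 on node-b would have let t2 run locally).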



Crux of MapReduce Architecture:

  • Maps are the individual tasks that transform input records into intermediate records. The transformed intermediate records do not need to be of the same type as the input records. A given input pair may map to zero or many output pairs.
  • Reducer reduces a set of intermediate values which share a key to a smaller set of values.
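The map and reduce roles described above can be sketched as an in-memory word count. This is plain Python, not the Hadoop API: the function names and the single-process "shuffle" step are assumptions chosen for illustration. Note how the input pairs (line number, text) differ in type from the intermediate pairs (word, 1), and how an empty line maps to zero output pairs.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple


def map_record(offset: int, line: str) -> Iterator[Tuple[str, int]]:
    """Map one (offset, line) input pair to zero or many (word, 1) pairs."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: List[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group intermediate values by key, as happens between map and reduce."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_values(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce all values that share a key to a single count."""
    return key, sum(values)


def run_job(lines: List[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence on a list of input lines."""
    intermediate = [
        pair for offset, line in enumerate(lines) for pair in map_record(offset, line)
    ]
    grouped = shuffle(intermediate)
    return dict(reduce_values(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(run_job(["big data", "", "Big cluster big"]))
    # {'big': 3, 'data': 1, 'cluster': 1}
```

In a real cluster the map calls run on many nodes at once and the shuffle moves data over the network; the sketch only shows the shape of the data at each stage.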

HDFS Layer :

  • The NameNode is the single point for storage and management of metadata. This can be a bottleneck when supporting a huge number of files, especially a large number of small files.
  • DataNodes talk to each other to rebalance data, to move copies around, and to keep the replication of data high.
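To see why many small files strain the NameNode, here is a rough back-of-the-envelope sketch. Both constants are assumptions, not values taken from this post: a 64 MB block size (the setting is configurable) and a commonly quoted rule of thumb of about 150 bytes of NameNode memory per file or block entry. The function names are hypothetical.

```python
BLOCK_SIZE = 64 * 1024 * 1024  # assumed HDFS block size in bytes (configurable)
BYTES_PER_OBJECT = 150         # assumed rule-of-thumb memory cost per file or block entry


def namenode_objects(file_sizes):
    """Count the metadata entries (one per file plus one per block) to track."""
    files = len(file_sizes)
    # Integer ceiling division: a partly filled block still needs its own entry.
    blocks = sum(-(-size // BLOCK_SIZE) for size in file_sizes)
    return files + blocks


def estimated_heap_bytes(file_sizes):
    """Very rough estimate of NameNode memory used by the metadata."""
    return namenode_objects(file_sizes) * BYTES_PER_OBJECT


if __name__ == "__main__":
    one_gib = 1024 ** 3
    one_large_file = [one_gib]              # 1 GiB stored as a single file
    many_small_files = [64 * 1024] * 16384  # the same 1 GiB as 64 KiB files
    print(namenode_objects(one_large_file))    # 17 (1 file + 16 blocks)
    print(namenode_objects(many_small_files))  # 32768 (16384 files + 16384 blocks)
```

Under these assumptions the same gigabyte of data costs the NameNode roughly 2,000 times more metadata entries when it is stored as small files.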

Hadoop Architecture


  • Hadoop is a software framework for easily writing applications which process vast amounts of data (multi-terabyte data-sets) in-parallel on large clusters (thousands of nodes) in a reliable, fault-tolerant manner.
  • Hadoop distributes a task, or a piece of a job, across a cluster of machines. Each machine works on data held on its own local disks under HDFS, rather than on a shared file system hosted by a SAN.

Introduction to Big Data

As of 2012, the total volume of data stored electronically was 2.7 zettabytes, according to Forbes.com.

(A zettabyte is 10^21 bytes, or equivalently one thousand exabytes, one million petabytes, or one billion terabytes.)
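The unit arithmetic can be checked with a few lines of Python. This is only a sketch using decimal (SI) units; the constant and function names are made up for illustration, and integers are used throughout to avoid floating-point rounding.

```python
# Decimal (SI) storage units, expressed in bytes.
TERABYTE = 10 ** 12
PETABYTE = 10 ** 15
EXABYTE = 10 ** 18
ZETTABYTE = 10 ** 21


def in_units(n_bytes):
    """Express a byte count in whole exabytes, petabytes, and terabytes."""
    return {
        "exabytes": n_bytes // EXABYTE,
        "petabytes": n_bytes // PETABYTE,
        "terabytes": n_bytes // TERABYTE,
    }


if __name__ == "__main__":
    total_2012 = 27 * ZETTABYTE // 10  # 2.7 zettabytes as an exact integer
    print(in_units(total_2012))
    # {'exabytes': 2700, 'petabytes': 2700000, 'terabytes': 2700000000}
```

So 2.7 zettabytes is 2,700 exabytes, or 2.7 billion terabytes.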

Statistical Facts on Big Data :

  • $300 billion: potential annual value to US health care.
  • $250 billion: potential value to Europe's public sector administration.
  • $600 billion: potential annual consumer surplus from using personal location data globally.
  • 140,000-190,000: additional deep analytical talent positions, plus openings for data-savvy managers, in the USA during 2011.
