<p>Analytics on Big Data. Author: Naveen (http://www.blogger.com/profile/01892407690516075293). Feed last updated 2024-03-08 14:09 -08:00.</p>
<p><strong>Introduction to Hadoop Core / File Systems</strong> (published 2013-02-26 10:48 -08:00, updated 2013-02-26 10:52 -08:00)</p>
<p>The Apache Hadoop framework acts as the kernel of an operating system for big data: it lets users share cluster resources and manages permissions and allocations.</p>
<p><u><strong><font color="#000080" size="2">MapReduce Layer:</font></strong></u></p>
<ul> <li><a href="http://lh6.ggpht.com/-K8pOCeZp0ZY/US0DdVtXS1I/AAAAAAAAAjg/WsZHTzCpIwc/s1600-h/image%25255B5%25255D.png"><img style="border-right-width: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; margin-left: 0px; border-left-width: 0px; margin-right: 0px" title="image" border="0" alt="image" align="right" src="http://lh5.ggpht.com/-yxZMDuQk_s4/US0DeC_PMsI/AAAAAAAAAjo/t_aGAn68mdo/image_thumb%25255B3%25255D.png?imgmax=800" width="244" height="195" /></a>The TaskTracker on each node spawns a separate <a href="http://en.wikipedia.org/wiki/Java_Virtual_Machine">Java Virtual Machine</a> process, so that a running job that crashes its JVM does not bring down the TaskTracker itself. </li> <li>The JobTracker pushes work out to available TaskTracker nodes in the cluster, striving to keep the work as close to the data as possible. </li> </ul>
<p><strong><font color="#000080" size="2">Crux of MapReduce Architecture:</font></strong></p>
<ul> <li><strong>Maps</strong> are the individual tasks that transform input records into intermediate records. The transformed intermediate records do not need to be of the same type as the input records. A given input pair may map to zero or many output pairs. 
</li> <li>A <strong>Reducer</strong> reduces a set of intermediate values that share a key to a smaller set of values.</li> </ul>
<p><u><strong><font color="#000080" size="2">HDFS Layer:</font></strong></u></p>
<ul> <li>The <strong>NameNode</strong> is the single point for storage and management of metadata. This can become a bottleneck when supporting a huge number of files, especially a large number of small files. </li> <li><strong>DataNodes</strong> talk to each other to rebalance data, to move copies around, and to keep the replication of data high. </li> </ul>
<p><strong>Hadoop Architecture</strong> (published 2013-02-26 10:39 -08:00, updated 2013-02-26 10:50 -08:00)</p>
<p><a href="http://lh3.ggpht.com/-0fKBxC1_DZI/US0BU3TWGRI/AAAAAAAAAjM/sa0zvmiuZiM/s1600-h/image%25255B5%25255D.png"><img style="border-right-width: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px" title="image" border="0" alt="image" src="http://lh4.ggpht.com/-e69-NcXR9Kc/US0BV4DPRNI/AAAAAAAAAjU/sRcWUQbO1eY/image_thumb%25255B3%25255D.png?imgmax=800" width="464" height="240" /></a> </p>
<ul> <li>Hadoop is a software framework for easily writing applications that process vast amounts of data (multi-terabyte data sets) in parallel on large clusters (thousands of nodes) in a reliable, fault-tolerant manner.</li> <li>Hadoop distributes a job, or pieces of it, across a cluster of machines, which access a shared distributed file system (HDFS) that is typically stored on the nodes' own local disks rather than on a SAN.</li> </ul>
<p><strong>Introduction to Big Data</strong> (published 2013-02-26 10:36 -08:00, updated 2013-02-26 10:36 -08:00)</p>
<p>As of 2012, the total volume of data stored electronically was 2.7 zettabytes, according to <a href="http://www.forbes.com/sites/ciocentral/2012/05/01/big-data-the-hidden-opportunity/">
Forbes.com</a>.</p>
<p>(A zettabyte is 10^21 bytes, or equivalently one thousand exabytes, one million petabytes, or one billion terabytes.)</p>
<p><strong><font color="#000080">Statistical Facts on Big Data:</font></strong></p>
<ul> <li>$300 billion: potential annual value to US health care.</li> <li>$250 billion: potential value to Europe's public sector administration.</li> <li>$600 billion: potential annual consumer surplus from using personal location data globally.</li> <li>140,000-190,000: more deep analytical talent positions open for data-savvy managers in the USA during 2011.</li> </ul>
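<p>The zettabyte conversions quoted in this post can be checked with a few lines of arithmetic. The snippet below is only a minimal sketch, assuming decimal (SI) units where each step is a factor of 1,000; the <code>EXPONENT</code> table and the <code>convert</code> helper are hypothetical names introduced for illustration and are not part of Hadoop or any library.</p>

```python
from decimal import Decimal

# Assumption: decimal (SI) storage units, expressed as powers of ten in bytes.
EXPONENT = {"TB": 12, "PB": 15, "EB": 18, "ZB": 21}


def convert(amount, from_unit, to_unit):
    """Convert an amount between SI storage units with exact decimal arithmetic."""
    shift = EXPONENT[from_unit] - EXPONENT[to_unit]
    return Decimal(str(amount)) * Decimal(10) ** shift


# One zettabyte expressed in each smaller unit.
for unit in ("EB", "PB", "TB"):
    print(f"1 ZB = {convert(1, 'ZB', unit):,.0f} {unit}")

# The 2012 estimate of 2.7 ZB expressed in terabytes.
print(f"2.7 ZB = {convert('2.7', 'ZB', 'TB'):,.0f} TB")
```

<p>Passing the amount as a string (for example <code>'2.7'</code>) keeps the value exact, which avoids the rounding noise that binary floating point can introduce at this scale.</p>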