Hadoop


What Is Apache Hadoop?

The Apache™ Hadoop® project develops open-source software for reliable, scalable, distributed computing.
The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high-availability, the library itself is designed to detect and handle failures at the application layer, so delivering a highly-available service on top of a cluster of computers, each of which may be prone to failures.
The best starting point is the official project site:
http://hadoop.apache.org/
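
To make the idea of "local computation and storage" concrete, here is a minimal sketch of writing and reading a file through HDFS using Hadoop's standard org.apache.hadoop.fs.FileSystem API. The path and file contents are hypothetical examples; the Configuration is assumed to pick up a cluster's settings from the classpath, and falls back to the local filesystem otherwise.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsHello {
    public static void main(String[] args) throws Exception {
        // Reads fs.defaultFS from any Hadoop configuration on the classpath;
        // without a cluster config this falls back to the local filesystem.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        Path path = new Path("/user/demo/hello.txt"); // hypothetical path

        // Write a small file. On a real cluster its blocks are replicated
        // across DataNodes, which is how HDFS survives machine failures.
        try (FSDataOutputStream out = fs.create(path, true)) {
            out.writeUTF("hello, hadoop");
        }

        // Read it back.
        try (FSDataInputStream in = fs.open(path)) {
            System.out.println(in.readUTF());
        }
    }
}

Block replication is the storage-level half of the failure handling described above; the application-level half is task re-execution, covered in the MapReduce overview below.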


What Is Hadoop, and How Do You Implement It?



People are often confused about how to implement Hadoop and how to use the examples.
----------------------------------------------------------------------------------------------------------
Hadoop is, at its core, MapReduce: it implements the MapReduce programming model pioneered by Google and used successfully by companies such as Facebook and Yahoo!.

Overview
Hadoop MapReduce is a software framework for easily writing applications which process vast amounts of data (multi-terabyte data-sets) in-parallel on large clusters (thousands of nodes) of commodity hardware in a reliable, fault-tolerant manner.
A MapReduce job usually splits the input data-set into independent chunks which are processed by the map tasks in a completely parallel manner. The framework sorts the outputs of the maps, which are then input to the reduce tasks. Typically both the input and the output of the job are stored in a file-system. The framework takes care of scheduling tasks, monitoring them and re-executes the failed tasks.
For a full example, see the word-count sketch below; the official documentation also offers a downloadable MapReduce tutorial.
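
As a concrete illustration of the map and reduce phases described above, here is the classic word-count job, closely following the example in the official MapReduce tutorial. The names used are the standard org.apache.hadoop.mapreduce API; the input and output directories come from the command line.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map phase: runs in parallel on each input split, emitting (word, 1).
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  // Reduce phase: receives each word with all its counts, emits the sum.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {

    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class); // local pre-aggregation
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Packaged into a jar, it would typically be run with something like "hadoop jar wc.jar WordCount <input dir> <output dir>". Note that the code never mentions scheduling, retries, or data placement: the framework splits the input, sorts the map output, and re-executes failed tasks, exactly as described in the overview.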
--------------------------------------------------------------------------------------------
Some more useful advice: to be honest, this is a new concept for most people. Work through the examples a couple of times and think about the code, and you will certainly find good answers.

-------------------------------------------------------------------------------------------------
Learn Hadoop to Manage Big Data


Networks are evolving rapidly, and that includes the Internet. Plain static HTML has become almost passé. Users demand a media-rich experience, but beyond that they want to know everything instantly. This calls for learning new tools.

Learn Hadoop for the Information Age

Travel sites not only let you find and book a trip; they now compare various sites to tell you which one has the best deals. Online retail and auction conglomerates let visitors quickly search hundreds of thousands of items and almost instantly find what they want. Welcome to the world of big data.
Quickly processing a ton of data to get complex yet fast results has been a primary mission for networks over the decades, but it has been a difficult task. Storage, processing, and connectivity bandwidth limitations caused very complicated applications and structures to evolve, along with the IT programming specialists needed to build and maintain them.
As worldwide connectivity and cloud computing have developed, the data paradigm has also shifted. A business or organization can not only buy exactly the amount of storage it needs (including backup), it can also pay for only the processing power it needs, when it needs it. How to manage data in the age of distributed computing has also evolved, and one of the most important big data tools to come along is Hadoop.
Useful link:
https://www.udemy.com/course/apache-hadoop-essential-training/?displayType=&utm_source=blog&utm_medium=udemyads&utm_content=post89692&utm_campaign=content-marketing-blog&xref=blog
---------------------------------------------------------------------------------------------------------------


Apache Hadoop is an open-source software platform built on MapReduce technology to perform distributed computations across clusters of commodity servers. It was originally adapted from Google's File System and MapReduce papers, and is now used by hundreds of enterprises for processing, storing, and analyzing large volumes of structured and unstructured data.
By using Hadoop-based software, even the largest enterprises can easily manage their big data, and at a much lower cost than with non-Hadoop-based software platforms.
The Global Hadoop Market is expected to reach US$8.74 billion by 2016, growing at a CAGR of 55.63 percent during the period 2012–2016. We’ve recognized 14 of the biggest and best Hadoop technology companies in the world who will be playing a big role in market growth for the next few years.
“Amazon Elastic MapReduce provides a managed, easy to use analytics platform built around the powerful Hadoop framework. Focus on your map/reduce queries and take advantage of the broad ecosystem of Hadoop tools, while deploying to a high scale, secure infrastructure platform.”


“IBM InfoSphere BigInsights makes it simpler for people to use Hadoop and build big data applications. It enhances this open source technology to withstand the demands of your enterprise, adding administrative, discovery, development, provisioning, and security features, along with best-in-class analytical capabilities from IBM Research. The result is that you get a more developer and user-friendly solution for complex, large scale analytics.”


“Pivotal Introduces World’s Most Powerful Hadoop Distribution: Pivotal HD. Unlocks Hadoop as Key to Big Data’s Transformational Potential for Data-Driven Enterprises; Delivers Over 100X Performance Improvement with HAWQ.”


“Cloudera develops open-source software for a world dependent on Big Data. With Cloudera, businesses and other organizations can now interact with the world's largest data sets at the speed of thought — and ask bigger questions in the pursuit of discovering something incredible.”


“MapR combines all the innovation of the Apache Hadoop community with additional capabilities to make MapR the only enterprise-grade Hadoop distribution. MapR's advanced architecture brings unprecedented dependability, ease-of-use and world-record speed to Hadoop, NoSQL, database and streaming applications in one unified Big Data platform to support mission-critical and real-time production uses.”


“At Hortonworks, we believe that Hadoop is an enterprise viable data platform and that the most effective path to its delivery is within the open community. To this end, we build, distribute and support a 100% open source distribution of Apache Hadoop that is truly enterprise grade and follow these three key principles:  identify and introduce enterprise requirements into the public domain, work with the community to advance and incubate open source projects, and apply Enterprise Rigor to deliver the most stable and reliable distribution”


“Natively designed for Hadoop, Karmasphere is a unified workspace for full-fidelity Big Data Analytics that provides access to all the data in its original form to preserve richness and flexibility. An open, standards-based solution for teams of data and business analysts who need to quickly and easily ingest, analyze, and visualize Big Data on Hadoop.”


“Hadapt’s flagship product is the Adaptive Analytical Platform, which brings a native implementation of SQL to the Apache Hadoop open-source project. By combining the robust and scalable architecture of Hadoop with a hybrid storage layer that incorporates a relational data store, Hadapt allows interactive SQL-based analysis of massive data sets.”


“Supermicro designs and develops the ideal turnkey pilot racks for getting started with Apache Hadoop. Leveraging Supermicro’s optimized servers and switches as a foundation, Supermicro has designed two turnkey racks to get anyone started—14U and 12U versions.”


“Pentaho’s visual development tools drastically reduce the time to design, develop and deploy Hadoop analytics solutions by as much as 15x, compared to traditional custom coding and ETL approaches. Pentaho provides a powerful visual user interface for ingesting and manipulating data within Hadoop, and makes it easy to enrich Hadoop data with reference data from other sources.”


“Zettaset Orchestrator v5 is an enterprise software solution that automates, accelerates, and simplifies Hadoop installation and cluster management for Big Data deployments, and delivers faster time to value. Zettaset has created a solution that meets the exacting requirements of enterprises for security, high availability, and performance within the Hadoop cluster environment”


“DataStax Enterprise seamlessly integrates open source technologies to manage real-time data with Apache Cassandra, run analytics on the Cassandra data with Apache Hadoop, and easily search the Cassandra data with Apache Solr. DataStax Enterprise is not a data warehouse platform like those offered by pure Hadoop vendors, but rather is designed to use Hadoop for analyzing line-of-business data stored in a distributed Cassandra database cluster. DataStax Enterprise smartly separates analytic operations from transactional workloads so that neither competes with the other for data management resources.”


“Datameer's Big Data analytics application for Hadoop ensures the fastest time to discovering insights in any data. Anyone can use Datameer's wizard-based data integration, iterative point-and-click analytics, and drag-and-drop visualizations to find the insights that matter to drive their business forward. Founded by Hadoop veterans in 2009, Datameer scales from a laptop to thousands of nodes and is available for all major Hadoop distributions”


“Being able to analyze your growing mountain of data can give you a distinct competitive advantage, but big data can be more than traditional tools can handle. Dell Apache™ Hadoop™ Solutions can help by providing superfast analysis, data mining and processing.”
Ref: see more at http://www.technavio.com/blog/top-14-hadoop-technology-companies
-------------------------------------------------------------------------------------------------
