What is Hadoop: Hadoop is an open-source framework that allows you to store and process big data in a distributed environment across clusters of computers using simple programming models. Hadoop is written in Java and is a batch-processing system, not an OLAP (online analytical processing) system.
It is designed to scale from single servers to thousands of machines, each offering local computation and storage. Its design was based on the same concept Google described in its papers on distributed storage and processing: store and process data in a distributed, automated way so that relevant web search results can be returned faster.
It is used by Facebook, Yahoo, Twitter, LinkedIn, and many other companies. Moreover, it can be scaled simply by adding nodes to the cluster.
What is Big Data?
Big data consists of high-volume, high-velocity and/or high-variety information assets that demand cost-effective, innovative forms of information processing that enable enhanced insight, decision making, and process automation (Gartner's definition).
Today, Hadoop’s framework and ecosystem of technologies are managed and maintained by the non-profit Apache Software Foundation (ASF), a global community of software developers and contributors.
Modules of Hadoop:
- HDFS: the Hadoop Distributed File System provides distributed, fault-tolerant storage for very large datasets (a usage sketch follows this list).
- YARN: Yet Another Resource Negotiator handles job scheduling and cluster resource management.
- MapReduce: a Java-based programming model for processing data in parallel across the cluster (see the word-count sketch after this list).
- Hadoop Common: the collection of shared utilities and libraries that support the other Hadoop modules.
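
To make the HDFS module concrete, here is a minimal sketch of writing and reading a file through Hadoop's Java FileSystem API. The file path /user/demo/hello.txt is a hypothetical example, and the sketch assumes a core-site.xml with the cluster address (fs.defaultFS) is available on the classpath.

```java
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class HdfsHelloWorld {
    public static void main(String[] args) throws Exception {
        // Reads the NameNode address (fs.defaultFS) from core-site.xml.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        Path file = new Path("/user/demo/hello.txt"); // hypothetical path

        // Write a small file; HDFS splits large files into blocks and
        // replicates each block across DataNodes for fault tolerance.
        try (FSDataOutputStream out = fs.create(file, true)) {
            out.write("Hello, HDFS!".getBytes(StandardCharsets.UTF_8));
        }

        // Read the file back and copy its contents to stdout.
        try (FSDataInputStream in = fs.open(file)) {
            IOUtils.copyBytes(in, System.out, 4096, false);
        }
    }
}
```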
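
The MapReduce model itself is easiest to grasp from code. Below is the classic word-count job as a sketch of the programming model: the map phase emits a (word, 1) pair for every token, and the reduce phase sums the counts for each word. The input and output directories come from the command line, and the class names are illustrative.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Map phase: runs in parallel on each input split, emitting (word, 1).
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reduce phase: receives all counts for one word and sums them.
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class); // local pre-aggregation
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

A job like this is typically packaged into a jar and launched with `hadoop jar wordcount.jar WordCount /input /output`; YARN then schedules the map and reduce tasks across the cluster.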
Why is Hadoop important?
There are many advantages of Hadoop; the main ones follow:
- Ability to store and process huge amounts of any kind of data quickly.
- Fault tolerance. Data and application processing are protected against hardware failure: HDFS keeps multiple copies of all data, and failed tasks are rescheduled on healthy nodes (see the sketch after this list).
- Low cost. The open-source framework is free and runs on clusters of commodity hardware.
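
The fault-tolerance point rests on block replication: HDFS stores several copies of every block (three by default, controlled by the dfs.replication setting) on different nodes, so a single hardware failure does not lose data. As a small sketch, the replication factor can even be adjusted per file through the same Java API used earlier; the path below is again hypothetical.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicationDemo {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path file = new Path("/user/demo/hello.txt"); // hypothetical path

        // Ask HDFS to keep two copies of each block of this file
        // instead of the cluster-wide default (dfs.replication, usually 3).
        fs.setReplication(file, (short) 2);

        FileStatus status = fs.getFileStatus(file);
        System.out.println("Replication factor: " + status.getReplication());
    }
}
```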