Technologies
Home / Technologies / Hadoop Ecosystem

Hadoop Ecosystem

Apache Hadoop is a framework that enables the distributed processing of large sets of data across clusters of machines.

At Impetus, we have years of experience with the entire Hadoop ecosystem–from the core components (HDFS and MapReduce) to the extended family of proprietary and open source tools. Our experience and expertise enables us to address Big Data challenges quickly and efficiently.

Impetus focuses on the following Big Data challenges:
  • Distributed and parallel computing
  • Automatic load balancing
  • Scalability and high failover
  • Data replication and robustness
  • Network and disk-transfer optimization
Standard Components

Hive Pig
http://hive.apache.org/ http://pig.apache.org/

Extended Components
MapReduce Profiler
– Jumbune
 
https://cwiki.apache.org/FLUME/home.html https://github.com/facebook/scribe
 
http://incubator.apache.org/chukwa/ http://sqoop.apache.org/
 
http://mahout.apache.org/ http://oozie.apache.org/