MapReduce Interview Questions and Answers

5/02/2013

Frequently asked in Big Data and Data Scientist interviews at top companies like IBM, Google, HP, and Cisco.

Explain what MapReduce is.
MapReduce is often called the heartbeat of Big Data and the heart of Hadoop. It is a programming framework/paradigm that lets us process massive amounts of unstructured data spread over hundreds or thousands of servers in a Hadoop cluster, making the application scalable.

MapReduce consists of two phases: Map, and then Reduce. Between the two is a stage known as shuffle and sort. Each Map task operates on a discrete portion of the overall dataset, typically one HDFS block of data. After all Map tasks are complete, the MapReduce system distributes the intermediate data to the nodes that perform the Reduce phase. Where possible, each node processes the data stored locally on that node.
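The flow above can be sketched in plain Python. This is a simulation of the three stages on a single machine, not actual Hadoop code, using word count as the example job:

```python
from collections import defaultdict

def map_phase(record):
    # Map: emit a (word, 1) pair for each word in one input line
    for word in record.split():
        yield (word.lower(), 1)

def shuffle_and_sort(pairs):
    # Shuffle and sort: group all intermediate values by key, as the
    # framework does between the Map and Reduce phases
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())

def reduce_phase(key, values):
    # Reduce: aggregate all values emitted for one key
    return (key, sum(values))

lines = ["big data big cluster", "data nodes"]
intermediate = [pair for line in lines for pair in map_phase(line)]
result = dict(reduce_phase(k, v) for k, v in shuffle_and_sort(intermediate))
print(result)  # {'big': 2, 'cluster': 1, 'data': 2, 'nodes': 1}
```

In a real cluster each `map_phase` call runs on the node holding that HDFS block, and the shuffle moves the grouped pairs across the network to the Reduce nodes.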

Can I write a MapReduce program in a language other than Java?
Yes. MapReduce programs can be written in many languages: Java, R, C++, and scripting languages such as Python and PHP. Any language that can read from stdin, write to stdout, and parse tab and newline characters should work. Hadoop Streaming (a Hadoop utility) allows you to create and run Map/Reduce jobs with any executable or script as the mapper and/or the reducer.

Is the MapReduce infrastructure on the BDA open source?
Yes, the core Hadoop HDFS storage and MapReduce compute infrastructure is 100% open source.

Since $HADOOP_HOME is deprecated on CDH 4.1.2 / BDA V2.0.1, what environment variable should be used?

On BDA V2.0.1 with CDH 4.1.2, use $HADOOP_MAPRED_HOME=/usr/lib/hadoop-0.20-mapreduce.

What is the impact of shutting down a server for maintenance on MapReduce jobs?
In the general case, for a non-critical server (i.e. not node 1, 2, or 3), Hadoop reschedules the affected tasks on other nodes, and HDFS serves the underlying data from replicas stored elsewhere in the cluster. There should be no noticeable impact.

Can standard R code be translated into MapReduce?
ORCH V2.0 can automatically generate Hive queries for R language constructs to aid in data analysis and data preparation. The Hive queries are in turn executed as MapReduce jobs. This is accomplished through the ore API (ore.connect(type="HIVE")).
