PRACTICAL 4: Working with Hadoop Ecosystem


This is a short guide to installing a single-node Hadoop cluster on a Windows computer. The process is straightforward. First, download and install the following software:
·         JAVA
Download Java 1.8 from https://java.com/en/download/
Once it is installed, confirm that the correct version is running by entering ‘java -version’ at the command line. The output should look like this:
Figure 1- checking java
·         HADOOP
Install a Hadoop distribution. To do so, download the Hadoop 3.0.0-alpha2 release (25 Jan, 2017), the most recent release at the time of writing, in binary form from the Apache Download Mirror at http://hadoop.apache.org/releases.html
Once hadoop-3.0.0-alpha2.tar.gz (250 MB) has downloaded, extract it with WinRAR into the C:\hadoop-3.0.0-alpha2 folder:

Figure 2- extracting hadoop in c:

·          Set Up Environment Variables
Ø  In Windows 10, open the System Properties window and click the Environment Variables button:

Figure 3-setting environment variable

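Ø  Create a new HADOOP_HOME variable and point it to the Hadoop installation folder, C:\hadoop-3.0.0-alpha2 (the root folder rather than its bin subfolder, because Hadoop looks for bin underneath HADOOP_HOME):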

Figure 4-adding variable name

Ø  Add the Hadoop bin directory to the PATH variable. Select PATH and click Edit:

Figure 5

Ø  Add the ‘C:\hadoop-3.0.0-alpha2\bin’ path like this and click OK:

Figure 6

·        Edit Hadoop Configuration
Ø  Edit the C:\hadoop-3.0.0-alpha2\etc\hadoop\core-site.xml file as shown:

Figure 7-editing core-site.xml
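
The screenshot for Figure 7 is not reproduced in this text. As a minimal sketch, a single-node setup commonly sets only the default file system in core-site.xml. The host localhost and port 9000 below are assumptions, not values confirmed by the original figure:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <!-- Default file system URI that HDFS clients connect to.
       Assumed value: localhost on port 9000; adjust if your figure differs. -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
```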

Ø  Next, go to the C:\hadoop-3.0.0-alpha2\etc\hadoop folder and rename mapred-site.xml.template to mapred-site.xml.
Ø  Edit the mapred-site.xml file, adding the following YARN configuration for MapReduce:

Figure 8-editing mapred-site.xml
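
Since the Figure 8 screenshot is not available here, the following is a hedged sketch of what a YARN configuration for MapReduce typically looks like. It assumes the only setting needed is the one telling MapReduce to run its jobs on YARN:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <!-- Run MapReduce jobs on YARN rather than the local job runner.
       Assumed to match the figure; the original values are not shown in the text. -->
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
```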

Ø  Next, create a new ‘data’ folder in Hadoop’s home directory (C:\hadoop-3.0.0-alpha2\data).
Ø  Once done, add a data node and a name node to Hadoop by editing the C:\hadoop-3.0.0-alpha2\etc\hadoop\hdfs-site.xml file.
Ø  Add the following configuration to this XML file:

Figure 9-editing hdfs-site.xml
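
The contents of Figure 9 are not reproduced here. A possible hdfs-site.xml for a single node, assuming the name node and data node store their files in subfolders of the ‘data’ folder created above, might look like the sketch below. The subfolder names namenode and datanode and the replication factor of 1 are assumptions for illustration:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <!-- A single-node cluster has one data node, so keep one copy of each block (assumed). -->
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <!-- Where the name node keeps its metadata; "namenode" is a hypothetical subfolder name. -->
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///C:/hadoop-3.0.0-alpha2/data/namenode</value>
  </property>
  <!-- Where the data node stores blocks; "datanode" is a hypothetical subfolder name. -->
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///C:/hadoop-3.0.0-alpha2/data/datanode</value>
  </property>
</configuration>
```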


Ø  Next, add the site-specific YARN configuration properties by editing C:\hadoop-3.0.0-alpha2\etc\hadoop\yarn-site.xml, like this:

Figure 10-edit yarn-site.xml
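
The Figure 10 screenshot is likewise missing from this text. As a sketch under the assumption that only the MapReduce shuffle service needs to be enabled on the node manager, yarn-site.xml could contain:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <!-- Enable the auxiliary shuffle service that MapReduce jobs rely on (assumed to match the figure). -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <!-- Class implementing the shuffle service; this is Hadoop's standard handler. -->
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
</configuration>
```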

Ø  Go to C:\hadoop-3.0.0-alpha2\etc\hadoop and edit “hadoop-env.cmd” so that JAVA_HOME points to your JDK installation folder, for example: set JAVA_HOME=C:\java\jdk1.8.0_91
Ø  Verify the setup from the command prompt (cmd):

Figure 11.cmd

Ø  Start Hadoop. Go to C:\hadoop-3.0.0-alpha2\sbin and run “start-dfs.cmd” and “start-yarn.cmd” as administrator:

Figure 12-starting all windows

·        Open Hadoop GUI
Once all the above steps are completed, open a browser and navigate to: http://localhost:8088/cluster
