Wednesday, July 31, 2013

Starting a YARN Cluster on EC2 via Whirr

In this post, I will show you how to start a YARN cluster on EC2, again using Whirr. Yes, with Whirr you can provision a whole cluster with a single command!
  1. Install Whirr. Check my other post for details on how to install Whirr from source.
  2. Create your Yarn cluster definition file.
    1. Copy a template file from the recipes.
      cd whirr
      cp recipes/hadoop-yarn-ec2.properties my-yarn-cluster.properties
      
    2. Set your AWS credentials.
      vi ~/.bashrc
      export AWS_ACCESS_KEY_ID=
      export AWS_SECRET_ACCESS_KEY= #Go to your AWS management console to obtain these keys.
      source ~/.bashrc
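      Before launching anything, it is worth checking that both keys are really exported. A small sanity-check sketch (the variable names match the exports above; nothing else is assumed):

      ```shell
      #!/bin/bash
      # Fail fast if either AWS key is missing from the environment,
      # so Whirr does not die halfway through provisioning.
      for var in AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY; do
        if [ -z "${!var}" ]; then     # ${!var} = indirect expansion (bash)
          echo "ERROR: $var is not set" >&2
        else
          echo "$var is set"
        fi
      done
      ```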
      
    3. Use this AMI locator to find an image for your instance and set it in my-yarn-cluster.properties accordingly. The following is an example.
      whirr.image-id=us-east-1/ami-1ab3ce73
      whirr.location-id=us-east-1
      
      ♥ If you choose a different location, make sure whirr.image-id is updated too ♥
    4. Comment out the following line:
      #whirr.template=osFamily=UBUNTU,osVersionMatches=10.04,os64Bit=true,minRam=2048
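      After these edits, the relevant part of my-yarn-cluster.properties looks roughly like this (a hedged sketch: the cluster name, instance-template roles, and hardware id here follow the stock recipe's defaults, so adjust them to your own setup):

      ```properties
      # Sketch of a completed my-yarn-cluster.properties (recipe defaults)
      whirr.cluster-name=hadoop-yarn
      whirr.instance-templates=1 hadoop-namenode+yarn-resourcemanager+mapreduce-historyserver,1 hadoop-datanode+yarn-nodemanager
      whirr.provider=aws-ec2
      # Credentials are read from the environment variables exported earlier.
      whirr.identity=${env:AWS_ACCESS_KEY_ID}
      whirr.credential=${env:AWS_SECRET_ACCESS_KEY}
      whirr.hardware-id=m1.large
      whirr.image-id=us-east-1/ami-1ab3ce73
      whirr.location-id=us-east-1
      ```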
      
  3. Now you are ready to launch the cluster.
    whirr launch-cluster --config my-yarn-cluster.properties
    
    Its output is as follows:
    Running on provider aws-ec2 using identity XXXXXXXXX
    createClientSideYarnProperties yarn.nodemanager.log-dirs:/tmp/nm-logs
    createClientSideYarnProperties yarn.nodemanager.remote-app-log-dir:/tmp/nm-remote-app-logs
    createClientSideYarnProperties yarn.nodemanager.aux-services:mapreduce.shuffle
    createClientSideYarnProperties yarn.nodemanager.aux-services.mapreduce.shuffle.class:org.apache.hadoop.mapred.ShuffleHandler
    createClientSideYarnProperties yarn.nodemanager.delete.debug-delay-sec:6000
    createClientSideYarnProperties yarn.app.mapreduce.am.staging-dir:/user
    createClientSideYarnProperties yarn.nodemanager.local-dirs:/data/tmp/hadoop-${user.name}
    createClientSideYarnProperties yarn.nodemanager.resource.memory-mb:4096
    Started cluster of 2 instances
    Cluster{instances=[Instance{roles=[hadoop-namenode, yarn-resourcemanager, mapreduce-historyserver], publicIp=204.236.250.181, privateIp=10.166.45.20, id=us-east-1/i-a9c6f5cb, nodeMetadata={id=us-east-1/i-a9c6f5cb, providerId=i-a9c6f5cb, name=hadoop-yarn-a9c6f5cb, location={scope=ZONE, id=us-east-1a, description=us-east-1a, parent=us-east-1, iso3166Codes=[US-VA]}, group=hadoop-yarn, imageId=us-east-1/ami-1ab3ce73, os={family=ubuntu, arch=paravirtual, version=10.04, description=ubuntu-us-east-1/images/ubuntu-lucid-10.04-amd64-server-20130704.manifest.xml, is64Bit=true}, status=RUNNING[running], loginPort=22, hostname=ip-10-166-45-20, privateAddresses=[10.166.45.20], publicAddresses=[204.236.250.181], hardware={id=m1.large, providerId=m1.large, processors=[{cores=2.0, speed=2.0}], ram=7680, volumes=[{type=LOCAL, size=10.0, device=/dev/sda1, bootDevice=true, durable=false}, {type=LOCAL, size=420.0, device=/dev/sdb, bootDevice=false, durable=false}, {type=LOCAL, size=420.0, device=/dev/sdc, bootDevice=false, durable=false}], hypervisor=xen, supportsImage=And(ALWAYS_TRUE,Or(isWindows(),requiresVirtualizationType(paravirtual)),ALWAYS_TRUE,is64Bit())}, loginUser=ubuntu, userMetadata={Name=hadoop-yarn-a9c6f5cb}}}, Instance{roles=[hadoop-datanode, yarn-nodemanager], publicIp=54.225.52.2, privateIp=10.164.60.16, id=us-east-1/i-c6cd99ae, nodeMetadata={id=us-east-1/i-c6cd99ae, providerId=i-c6cd99ae, name=hadoop-yarn-c6cd99ae, location={scope=ZONE, id=us-east-1a, description=us-east-1a, parent=us-east-1, iso3166Codes=[US-VA]}, group=hadoop-yarn, imageId=us-east-1/ami-1ab3ce73, os={family=ubuntu, arch=paravirtual, version=10.04, description=ubuntu-us-east-1/images/ubuntu-lucid-10.04-amd64-server-20130704.manifest.xml, is64Bit=true}, status=RUNNING[running], loginPort=22, hostname=ip-10-164-60-16, privateAddresses=[10.164.60.16], publicAddresses=[54.225.52.2], hardware={id=m1.large, providerId=m1.large, processors=[{cores=2.0, speed=2.0}], ram=7680, volumes=[{type=LOCAL, 
size=10.0, device=/dev/sda1, bootDevice=true, durable=false}, {type=LOCAL, size=420.0, device=/dev/sdb, bootDevice=false, durable=false}, {type=LOCAL, size=420.0, device=/dev/sdc, bootDevice=false, durable=false}], hypervisor=xen, supportsImage=And(ALWAYS_TRUE,Or(isWindows(),requiresVirtualizationType(paravirtual)),ALWAYS_TRUE,is64Bit())}, loginUser=ubuntu, userMetadata={Name=hadoop-yarn-c6cd99ae}}}]}
    
    You can log into instances using the following ssh commands:
    [hadoop-namenode+yarn-resourcemanager+mapreduce-historyserver]: ssh -i /home/meng/.ssh/id_rsa -o "UserKnownHostsFile /dev/null" -o StrictHostKeyChecking=no meng@204.236.250.181
    [hadoop-datanode+yarn-nodemanager]: ssh -i /home/meng/.ssh/id_rsa -o "UserKnownHostsFile /dev/null" -o StrictHostKeyChecking=no meng@54.225.52.2
    To destroy cluster, run 'whirr destroy-cluster' with the same options used to launch it.
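    If you want to script against this output (say, to feed the addresses into other tools), the public IPs can be pulled out with a one-liner. A small sketch, assuming you saved the launch output to a file called launch.log (a name I made up):

    ```shell
    # Extract every publicIp=... value from the saved Whirr launch output,
    # keeping one line per unique address.
    grep -o 'publicIp=[0-9.]*' launch.log | cut -d= -f2 | sort -u
    ```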
    
    
  4. Don't forget to destroy the cluster once you are done with it. Your instances keep running (and billing) on EC2, which only offers limited free usage. Otherwise you might receive a huge bill after a while, like I did a month ago... but that's another story.
    whirr destroy-cluster --config my-yarn-cluster.properties
    
Good luck playing with Whirr :)
