Wednesday, July 31, 2013

Starting a Yarn Cluster on EC2 via Whirr

In this post, I will show you how to start a Yarn cluster on EC2, again using Whirr! Yes, with Whirr, you can provision a cluster with ONE CLICK!

- Install Whirr; check my other post for details on how to install Whirr from source.
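A quick sanity check before going further, assuming the whirr script ended up on your PATH as in the commands below (adjust to bin/whirr if you run it straight from the source tree):

whirr version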
- Create your Yarn cluster definition file.
- Copy a template file from the recipes.
cd whirr
cp recipes/hadoop-yarn-ec2.properties my-yarn-cluster.properties
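For reference, the recipe defines the cluster layout through a couple of properties. The exact contents may vary between Whirr versions, but the two-instance layout below is consistent with the launch output later in this post:

whirr.cluster-name=hadoop-yarn
whirr.instance-templates=1 hadoop-namenode+yarn-resourcemanager+mapreduce-historyserver,1 hadoop-datanode+yarn-nodemanager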
- Set your AWS credentials.
vi ~/.bashrc
export AWS_ACCESS_KEY_ID=
export AWS_SECRET_ACCESS_KEY=
# Go to your AWS management console to obtain these keys.
source ~/.bashrc
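The recipe reads these environment variables, so you should not need to paste your keys into the file itself. The stock recipes reference them roughly like this (check your copy, as the exact lines may differ by version):

whirr.provider=aws-ec2
whirr.identity=${env:AWS_ACCESS_KEY_ID}
whirr.credential=${env:AWS_SECRET_ACCESS_KEY}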
- Use this AMI locator to find an image for your instance and set whirr.image-id in my-yarn-cluster.properties accordingly. The following is an example.
whirr.image-id=us-east-1/ami-1ab3ce73
whirr.location-id=us-east-1
♥ If you choose a different location, make sure whirr.image-id is updated too ♥
- Comment out the following line:
#whirr.template=osFamily=UBUNTU,osVersionMatches=10.04,os64Bit=true,minRam=2048
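With the template line commented out, you can pin the instance size explicitly instead. whirr.hardware-id is a standard Whirr property, and m1.large matches the hardware shown in the launch output below; treat this line as an optional sketch:

whirr.hardware-id=m1.large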
- Now you are ready to launch the cluster.
whirr launch-cluster --config my-yarn-cluster.properties
Its output is as follows:
Running on provider aws-ec2 using identity XXXXXXXXX
createClientSideYarnProperties yarn.nodemanager.log-dirs:/tmp/nm-logs
createClientSideYarnProperties yarn.nodemanager.remote-app-log-dir:/tmp/nm-remote-app-logs
createClientSideYarnProperties yarn.nodemanager.aux-services:mapreduce.shuffle
createClientSideYarnProperties yarn.nodemanager.aux-services.mapreduce.shuffle.class:org.apache.hadoop.mapred.ShuffleHandler
createClientSideYarnProperties yarn.nodemanager.delete.debug-delay-sec:6000
createClientSideYarnProperties yarn.app.mapreduce.am.staging-dir:/user
createClientSideYarnProperties yarn.nodemanager.local-dirs:/data/tmp/hadoop-${user.name}
createClientSideYarnProperties yarn.nodemanager.resource.memory-mb:4096
Started cluster of 2 instances
Cluster{instances=[Instance{roles=[hadoop-namenode, yarn-resourcemanager, mapreduce-historyserver], publicIp=204.236.250.181, privateIp=10.166.45.20, id=us-east-1/i-a9c6f5cb, nodeMetadata={id=us-east-1/i-a9c6f5cb, providerId=i-a9c6f5cb, name=hadoop-yarn-a9c6f5cb, location={scope=ZONE, id=us-east-1a, description=us-east-1a, parent=us-east-1, iso3166Codes=[US-VA]}, group=hadoop-yarn, imageId=us-east-1/ami-1ab3ce73, os={family=ubuntu, arch=paravirtual, version=10.04, description=ubuntu-us-east-1/images/ubuntu-lucid-10.04-amd64-server-20130704.manifest.xml, is64Bit=true}, status=RUNNING[running], loginPort=22, hostname=ip-10-166-45-20, privateAddresses=[10.166.45.20], publicAddresses=[204.236.250.181], hardware={id=m1.large, providerId=m1.large, processors=[{cores=2.0, speed=2.0}], ram=7680, volumes=[{type=LOCAL, size=10.0, device=/dev/sda1, bootDevice=true, durable=false}, {type=LOCAL, size=420.0, device=/dev/sdb, bootDevice=false, durable=false}, {type=LOCAL, size=420.0, device=/dev/sdc, bootDevice=false, durable=false}], hypervisor=xen, supportsImage=And(ALWAYS_TRUE,Or(isWindows(),requiresVirtualizationType(paravirtual)),ALWAYS_TRUE,is64Bit())}, loginUser=ubuntu, userMetadata={Name=hadoop-yarn-a9c6f5cb}}}, Instance{roles=[hadoop-datanode, yarn-nodemanager], publicIp=54.225.52.2, privateIp=10.164.60.16, id=us-east-1/i-c6cd99ae, nodeMetadata={id=us-east-1/i-c6cd99ae, providerId=i-c6cd99ae, name=hadoop-yarn-c6cd99ae, location={scope=ZONE, id=us-east-1a, description=us-east-1a, parent=us-east-1, iso3166Codes=[US-VA]}, group=hadoop-yarn, imageId=us-east-1/ami-1ab3ce73, os={family=ubuntu, arch=paravirtual, version=10.04, description=ubuntu-us-east-1/images/ubuntu-lucid-10.04-amd64-server-20130704.manifest.xml, is64Bit=true}, status=RUNNING[running], loginPort=22, hostname=ip-10-164-60-16, privateAddresses=[10.164.60.16], publicAddresses=[54.225.52.2], hardware={id=m1.large, providerId=m1.large, processors=[{cores=2.0, speed=2.0}], ram=7680, volumes=[{type=LOCAL, size=10.0, device=/dev/sda1, bootDevice=true, durable=false}, {type=LOCAL, size=420.0, device=/dev/sdb, bootDevice=false, durable=false}, {type=LOCAL, size=420.0, device=/dev/sdc, bootDevice=false, durable=false}], hypervisor=xen, supportsImage=And(ALWAYS_TRUE,Or(isWindows(),requiresVirtualizationType(paravirtual)),ALWAYS_TRUE,is64Bit())}, loginUser=ubuntu, userMetadata={Name=hadoop-yarn-c6cd99ae}}}]}
You can log into instances using the following ssh commands:
[hadoop-namenode+yarn-resourcemanager+mapreduce-historyserver]: ssh -i /home/meng/.ssh/id_rsa -o "UserKnownHostsFile /dev/null" -o StrictHostKeyChecking=no meng@204.236.250.181
[hadoop-datanode+yarn-nodemanager]: ssh -i /home/meng/.ssh/id_rsa -o "UserKnownHostsFile /dev/null" -o StrictHostKeyChecking=no meng@54.225.52.2
To destroy cluster, run 'whirr destroy-cluster' with the same options used to launch it.
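Besides the ssh commands above, Whirr also writes client-side configuration under ~/.whirr/hadoop-yarn/ (named after whirr.cluster-name), including a generated SOCKS proxy script. A minimal sketch of using the cluster from your own machine, assuming those generated files are in their usual places:

# In a separate terminal: start the SOCKS proxy Whirr generated for this cluster.
sh ~/.whirr/hadoop-yarn/hadoop-proxy.sh
# Then, in your working shell, point the Hadoop client at the generated config.
export HADOOP_CONF_DIR=~/.whirr/hadoop-yarn
hadoop fs -ls /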
- Don't forget to destroy the cluster once you are done with it. Your instances are running on EC2, which offers only limited free usage, so you might otherwise receive a huge bill after a while, like I did a month ago lol... That's another story...
whirr destroy-cluster --config my-yarn-cluster.properties