Monday, August 19, 2013
Mapping between EMR APIs and Yarn APIs
Before we get started, we need to remember that there are some equivalent concepts between Yarn and EMR: one application in Yarn corresponds to a job flow in EMR, and one job in Yarn corresponds to a job step in EMR.
- RunJobFlow → ApplicationId submitApplication(ApplicationSubmissionContext appContext). Note that the RunJobFlow API also includes the cluster instantiation process, while submitApplication assumes that a Yarn cluster is already running.
- TerminateJobFlows → void killApplication(ApplicationId applicationId)
- DescribeJobFlows → ApplicationReport getApplicationReport(ApplicationId appId)
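The mapping above is between Java APIs, but Hadoop 2.x also exposes rough command-line analogues through the `yarn application` subcommands. The `emr_to_yarn` function below is purely illustrative (it is not part of EMR or Yarn); it just records the correspondence in a checkable form. Note that RunJobFlow has no single Yarn equivalent, since Yarn never provisions the cluster itself.

```shell
# Illustrative sketch only: emr_to_yarn is a made-up helper that records the
# rough CLI analogue of each EMR call. The `yarn application` subcommands are
# real Hadoop 2.x CLI commands; RunJobFlow maps only partially, because YARN
# assumes the cluster already exists.
emr_to_yarn() {
  case "$1" in
    TerminateJobFlows) echo "yarn application -kill <app-id>" ;;
    DescribeJobFlows)  echo "yarn application -status <app-id>" ;;
    RunJobFlow)        echo "yarn jar <app.jar> (cluster must already be up)" ;;
    *)                 return 1 ;;
  esac
}

emr_to_yarn TerminateJobFlows   # → yarn application -kill <app-id>
```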
Thursday, August 8, 2013
One-Click Hadoop Cluster Launching and Expansion on Nimbus
This tutorial will guide you through the steps of launching and expanding a Hadoop cluster on Nimbus in one click. Being able to launch a one-click Hadoop cluster allows researchers and developers to analyze data in an easy and time-effective manner. In addition, one-click Hadoop cluster expansion makes it possible for data analysts to flexibly add nodes to their cluster whenever more capacity is needed. The steps are as follows.
- Get the source files from github.
git clone https://github.com/kyrameng/OneClickHadoopClusterOnNimbus.git
- Copy the commands to bin directory and the cluster definition files to sample directory.
cp launch-hadoop-cluster.sh your_nimbus_client/bin/
cp expand-hadoop-cluster.sh your_nimbus_client/bin/
cp hadoop-cluster-template.xml your_nimbus_client/samples/
cp hadoop-add-nodes.xml your_nimbus_client/samples/
- Launch a cluster using the following command.
bin/launch-hadoop-cluster.sh --cluster samples/hadoop-cluster-template.xml --nodes 1 --conf conf/hotel.conf --hours 1
- --nodes specifies how many slave nodes you want in this cluster. This command will also launch a standalone master node for you.
- --hours specifies how long this cluster will run.
- --cluster specifies which cluster definition file to use.
- --conf specifies the site where the cluster will be launched.
SSH known_hosts contained tilde:
- '~/.ssh/known_hosts' --> '/home/meng/.ssh/known_hosts'
Requesting cluster.
- master-node: image 'hadoop-50GB-scheduler.gz', 1 instance
- slave-nodes: image 'hadoop-50GB-scheduler.gz', 1 instance
Context Broker: https://svc.uc.futuregrid.org:8443/wsrf/services/NimbusContextBroker
Created new context with broker.
Workspace Factory Service: https://svc.uc.futuregrid.org:8443/wsrf/services/WorkspaceFactoryService
Creating workspace "master-node"... done.
- 149.165.148.157 [ vm-148-157.uc.futuregrid.org ]
Creating workspace "slave-nodes"... done.
- 149.165.148.158 [ vm-148-158.uc.futuregrid.org ]
Launching cluster-042... done.
Waiting for launch updates.
- cluster-042: all members are Running
- wrote reports to '/home/meng/futuregrid/history/cluster-042/reports-vm'
Waiting for context broker updates.
- cluster-042: contextualized
- wrote ctx summary to '/home/meng/futuregrid/history/cluster-042/reports-ctx/CTX-OK.txt'
- wrote reports to '/home/meng/futuregrid/history/cluster-042/reports-ctx'
SSH trusts new key for vm-148-157.uc.futuregrid.org [[ master-node ]]
SSH trusts new key for vm-148-158.uc.futuregrid.org [[ slave-nodes ]]
cluster-042
Hadoop-Cluster-Handle cluster99
Go to the Hadoop Web UI to check your cluster status, e.g. 149.165.148.157:50030. This command also creates a unique directory for every launched Hadoop cluster to store its cluster definition files. Check your_nimbus_client/Hadoop-Cluster to explore your clusters.
- The last line of output is the Hadoop cluster handle of the newly launched cluster. We will use this information to expand the cluster later.
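Since that last line is exactly what the --handle option expects later, it can also be captured programmatically. The `cluster_handle` function below is a small sketch of ours, not part of the Nimbus client:

```shell
# Sketch (not part of the Nimbus client): extract the cluster handle from the
# launcher's output by printing the last field of the Hadoop-Cluster-Handle line.
cluster_handle() {
  awk '/Hadoop-Cluster-Handle/ { print $NF }'
}

# Typical use would pipe the real launcher through it, e.g.:
#   handle=$(bin/launch-hadoop-cluster.sh --cluster samples/hadoop-cluster-template.xml \
#       --nodes 1 --conf conf/hotel.conf --hours 1 | cluster_handle)
```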
- Use the following command to expand this cluster. Specify which cluster you want to add nodes to using the --handle option. The --nodes option specifies how many slave nodes you want to add to that cluster.
bin/expand-hadoop-cluster.sh --conf conf/hotel.conf --nodes 1 --hours 1 --cluster samples/hadoop-add-nodes.xml --handle cluster99
Its output is as follows.
SSH known_hosts contained tilde:
- '~/.ssh/known_hosts' --> '/home/meng/.ssh/known_hosts'
Requesting cluster.
- newly-added-slave-nodes: image 'ubuntujaunty-hadoop-ctx-pub_v8.gz', 1 instance
Context Broker: https://svc.uc.futuregrid.org:8443/wsrf/services/NimbusContextBroker
Created new context with broker.
Workspace Factory Service: https://svc.uc.futuregrid.org:8443/wsrf/services/WorkspaceFactoryService
Creating workspace "newly-added-slave-nodes"... done.
- 149.165.148.159 [ vm-148-159.uc.futuregrid.org ]
Launching cluster-043... done.
Waiting for launch updates.
- cluster-043: all members are Running
- wrote reports to '/home/meng/futuregrid/history/cluster-043/reports-vm'
Waiting for context broker updates.
- cluster-043: contextualized
- wrote ctx summary to '/home/meng/futuregrid/history/cluster-043/reports-ctx/CTX-OK.txt'
- wrote reports to '/home/meng/futuregrid/history/cluster-043/reports-ctx'
SSH trusts new key for vm-148-159.uc.futuregrid.org [[ newly-added-slave-nodes ]]
Wednesday, August 7, 2013
CentOS Commands Notes
- Check routing information
route
- Configure a service to start automatically on boot.
chkconfig service_name on
- Check the status of a particular port
netstat -an | grep port_num
- nfs related commands.
Export all directories listed in /etc/exports.
exportfs -a
Unexport a directory.
exportfs -u directory
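For context, exportfs reads its directory list from /etc/exports. A minimal entry looks like the following (the path and network here are hypothetical examples, not from any real configuration):

```
# /etc/exports — hypothetical example: share /data read-only with one subnet
/data    192.168.1.0/24(ro,sync)
```

After editing /etc/exports, re-run exportfs -a to apply the changes.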
- Check routing information.
ip route
- Shut down a bridge
ifconfig bridge_name down
Delete a bridge.
brctl delbr bridge_name