Monday, August 19, 2013

Mapping between EMR APIs and Yarn APIs

Before we get started, we should keep in mind a few equivalent concepts between Yarn and EMR: an application in Yarn corresponds to a job flow in EMR, and a job in Yarn corresponds to a job step in EMR.
  1. RunJobFlow
    → ApplicationId  submitApplication(ApplicationSubmissionContext appContext) 
    The RunJobFlow API also includes the cluster instantiation process, while submitApplication assumes that a Yarn cluster is already running.
  2. TerminateJobFlows
    →  void  killApplication(ApplicationId applicationId)  
  3. DescribeJobFlows
    →   ApplicationReport  getApplicationReport(ApplicationId appId)  
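
For reference, below is a minimal sketch of how these three calls line up when driven through the YarnClient class in Hadoop 2.x (the higher-level wrapper around the protocol methods listed above). The application name, the resource sizes, and the "sleep 60" placeholder command are made up for illustration only; a real submission would point the AM container spec at an actual ApplicationMaster.

    import java.util.Collections;

    import org.apache.hadoop.yarn.api.records.ApplicationId;
    import org.apache.hadoop.yarn.api.records.ApplicationReport;
    import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
    import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
    import org.apache.hadoop.yarn.api.records.Resource;
    import org.apache.hadoop.yarn.client.api.YarnClient;
    import org.apache.hadoop.yarn.client.api.YarnClientApplication;
    import org.apache.hadoop.yarn.conf.YarnConfiguration;
    import org.apache.hadoop.yarn.util.Records;

    public class JobFlowMappingDemo {
        public static void main(String[] args) throws Exception {
            // Unlike RunJobFlow, submitApplication assumes a Yarn cluster is already running,
            // so all we do here is create a client against the existing ResourceManager.
            YarnClient yarnClient = YarnClient.createYarnClient();
            yarnClient.init(new YarnConfiguration());
            yarnClient.start();

            // 1. RunJobFlow ~ submitApplication (minus the cluster instantiation part)
            YarnClientApplication app = yarnClient.createApplication();
            ApplicationSubmissionContext appContext = app.getApplicationSubmissionContext();
            appContext.setApplicationName("job-flow-demo");                  // made-up name
            ContainerLaunchContext amContainer = Records.newRecord(ContainerLaunchContext.class);
            amContainer.setCommands(Collections.singletonList("sleep 60")); // placeholder, not a real AM
            appContext.setAMContainerSpec(amContainer);
            Resource capability = Records.newRecord(Resource.class);
            capability.setMemory(512);                                       // illustrative sizes
            capability.setVirtualCores(1);
            appContext.setResource(capability);
            ApplicationId appId = yarnClient.submitApplication(appContext);

            // 3. DescribeJobFlows ~ getApplicationReport
            ApplicationReport report = yarnClient.getApplicationReport(appId);
            System.out.println("Application state: " + report.getYarnApplicationState());

            // 2. TerminateJobFlows ~ killApplication
            yarnClient.killApplication(appId);

            yarnClient.stop();
        }
    }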

Thursday, August 8, 2013

One-Click Hadoop Cluster Launching and Expansion on Nimbus

This tutorial will guide you through the steps of launching and expanding a Hadoop cluster on Nimbus with one click. One-click launching lets researchers and developers analyze data easily and with less setup time, and one-click expansion makes it possible for data analysts to flexibly add nodes to a cluster whenever more capacity is needed. The steps are as follows.
  1. Get the source files from github.
    git clone https://github.com/kyrameng/OneClickHadoopClusterOnNimbus.git
    
  2. Copy the commands to bin directory and the cluster definition files to sample directory.
    cp launch-hadoop-cluster.sh  your_nimbus_client/bin/
    cp expand-hadoop-cluster.sh  your_nimbus_client/bin/
    cp hadoop-cluster-template.xml  your_nimbus_client/samples/
    cp hadoop-add-nodes.xml  your_nimbus_client/samples/
    
  3. Launch a cluster using the following command.
    bin/launch-hadoop-cluster.sh --cluster samples/hadoop-cluster-template.xml --nodes 1 --conf conf/hotel.conf --hours 1
    
    1. --nodes specifies how many slave nodes you want in this cluster. The command also launches a standalone master node for you.
    2. --hours specifies how long this cluster will run.
    3. --cluster specifies which cluster definition file to use.
    4. --conf specifies the site where the cluster will be launched.
    Output of the above command.
    SSH known_hosts contained tilde:
      - '~/.ssh/known_hosts' --> '/home/meng/.ssh/known_hosts'
    
    Requesting cluster.
      - master-node: image 'hadoop-50GB-scheduler.gz', 1 instance
      - slave-nodes: image 'hadoop-50GB-scheduler.gz', 1 instance
    
    Context Broker:
        https://svc.uc.futuregrid.org:8443/wsrf/services/NimbusContextBroker
    
    Created new context with broker.
    
    Workspace Factory Service:
        https://svc.uc.futuregrid.org:8443/wsrf/services/WorkspaceFactoryService
    
    Creating workspace "master-node"... done.
      - 149.165.148.157 [ vm-148-157.uc.futuregrid.org ]
    
    Creating workspace "slave-nodes"... done.
      - 149.165.148.158 [ vm-148-158.uc.futuregrid.org ]
    
    Launching cluster-042... done.
    
    Waiting for launch updates.
      - cluster-042: all members are Running
      - wrote reports to '/home/meng/futuregrid/history/cluster-042/reports-vm'
    
    Waiting for context broker updates.
      - cluster-042: contextualized
      - wrote ctx summary to '/home/meng/futuregrid/history/cluster-042/reports-ctx/CTX-OK.txt'
      - wrote reports to '/home/meng/futuregrid/history/cluster-042/reports-ctx'
    
    SSH trusts new key for vm-148-157.uc.futuregrid.org  [[ master-node ]]
    
    SSH trusts new key for vm-148-158.uc.futuregrid.org  [[ slave-nodes ]]
    cluster-042
    Hadoop-Cluster-Handle cluster99
    
    Go to the Hadoop Web UI to check your cluster status, e.g. 149.165.148.157:50030. The command also creates a unique directory for every launched Hadoop cluster to store its cluster definition files; check your_nimbus_client/Hadoop-Cluster to explore your clusters.
  4. The last line of the output is the Hadoop cluster handle of the newly launched cluster. We will use this handle later to expand the cluster.
  5. Use the following command to expand the cluster. Specify which cluster you want to add nodes to with the --handle option; the --nodes option specifies how many slave nodes to add to that cluster.
    bin/expand-hadoop-cluster.sh --conf conf/hotel.conf --nodes 1 --hours 1 --cluster samples/hadoop-add-nodes.xml --handle cluster99
    
    Its output is as follows.
    SSH known_hosts contained tilde:
      - '~/.ssh/known_hosts' --> '/home/meng/.ssh/known_hosts'
    
    Requesting cluster.
      - newly-added-slave-nodes: image 'ubuntujaunty-hadoop-ctx-pub_v8.gz', 1 instance
    
    Context Broker:
        https://svc.uc.futuregrid.org:8443/wsrf/services/NimbusContextBroker
    
    Created new context with broker.
    
    Workspace Factory Service:
        https://svc.uc.futuregrid.org:8443/wsrf/services/WorkspaceFactoryService
    
    Creating workspace "newly-added-slave-nodes"... done.
      - 149.165.148.159 [ vm-148-159.uc.futuregrid.org ]
    
    
    Launching cluster-043... done.
    
    Waiting for launch updates.
      - cluster-043: all members are Running
      - wrote reports to '/home/meng/futuregrid/history/cluster-043/reports-vm'
    
    Waiting for context broker updates.
      - cluster-043: contextualized
      - wrote ctx summary to '/home/meng/futuregrid/history/cluster-043/reports-ctx/CTX-OK.txt'
      - wrote reports to '/home/meng/futuregrid/history/cluster-043/reports-ctx'
    
    SSH trusts new key for vm-148-159.uc.futuregrid.org  [[ newly-added-slave-nodes ]]
    

Wednesday, August 7, 2013

CentOS Commands Notes

  1. Check routing information
    route
    
  2. Configure a service to start automatically on boot.
    chkconfig service_name on
    
  3. Check the status of a particular port
    netstat -an | grep port_num
    
  4. NFS-related commands. Export all directories listed in /etc/exports.
    exportfs -a
    
    Unexport a directory.
    exportfs -u directory
    
  5. Check routing information with the ip tool.
    ip route
    
  6. Shut down a bridge
    ifconfig bridge_name down
    
    Delete a bridge.
    brctl delbr bridge_name