Tune Your Spark Submit Job

1. A generic spark-submit call

When submitting a job with spark-submit, you can tune it through several command-line options:

./bin/spark-submit --class org.apache.spark.examples.SparkPi \
--master yarn-cluster \
--num-executors 3 \
--driver-memory 4g \
--executor-memory 2g \
--executor-cores 1 \
--queue thequeue \
lib/spark-examples*.jar \
10

2. num-executors and executor-cores

Assuming your cluster is running only this Spark job, here is how to take full advantage of its resources:

Cloudera recommends using no more than 5 cores per executor, since HDFS throughput suffers when an executor runs more concurrent tasks than that:

executor-cores = 5

Then set num-executors as follows (a shell sketch after the explanation below computes the same formula):

{[(Num_of_cores_per_node - 1)/5] x number_of_nodes} - 1

Explanation :

  • Num_of_cores_per_node - 1 : We keep one core on each node for the OS and Hadoop daemons
  • (Num_of_cores_per_node - 1)/5 : We divide by the number of cores per executor to get the number of executors we can run on each node
  • [(Num_of_cores_per_node - 1)/5] x number_of_nodes : We multiply by the number of nodes to get the total number of executors we can run on the cluster
  • {[(Num_of_cores_per_node - 1)/5] x number_of_nodes} - 1 : We subtract one executor, which will be used for the “Application Master/Spark Driver”
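
To make the arithmetic concrete, here is a small shell sketch of the formula; the script name and variable names are hypothetical, not Spark options:

#!/bin/sh
# num_executors.sh - usage: ./num_executors.sh CORES_PER_NODE NUM_NODES
CORES_PER_NODE=$1
NUM_NODES=$2
EXECUTOR_CORES=5    # the recommended maximum per executor

# Executors that fit on one node, after reserving one core for the OS
EXECUTORS_PER_NODE=$(( (CORES_PER_NODE - 1) / EXECUTOR_CORES ))

# Total executors across the cluster, minus one slot for the Application Master / Spark Driver
NUM_EXECUTORS=$(( EXECUTORS_PER_NODE * NUM_NODES - 1 ))

echo "executor-cores = $EXECUTOR_CORES"
echo "num-executors  = $NUM_EXECUTORS"

Note that the division is integer division: leftover cores on a node are simply left unused rather than spread across executors.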

Example of cluster:

  • 6 nodes
  • 16 cores per node

Result :

  • executor-cores = 5
  • num-executors = [(16 - 1)/5] x 6 - 1 = 3 x 6 - 1 = 18 - 1 = 17
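
Plugging these values back into the spark-submit call from section 1 (the memory settings are carried over unchanged from that example; size them for your own cluster):

./bin/spark-submit --class org.apache.spark.examples.SparkPi \
--master yarn-cluster \
--num-executors 17 \
--executor-cores 5 \
--driver-memory 4g \
--executor-memory 2g \
--queue thequeue \
lib/spark-examples*.jar \
10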

Source: http://blog.cloudera.com/blog/2015/03/how-to-tune-your-apache-spark-jobs-part-2/
