A SparkContext corresponds to one application.
When running, an application has multiple executors spread across the worker nodes.
Each executor has a number of cores as well as its own memory.
- **Spark driver** (`spark.driver.memory`): the program that declares the transformations and actions on the RDDs of data and submits such requests to the master.
- **Number of cores** (`spark.executor.cores`): the number of concurrent tasks an executor can run.
- **Number of executors** (`spark.executor.instances`): the number of executors requested for the application.
- **Executor memory** (`spark.executor.memory`): the amount of heap memory allocated to each executor.
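These properties can be passed to spark-submit as flags or, with one exception, set when building the session. A minimal Scala sketch with placeholder values (the object name, app name, and numbers are made up for illustration):

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch: the executor-sizing properties set programmatically.
// All values here are placeholders, not recommendations.
object ConfigExample extends App {
  val spark = SparkSession.builder()
    .appName("resource-sizing-example")
    .config("spark.executor.cores", "5")      // concurrent tasks per executor
    .config("spark.executor.instances", "10") // executors requested for this app
    .config("spark.executor.memory", "8g")    // heap per executor
    .getOrCreate()

  // spark.driver.memory is the exception: it must be set before the driver
  // JVM starts (spark-submit --driver-memory or spark-defaults.conf);
  // setting it on an already-running driver has no effect.
  spark.stop()
}
```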
For example, consider the cluster below:
**Cluster Config:**
- 10 nodes
- 16 cores per node
- 64GB RAM per node
- Cores per executor = 5 (a commonly recommended ceiling: HDFS client throughput degrades beyond roughly 5 concurrent tasks per executor)
- Total cores = [cores per node - 1 (reserved for the OS and Hadoop daemons)] x number of nodes = (16 - 1) x 10 = 150
- Number of available executors = total cores / cores per executor = 150 / 5 = 30
- Leaving 1 executor for the YARN ApplicationMaster: `--num-executors = 29`
- Executors per node = 30 / 10 = 3
- Memory per executor = 64GB / 3 = 21GB
- Off-heap overhead is at least 7% of executor memory (7% of 21GB is about 1.5GB; rounding up to 3GB leaves headroom), so the actual `--executor-memory = 21 - 3 = 18GB`
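The same arithmetic as a small Scala sketch that prints the resulting spark-submit flags (variable names are illustrative):

```scala
// Sketch of the sizing arithmetic above; names are illustrative.
object ExecutorSizing extends App {
  val nodes        = 10
  val coresPerNode = 16
  val ramPerNode   = 64 // GB

  val coresPerExecutor = 5                              // HDFS throughput sweet spot
  val totalCores       = (coresPerNode - 1) * nodes     // (16 - 1) * 10 = 150
  val totalExecutors   = totalCores / coresPerExecutor  // 150 / 5 = 30
  val numExecutors     = totalExecutors - 1             // 1 left for the ApplicationMaster: 29
  val executorsPerNode = totalExecutors / nodes         // 30 / 10 = 3
  val memPerExecutor   = ramPerNode / executorsPerNode  // 64 / 3 = 21 GB (integer division)
  val overhead         = 3                              // >= 7% of 21 GB (~1.5 GB), rounded up for headroom
  val executorMemory   = memPerExecutor - overhead      // 21 - 3 = 18 GB

  println(s"--num-executors $numExecutors " +
          s"--executor-cores $coresPerExecutor " +
          s"--executor-memory ${executorMemory}G")
}
```

Running it prints `--num-executors 29 --executor-cores 5 --executor-memory 18G`, matching the derivation above.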