Databricks is thrilled to announce our new optimized autoscaling feature. The new Apache Spark™-aware resource manager leverages Spark shuffle and executor statistics to resize a cluster intelligently, improving resource utilization. When we tested long-running big data workloads, we observed cloud cost savings of up to 30%.

What’s the problem with current state-of-the-art autoscaling approaches? Today, every big data tool can auto-scale compute to lower costs. But most of these tools expect a static resource size allocated for a single job, which doesn’t take advantage of the elasticity of the cloud. An example spark-submit command that takes the number of executors required for the Spark job as a parameter:

```
# --deploy-mode can be "client" for client mode
spark-submit \
  --class <main-class> \
  --master yarn \
  --deploy-mode cluster \
  --num-executors <num-executors> \
  <application-jar>
```

Resource schedulers like YARN then take care of “coarse-grained” autoscaling between different jobs, releasing resources back only after a Spark job finishes.
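To make the contrast concrete, here is a minimal sketch of a shuffle- and executor-aware resize decision of the kind described above. It is purely illustrative: `ExecutorStats`, `plan_resize`, and the `tasks_per_executor` heuristic are hypothetical names and values, not the actual Databricks implementation. The core idea it captures is to scale up based on task-queue pressure and to treat executors that still hold shuffle files as unsafe to remove.

```python
from dataclasses import dataclass

@dataclass
class ExecutorStats:
    executor_id: str
    running_tasks: int         # tasks currently executing on this executor
    shuffle_bytes_stored: int  # shuffle output this executor still serves

def plan_resize(executors, pending_tasks, tasks_per_executor=4):
    """Return (executors_to_add, executor_ids_safe_to_remove)."""
    # Scale up: request enough executors to drain the queue of pending tasks.
    executors_to_add = -(-pending_tasks // tasks_per_executor)  # ceil division

    if pending_tasks > 0:
        # Never shrink while work is still queued.
        return executors_to_add, []

    # Scale down: only executors that are idle AND serve no shuffle files
    # are safe to remove; killing an executor that still holds shuffle
    # output would force upstream stages to be recomputed.
    removable = [e.executor_id for e in executors
                 if e.running_tasks == 0 and e.shuffle_bytes_stored == 0]
    return 0, removable

# Example: two idle executors, but "e2" still serves 512 bytes of shuffle
# data, so only "e1" is reclaimed.
stats = [ExecutorStats("e1", 0, 0), ExecutorStats("e2", 0, 512)]
print(plan_resize(stats, pending_tasks=0))  # (0, ['e1'])
```

A scheduler that only sees a static `--num-executors` request cannot make this distinction; it holds all of the job’s resources until the job finishes.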