[SPARK-48547][DEPLOY] Add opt-in flag to have SparkSubmit automatically call System.exit after user code main method exits
What changes were proposed in this pull request?
This PR adds a new SparkConf option, spark.submit.callSystemExitOnMainExit (default false). When set to true, SparkSubmit calls System.exit() in the JVM once the user code's main method has exited (for Java / Scala jobs) or once the user's Python or R script has exited.
Why are the changes needed?
This is intended to address a longstanding issue where spark-submit runs might hang after user code has completed:
According to Java’s java.lang.Runtime docs:
The Java Virtual Machine initiates the shutdown sequence in response to one of several events:
- when the number of live non-daemon threads drops to zero for the first time (see note below on the JNI Invocation API);
- when the Runtime.exit or System.exit method is called for the first time; or
- when some external event occurs, such as an interrupt or a signal is received from the operating system.
For Python and R programs, SparkSubmit’s PythonRunner and RRunner will call System.exit() if the user program exits with a non-zero exit code (see the Python and R runner code).
But for Java and Scala programs, plus any successful R or Python programs, Spark will not automatically call System.exit.
In those situations, the JVM will only shut down when, via event (1) in the list above, all non-daemon threads have exited (unless the job is cancelled and sent an external interrupt / kill signal, corresponding to event (3)).
Thus, non-daemon threads might cause logically-completed spark-submit jobs to hang rather than completing.
The non-daemon threads are not always under Spark's own control and are not necessarily cleaned up by SparkContext.stop().
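The hang described above can be reproduced outside Spark. The following is a minimal sketch in plain Java, not Spark code: the class name LingeringThreadDemo and the helper workerIsDaemon are hypothetical names chosen for illustration. It relies only on the standard-library behavior that Executors.newFixedThreadPool creates non-daemon worker threads.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical demo class, not part of Spark.
public class LingeringThreadDemo {

    /** Runs a probe task on the pool and reports whether its worker thread is a daemon. */
    static boolean workerIsDaemon(ExecutorService pool) {
        Callable<Boolean> probe = () -> Thread.currentThread().isDaemon();
        try {
            return pool.submit(probe).get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("probe task failed", e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for a thread pool created by user code or a third-party library.
        ExecutorService pool = Executors.newFixedThreadPool(1);
        System.out.println("worker is daemon: " + workerIsDaemon(pool));

        // If this call is removed, main() still returns, but the idle
        // non-daemon worker thread keeps the JVM alive: the process only
        // ends on System.exit() or an external signal.
        pool.shutdown();
    }
}
```

Removing the pool.shutdown() call turns this into the situation the PR targets: the main method has returned, yet the process does not terminate on its own.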
Thus, it is useful to have opt-in functionality for SparkSubmit to automatically call System.exit() upon main method exit (which usually, but not always, corresponds to job completion). This option allows users and data platform operators to enforce System.exit() calls without having to modify individual jobs' code.
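The opt-in behavior can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: the class ExitOnMainExitSketch, the method runThenMaybeExit, the injectable exit handler, and the use of exit code 0 are all hypothetical. Only the configuration key and its default of false come from the description above.

```java
import java.util.Collections;
import java.util.Map;
import java.util.function.IntConsumer;

// Hypothetical sketch, not SparkSubmit's actual implementation.
public class ExitOnMainExitSketch {

    // Configuration key named in this PR description.
    static final String KEY = "spark.submit.callSystemExitOnMainExit";

    /**
     * Runs the user's main logic, then invokes the exit handler if the
     * flag is set to "true". Returns whether an exit was requested.
     * Exit code 0 is an assumption made for this sketch.
     */
    static boolean runThenMaybeExit(Map<String, String> conf,
                                    Runnable userMain,
                                    IntConsumer exit) {
        userMain.run();
        boolean callExit = Boolean.parseBoolean(conf.getOrDefault(KEY, "false"));
        if (callExit) {
            exit.accept(0);
        }
        return callExit;
    }

    public static void main(String[] args) {
        Map<String, String> conf = Collections.singletonMap(KEY, "true");
        // A real launcher would pass System::exit here; this demo passes a
        // printing handler so that the exit request is visible instead.
        runThenMaybeExit(
            conf,
            () -> System.out.println("user main finished"),
            code -> System.out.println("exit requested with code " + code));
    }
}
```

Passing the exit handler as a parameter keeps the sketch testable without terminating the JVM. With spark-submit itself, the option would be supplied like any other configuration value, for example via --conf spark.submit.callSystemExitOnMainExit=true.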
Does this PR introduce any user-facing change?
Yes, it adds a new user-facing configuration option for opting in to a behavior change.
How was this patch tested?
New tests in SparkSubmitSuite, including one which hangs (failing with a timeout) unless the new option is set to true.
Was this patch authored or co-authored using generative AI tooling?
No.