[SPARK-48547][DEPLOY] Add opt-in flag to have SparkSubmit automatically call System.exit after user code main method exits

Open JoshRosen opened this issue 8 months ago • 4 comments

What changes were proposed in this pull request?

This PR adds a new SparkConf option, spark.submit.callSystemExitOnMainExit (default false). When set to true, it causes SparkSubmit to call System.exit() in the JVM once the user code's main method has exited (for Java / Scala jobs) or once the user's Python or R script has exited.

Why are the changes needed?

This is intended to address a longstanding issue where spark-submit runs might hang after user code has completed:

According to Java’s java.lang.Runtime docs:

The Java Virtual Machine initiates the shutdown sequence in response to one of several events:

  1. when the number of live non-daemon threads drops to zero for the first time (see note below on the JNI Invocation API);
  2. when the Runtime.exit or System.exit method is called for the first time; or
  3. when some external event occurs, such as an interrupt or a signal is received from the operating system.

For Python and R programs, SparkSubmit’s PythonRunner and RRunner will call System.exit() if the user program exits with a non-zero exit code (see the Python and R runner code).

But for Java and Scala programs, plus any successful R or Python programs, Spark will not automatically call System.exit.

In those situations, the JVM will only shut down when, via event (1), all non-daemon threads have exited (unless the job is cancelled and sent an external interrupt / kill signal, corresponding to event (3)).

Thus, non-daemon threads might cause logically-completed spark-submit jobs to hang rather than completing.

The non-daemon threads are not always under Spark's own control and may not necessarily be cleaned up by SparkContext.stop().
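To make the hang concrete, here is a minimal, self-contained Java sketch. It is not Spark code: the class `NonDaemonThreadDemo` and its helper methods are hypothetical names chosen for illustration. It starts a non-daemon thread that outlives the work in `main` and lists the non-daemon threads that would keep the JVM from reaching shutdown event (1).

```java
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.stream.Collectors;

// Hypothetical demo class; not part of Spark.
final class NonDaemonThreadDemo {

    // Names of live non-daemon threads other than the calling thread.
    // While any of these is alive, shutdown event (1) cannot occur.
    static List<String> liveNonDaemonThreads() {
        Thread current = Thread.currentThread();
        return Thread.getAllStackTraces().keySet().stream()
                .filter(t -> t != current && t.isAlive() && !t.isDaemon())
                .map(Thread::getName)
                .sorted()
                .collect(Collectors.toList());
    }

    // Starts a non-daemon thread that blocks until the latch is released,
    // standing in for a thread pool or client library the job forgot to stop.
    static Thread startLingeringWorker(String name, CountDownLatch release) {
        Thread worker = new Thread(() -> {
            try {
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, name);
        worker.setDaemon(false); // explicit for clarity
        worker.start();
        return worker;
    }

    public static void main(String[] args) throws InterruptedException {
        CountDownLatch release = new CountDownLatch(1);
        Thread worker = startLingeringWorker("lingering-worker", release);
        System.out.println("Non-daemon threads still alive: " + liveNonDaemonThreads());
        // Remove the next two lines to observe the hang: main returns,
        // but the JVM keeps running because the worker never exits.
        release.countDown();
        worker.join();
    }
}
```

If the latch is never released, `main` returns normally but the process stays up until it is killed externally, which is the behavior described above.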

Thus, it is useful to have an opt-in option that makes SparkSubmit automatically call System.exit() upon main method exit (which usually, but not always, corresponds to job completion). This option will allow users and data platform operators to enforce System.exit() calls without having to modify individual jobs' code.
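The idea can be sketched in plain Java as follows. This is an illustration under stated assumptions, not the code in this PR: `ExitOnMainExitSketch`, `runUserMain`, the injectable `exitFn`, and the `demo.callSystemExitOnMainExit` system property are all hypothetical, and the sketch only covers the case where the user code returns normally.

```java
import java.util.function.IntConsumer;

// Hypothetical launcher sketch; not the actual SparkSubmit implementation.
final class ExitOnMainExitSketch {

    // Runs the user code, then calls the exit function with status 0 if the
    // opt-in flag is set. The exit function is injectable so the behavior can
    // be tested without terminating the JVM. If userMain throws, the exception
    // propagates and exitFn is not called here.
    static void runUserMain(Runnable userMain,
                            boolean callSystemExitOnMainExit,
                            IntConsumer exitFn) {
        userMain.run();
        if (callSystemExitOnMainExit) {
            exitFn.accept(0);
        }
    }

    public static void main(String[] args) {
        // Assumption: a demo-only system property stands in for the SparkConf flag.
        boolean flag = Boolean.getBoolean("demo.callSystemExitOnMainExit");

        Runnable userMain = () -> {
            // Simulated user code that leaves a non-daemon thread behind.
            Thread lingering = new Thread(() -> {
                try {
                    Thread.sleep(10_000);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }, "lingering-worker");
            lingering.start();
            System.out.println("user main finished");
        };

        // With the flag set, the JVM exits right after userMain returns;
        // without it, the JVM waits for the lingering thread to finish.
        runUserMain(userMain, flag, System::exit);
    }
}
```

Running the sketch with `-Ddemo.callSystemExitOnMainExit=true` should terminate immediately after the message is printed, while running it without the property keeps the process alive until the lingering thread finishes sleeping.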

Does this PR introduce any user-facing change?

Yes, it adds a new user-facing configuration option for opting in to a behavior change.

How was this patch tested?

New tests in SparkSubmitSuite, including one which hangs (failing with a timeout) unless the new option is set to true.

Was this patch authored or co-authored using generative AI tooling?

No.

JoshRosen, Jun 05 '24 23:06