[SPARK-49984][CORE] Fix duplicate JVM options
### What changes were proposed in this pull request?
Fix how the JVM options are supplemented with the Java module options and IPv6 options: read and update the exact value of `spark.*.extraJavaOptions`, rather than going through the helper config, which combines both the default and extra Java options.
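A minimal sketch of the fixed approach, not Spark's actual code (the config dict, function name, and containment check are illustrative assumptions): only the exact `spark.driver.extraJavaOptions` key is read and written, so the defaults are never copied into it.

```python
# Hedged sketch: supplement only the exact extraJavaOptions value,
# leaving defaultJavaOptions untouched.
def fixed_supplement(conf, supplemental):
    extra = conf.get("spark.driver.extraJavaOptions", "")
    if supplemental not in extra:
        conf["spark.driver.extraJavaOptions"] = (extra + " " + supplemental).strip()

conf = {"spark.driver.defaultJavaOptions": "-Dfoo=bar"}
fixed_supplement(conf, "--add-opens=java.base/java.lang=ALL-UNNAMED")
fixed_supplement(conf, "-Djava.net.preferIPv6Addresses=false")
# The defaults live only in defaultJavaOptions, so a combined read at
# launch time includes them exactly once.
print(conf["spark.driver.extraJavaOptions"].count("-Dfoo=bar"))  # 0
```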
### Why are the changes needed?
The current approach using `DRIVER_JAVA_OPTIONS` and `EXECUTOR_JAVA_OPTIONS` loads both `spark.*.extraJavaOptions` and `spark.*.defaultJavaOptions`, adds the "supplemental" options, and then saves the result back into `spark.*.extraJavaOptions`. This happens because `DRIVER_JAVA_OPTIONS` is a config like:

```scala
private[spark] val DRIVER_JAVA_OPTIONS =
  ConfigBuilder(SparkLauncher.DRIVER_EXTRA_JAVA_OPTIONS)
    .withPrepended(SparkLauncher.DRIVER_DEFAULT_JAVA_OPTIONS)
```
The result is that `spark.*.defaultJavaOptions` is appended to `spark.*.extraJavaOptions` multiple times, and when the full options are actually pulled to launch a process, three copies of whatever was in `spark.*.defaultJavaOptions` end up on the command line.
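The duplication can be sketched as follows; this is not Spark's actual code (the config dict and function name are illustrative assumptions), just a simulation of reading the combined default+extra view and writing it back to the extras key on each supplementing pass.

```python
# Hedged sketch of the buggy pattern: each pass reads the combined
# default+extra options and resaves them under extraJavaOptions.
def buggy_supplement(conf, supplemental):
    # DRIVER_JAVA_OPTIONS-style read: defaults prepended to the extras.
    combined = (conf.get("spark.driver.defaultJavaOptions", "") + " " +
                conf.get("spark.driver.extraJavaOptions", "")).strip()
    # The bug: the combined value (defaults included) is written back
    # under spark.driver.extraJavaOptions.
    conf["spark.driver.extraJavaOptions"] = (combined + " " + supplemental).strip()

conf = {"spark.driver.defaultJavaOptions": "-Dfoo=bar"}
buggy_supplement(conf, "--add-opens=java.base/java.lang=ALL-UNNAMED")  # module opts
buggy_supplement(conf, "-Djava.net.preferIPv6Addresses=false")         # IPv6 opts
# Each pass re-prepends the defaults, so "-Dfoo=bar" now sits twice in
# extraJavaOptions; the combined read at launch time adds a third copy.
print(conf["spark.driver.extraJavaOptions"].count("-Dfoo=bar"))  # 2
```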
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
Manual testing
Before:
```
$ pyspark --conf spark.driver.defaultJavaOptions="-Dfoo=bar"
>>> spark.conf.get('spark.driver.extraJavaOptions')
'-Djava.net.preferIPv6Addresses=false -Dfoo=bar -XX:+IgnoreUnrecognizedVMOptions --add-opens=java.base/java.lang=ALL-UNNAMED --add-opens=java.base/java.lang.invoke=ALL-UNNAMED --add-opens=java.base/java.lang.reflect=ALL-UNNAMED --add-opens=java.base/java.io=ALL-UNNAMED --add-opens=java.base/java.net=ALL-UNNAMED --add-opens=java.base/java.nio=ALL-UNNAMED --add-opens=java.base/java.util=ALL-UNNAMED --add-opens=java.base/java.util.concurrent=ALL-UNNAMED --add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED --add-opens=java.base/jdk.internal.ref=ALL-UNNAMED --add-opens=java.base/sun.nio.ch=ALL-UNNAMED --add-opens=java.base/sun.nio.cs=ALL-UNNAMED --add-opens=java.base/sun.security.action=ALL-UNNAMED --add-opens=java.base/sun.util.calendar=ALL-UNNAMED --add-opens=java.security.jgss/sun.security.krb5=ALL-UNNAMED -Djdk.reflect.useDirectMethodHandle=false -Dfoo=bar'
>>> spark.conf.get('spark.driver.defaultJavaOptions')
'-Dfoo=bar'
```
After:
```
$ pyspark --conf spark.driver.defaultJavaOptions="-Dfoo=bar"
>>> spark.conf.get('spark.driver.extraJavaOptions')
'-Djava.net.preferIPv6Addresses=false -XX:+IgnoreUnrecognizedVMOptions --add-modules=jdk.incubator.vector --add-opens=java.base/java.lang=ALL-UNNAMED --add-opens=java.base/java.lang.invoke=ALL-UNNAMED --add-opens=java.base/java.lang.reflect=ALL-UNNAMED --add-opens=java.base/java.io=ALL-UNNAMED --add-opens=java.base/java.net=ALL-UNNAMED --add-opens=java.base/java.nio=ALL-UNNAMED --add-opens=java.base/java.util=ALL-UNNAMED --add-opens=java.base/java.util.concurrent=ALL-UNNAMED --add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED --add-opens=java.base/jdk.internal.ref=ALL-UNNAMED --add-opens=java.base/sun.nio.ch=ALL-UNNAMED --add-opens=java.base/sun.nio.cs=ALL-UNNAMED --add-opens=java.base/sun.security.action=ALL-UNNAMED --add-opens=java.base/sun.util.calendar=ALL-UNNAMED --add-opens=java.security.jgss/sun.security.krb5=ALL-UNNAMED -Djdk.reflect.useDirectMethodHandle=false -Dio.netty.tryReflectionSetAccessible=true'
>>> spark.conf.get('spark.driver.defaultJavaOptions')
'-Dfoo=bar'
```
### Was this patch authored or co-authored using generative AI tooling?
No.
@LuciferYang @dongjoon-hyun since you created these methods
+1, Agree with @dongjoon-hyun
Added a test to `SparkContextSuite`.
Thank you for updating and sorry for the delay on this PR, @Kimahriman.
Merged to master for Apache Spark 4.1.0. Thank you, @Kimahriman.
late LGTM