[SPARK-45717][YARN] Avoid using `spark.yarn.user.classpath.first`
What changes were proposed in this pull request?
Use `spark.driver.userClassPathFirst` instead of `spark.yarn.user.classpath.first`.
Why are the changes needed?
Setting `spark.yarn.user.classpath.first=true` logs the following deprecation warning:
23/10/30 14:32:16.855 pool-1-thread-1-ScalaTest-running-ClientSuite WARN SparkConf: The configuration key 'spark.yarn.user.classpath.first' has been deprecated as of Spark 1.3 and may be removed in the future. Please use spark.{driver,executor}.userClassPathFirst instead.
Does this PR introduce any user-facing change?
No
How was this patch tested?
Existing unit tests.
Was this patch authored or co-authored using generative AI tooling?
No
cc @mridulm, too
Your review covers what I would have added @dongjoon-hyun :-)
I thought about it, and `spark.yarn.user.classpath.first` doesn't have quite the same effect as `spark.driver.userClassPathFirst` or `spark.executor.userClassPathFirst`.
`spark.yarn.user.classpath.first` is similar to `mapreduce.job.user.classpath.first`: it places the user's jars at the front of the classpath so that the YARN container loads them first, using the same classloader as the Spark jars.
`spark.{driver,executor}.userClassPathFirst`, by contrast, does not modify the classpath. It loads the user's jars with `ChildFirstURLClassLoader`, which is a different classloader from the one that loads the Spark jars, so the two settings behave differently.
So the change in this PR may not be valid. I'm not sure why this parameter was deprecated.
https://github.com/apache/spark/blob/d0385c4a99c172fa3e1ba2d72a65c8632b5c72a9/core/src/main/scala/org/apache/spark/SparkConf.scala#L605-L606
https://github.com/apache/spark/blob/d0385c4a99c172fa3e1ba2d72a65c8632b5c72a9/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala#L1527-L1537
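To make the distinction above concrete, here is a minimal JDK-only sketch of the two mechanisms. It is an illustration under stated assumptions, not Spark's code: `ChildFirstSketch`, `prependUserJars`, and `ChildFirstLoader` are hypothetical names, and the jar names are placeholders. The first method models classpath ordering (one classpath, one classloader); the nested class models a child-first loader that consults the user's jars before falling back to another loader.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical names throughout; this is an illustration, not Spark's code.
class ChildFirstSketch {

    // Classpath-ordering approach: user jars and framework jars share one
    // classpath (and therefore one classloader); user jars simply come first.
    static List<String> prependUserJars(List<String> userJars, List<String> frameworkJars) {
        List<String> classpath = new ArrayList<>(userJars);
        classpath.addAll(frameworkJars);
        return classpath;
    }

    // Child-first approach: the classpath is left alone. A separate loader
    // looks in the user jars first and only then asks the fallback loader.
    static class ChildFirstLoader extends URLClassLoader {
        private final ClassLoader fallback;

        ChildFirstLoader(URL[] userJars, ClassLoader fallback) {
            // A null parent means normal delegation only reaches JDK bootstrap classes.
            super(userJars, null);
            this.fallback = fallback;
        }

        @Override
        protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
            try {
                // Bootstrap classes and the user jars are tried first.
                return super.loadClass(name, resolve);
            } catch (ClassNotFoundException e) {
                // Not found in the user jars: defer to the framework's loader.
                return fallback.loadClass(name);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // One ordered classpath, loaded by a single classloader.
        System.out.println(prependUserJars(List.of("app.jar"), List.of("spark-core.jar")));

        ClassLoader frameworkLoader = ChildFirstSketch.class.getClassLoader();
        try (ChildFirstLoader loader = new ChildFirstLoader(new URL[0], frameworkLoader)) {
            // No user jars are registered here, so the lookup falls through to
            // the fallback loader and the class is not defined by `loader`.
            Class<?> c = loader.loadClass(ChildFirstSketch.class.getName());
            System.out.println("defined by child loader: " + (c.getClassLoader() == loader));
        }
    }
}
```

With real user jars passed to `ChildFirstLoader`, a class present in both the user jars and the fallback would be defined by the child loader, so it would be a different `Class` object from the one the fallback sees. That is the behavioral difference from plain classpath ordering.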