
[SPARK-45717][YARN] Avoid use `spark.yarn.user.classpath.first`

Open cxzl25 opened this issue 2 years ago • 1 comments

What changes were proposed in this pull request?

Use spark.driver.userClassPathFirst instead of spark.yarn.user.classpath.first.

Why are the changes needed?

When we set spark.yarn.user.classpath.first=true, we get the following message.

23/10/30 14:32:16.855 pool-1-thread-1-ScalaTest-running-ClientSuite WARN SparkConf: The configuration key 'spark.yarn.user.classpath.first' has been deprecated as of Spark 1.3 and may be removed in the future. Please use spark.{driver,executor}.userClassPathFirst instead.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Existing unit tests.

Was this patch authored or co-authored using generative AI tooling?

No

cxzl25 avatar Oct 30 '23 06:10 cxzl25

cc @mridulm , too

dongjoon-hyun avatar May 10 '24 16:05 dongjoon-hyun

Your review covers what I would have added @dongjoon-hyun :-)

mridulm avatar May 13 '24 07:05 mridulm

On reflection, spark.yarn.user.classpath.first does not have quite the same effect as spark.driver.userClassPathFirst or spark.executor.userClassPathFirst.

spark.yarn.user.classpath.first is similar to mapreduce.job.user.classpath.first: it places the user's jars at the front of the classpath, so the JVM in the YARN container finds user classes first. The user jars are loaded by the same classloader as the Spark jars.

spark.{driver,executor}.userClassPathFirst, by contrast, does not modify the classpath. It loads the user jars through ChildFirstURLClassLoader, which is a different classloader from the one that loads the Spark jars, so the two settings behave differently.
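To make the child-first behavior concrete, here is a minimal Java sketch of the general technique. It is an illustration under assumptions, not Spark's actual ChildFirstURLClassLoader: the class name `ChildFirstLoader` and its `fallback` field are hypothetical. The loader searches its own jar URLs before delegating to the loader that holds the framework classes, which is the reverse of the JVM's default parent-first delegation.

```java
import java.net.URL;
import java.net.URLClassLoader;

// Hypothetical, simplified child-first loader (not Spark's implementation).
class ChildFirstLoader extends URLClassLoader {
    // Loader that holds the framework (e.g. Spark) classes; consulted last.
    private final ClassLoader fallback;

    ChildFirstLoader(URL[] userJars, ClassLoader fallback) {
        // A null parent means super.loadClass() only sees bootstrap classes
        // and the user jars, never the application classpath.
        super(userJars, null);
        this.fallback = fallback;
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve)
            throws ClassNotFoundException {
        try {
            // Bootstrap classes and user jars are tried first.
            return super.loadClass(name, resolve);
        } catch (ClassNotFoundException e) {
            // Only when the user jars lack the class do we ask the fallback.
            return fallback.loadClass(name);
        }
    }
}
```

With this arrangement, a class present in both the user jars and the framework jars is defined by `ChildFirstLoader`, so it lives in a different classloader from the framework classes. With classpath ordering, both copies sit in one loader and the first entry on the classpath wins.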

So the change in this PR may not be valid. I am not sure why this parameter was deprecated.

https://github.com/apache/spark/blob/d0385c4a99c172fa3e1ba2d72a65c8632b5c72a9/core/src/main/scala/org/apache/spark/SparkConf.scala#L605-L606

https://github.com/apache/spark/blob/d0385c4a99c172fa3e1ba2d72a65c8632b5c72a9/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala#L1527-L1537

cxzl25 avatar May 15 '24 14:05 cxzl25