[SPARK-52469][CORE] Judge Java version via JDK method
What changes were proposed in this pull request?
As the title says: replace Java version checks that go through commons-lang3 (`SystemUtils`/`JavaVersion`) with the JDK's built-in `Runtime.version()` API (JEP 223).
Why are the changes needed?
- Prefer to use the JDK method over third-party libraries.
- Mitigate (not eliminate; Spark has no guarantee to work with older commons-lang3) cases where user jars contain an old commons-lang3 (which may not have the field `JAVA_21`) and are submitted with the config `spark.driver.userClassPathFirst=true`, which may cause a `NoSuchFieldError` (see the sketch below).
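As an illustration of the kind of change involved (a sketch, not the exact diff in this PR; the object name is mine):

```scala
import org.apache.commons.lang3.{JavaVersion, SystemUtils}

object JavaVersionCheckSketch {
  // Before: resolved against commons-lang3; with userClassPathFirst=true an
  // old user-provided jar may lack the JAVA_21 enum constant.
  val onJava21ViaLang3: Boolean =
    SystemUtils.isJavaVersionAtLeast(JavaVersion.JAVA_21)

  // After: JDK-only (JEP 223, available since Java 9); no third-party
  // classes involved.
  val onJava21ViaJdk: Boolean =
    Runtime.version().feature() >= 21
}
```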
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Pass GHA.
Was this patch authored or co-authored using generative AI tooling?
No.
cc @LuciferYang
I don't have any issues with the code change itself. But will modifying just this part address Cause 2 described in the PR?
> Tolerant cases that user jars contain old commons-lang3 (which may not have the field `JAVA_21`) and submit with config `spark.driver.userClassPathFirst=true`, thus may cause `NoSuchFieldError`.
> But will modifying just this part address Cause 2 ...

> Tolerant cases ...
"address"/"tolerant" might not be accurate, I change the word to "mitigate".
Currently, Spark only supports Java 17 and 21, so only a few places check the Java version. Before Spark dropped support for Java versions prior to 17, there were many places in the Spark code that checked the Java version via commons-lang3, and Cause 2 was a relatively common `NoSuchFieldError` case in our internal Spark jobs (only for jobs that shade an old version of commons-lang3 and run with `spark.driver.userClassPathFirst=true`). For example, a check like the one sketched below fails at runtime:
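A hypothetical minimal reproduction of that failure mode (assuming the user jar bundles a commons-lang3 release that predates the `JAVA_21` constant):

```scala
import org.apache.commons.lang3.{JavaVersion, SystemUtils}

object BrokenVersionCheck {
  // This code is compiled against Spark's bundled commons-lang3, which has
  // JAVA_21. With spark.driver.userClassPathFirst=true the enum reference is
  // resolved at runtime against the user's older jar instead, whose
  // JavaVersion enum has no JAVA_21 field, so the JVM throws:
  //   java.lang.NoSuchFieldError: JAVA_21
  def isJava21OrLater: Boolean =
    SystemUtils.isJavaVersionAtLeast(JavaVersion.JAVA_21)
}
```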
@pan3793 What I mean is, could using an older version of commons-lang3 potentially result in the absence of other methods? Or is the issue solely the enum problem for this particular Java version?
> could using an older version of commons-lang3 potentially result in the absence of other methods?
Yes, it's possible. Spark has no guarantee to work with older commons-lang3.
Thank you for making a PR. I got your point of this proposal.
However, I actually have a different perspective on the following claim, @pan3793.
> Prefer to use the JDK method over third-party libraries.
The Java community is also notorious for incompatible changes to its version numbering. For example, Java-based ASF projects (Apache Hadoop/Hive/Spark) have bad memories of having to handle both "Java 1.8" and "Java 8" at that time. IIUC, commons-lang3 was the solution for handling those incompatibilities in a middle layer, e.g. LANG-1197. In other words, I don't think `Runtime.version().feature()` is better than commons-lang3, which is supposed to provide a mitigation for those JDK incompatibilities.
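To make the version-string incompatibility concrete (a sketch; the example strings are typical values, not taken from this thread):

```scala
object VersionSchemeSketch {
  // Pre-JEP 223 scheme: "java.version" was e.g. "1.8.0_292"; naively taking
  // the first dotted component yields 1, not 8.
  // JEP 223 scheme (Java 9+): "java.version" is e.g. "11.0.2"; the first
  // component is the feature release itself.
  // commons-lang3's SystemUtils/JavaVersion normalized both schemes for
  // libraries that had to support Java 8 and newer at the same time.
  val raw: String = System.getProperty("java.version") // scheme depends on the JDK
}
```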
From the beginning, I don't think the Apache Spark community makes any promise about user-provided libraries that differ from the built-in jars in Apache Spark's jars directory. If a user injects a broken library, Apache Spark itself will break abnormally.
> Spark has no guarantee to work with older commons-lang3
In short, it's not a good idea to downgrade an Apache Spark jar dependency to a user-provided one. And this PR might give users the wrong idea that we allow (and guarantee) such a downgrade. In fact, we should not and never recommend that way.
To @pan3793: to achieve your goal, we need to remove the commons-lang3 dependency completely, as the Apache ORC community did. Otherwise, your problem still remains in other code paths.
- https://github.com/apache/orc/pull/2053
@dongjoon-hyun thanks for your review and for explaining your concerns.
First of all, sorry if I made things confusing. To clarify, my goal is not to "remove the commons-lang3 dependency completely"; actually, unlike Guava, commons-lang3 keeps pretty good backward compatibility.
And I understand that the change in versioning policy around Java 8 caused some confusion. For projects that support Java 8 and above, I agree that commons-lang3 might be the best and simplest way to handle the JDK version. But given that Spark now requires Java 17+ and modern JDKs provide a simple and stable API (via JEP 223) to get and compare the version (besides the feature (major) version, the JEP 223 API also provides interim/update/patch version info, which was used in SPARK-50946), switching to the JEP 223 API seems to be a future-proof choice.
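A minimal sketch of the JEP 223 API being referred to (standard `java.lang.Runtime.Version` methods; `feature()`/`interim()`/`update()`/`patch()` have existed in this form since Java 10, and Spark requires 17+):

```scala
object Jep223Sketch {
  val v: Runtime.Version = Runtime.version()
  val feature: Int = v.feature() // e.g. 21 for "21.0.3"
  val update: Int  = v.update()  // e.g. 3  for "21.0.3"

  // Feature-release checks need no per-release library constants:
  val atLeastJava21: Boolean = v.feature() >= 21

  // Full version strings can be parsed and compared as well:
  val newerThan17011: Boolean =
    v.compareTo(Runtime.Version.parse("17.0.11")) > 0
}
```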
> Spark has no guarantee to work with older commons-lang3

> In short, it's not a good idea to downgrade an Apache Spark jar dependency to a user-provided one. And this PR might give users the wrong idea that we allow (and guarantee) such a downgrade. In fact, we should not and never recommend that way.
I absolutely agree with that, and I strongly discourage our users/customers from using `spark.[driver|executor].userClassPathFirst=true`. But on the other hand, these two configs have been exposed in the user docs for a long time, and some users do rely on them to address class conflict issues. I would make some minor changes that don't hurt the Spark codebase in order to make it work better with older versions of some libraries.
Regarding commons-lang3, Spark uses only a small set of its API, all of which has been present for a long time except the `JAVA_N` constants. Actually, the `JAVA_N` constants are added with some delay after each JDK release; for example, the latest commons-lang3 (3.17.0) only has `JAVA_22` (added in 3.15.0), while the latest JDK version is 24, which means we cannot write code like `SystemUtils.isJavaVersionAtLeast(JAVA_23)`.
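To make the contrast concrete (an illustrative sketch, using the version facts stated above):

```scala
object FutureReleaseCheckSketch {
  // commons-lang3 3.17.0's JavaVersion enum tops out at JAVA_22, so a
  // JAVA_23 check does not even compile against it:
  //   SystemUtils.isJavaVersionAtLeast(JavaVersion.JAVA_23) // no such constant

  // The JDK API is decoupled from library release cadence entirely:
  val atLeastJava23: Boolean = Runtime.version().feature() >= 23
}
```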
In short, I think there is no obvious drawback to switching to the JEP 223 API, and I can update the PR description to remove the commons-lang3 compatibility wording if it causes confusion.
For the following, I'd like to recommend an alternative.
> I can update the PR description to remove the commons-lang3 compatibility wording if it causes confusion.
- To achieve your goal and protect your PR from future regressions, please provide a systematic way of banning the following import from the Apache Spark code. You can add it at the linter level via a Scalastyle and Checkstyle rule (a hypothetical example is sketched after this list):
  `import org.apache.commons.lang3.JavaVersion`
- Use `JEP 223` as a substring of the PR title to clarify what we are doing now. `JDK method` is a little ambiguous.
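For the first suggestion, a hypothetical Scalastyle entry might look like the following (the `customId` and message are illustrative, modeled on the banned-import `RegexChecker` rules already present in Spark's `scalastyle-config.xml`; the Java side would need a corresponding Checkstyle `IllegalImport` rule):

```xml
<!-- Hypothetical rule: ban commons-lang3 JavaVersion in Scala sources. -->
<check customId="commonslang3javaversion" level="error"
       class="org.scalastyle.file.RegexChecker" enabled="true">
  <parameters>
    <parameter name="regex">org\.apache\.commons\.lang3\.JavaVersion\b</parameter>
  </parameters>
  <customMessage>Use Runtime.version() (JEP 223) instead of
    commons-lang3 JavaVersion.</customMessage>
</check>
```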
WDYT, @pan3793 ?
@dongjoon-hyun thanks for your suggestion, sgtm, updated.