spark
spark copied to clipboard
[SPARK-47043][BUILD] add `jackson-core` and `jackson-annotations` dependencies to module `spark-common-utils`
What changes were proposed in this pull request?
This PR aims to fix spark-common-utils
module by adding explicit jackson-core
and jackson-annotations
dependencies.
Why are the changes needed?
spark-common-utils
uses jackson-core
and jackson-annotations
like the following, but we didn't declare it explicitly.
https://github.com/apache/spark/blob/a6bed5e9bcc54dac51421263d5ef73c0b6e0b12c/common/utils/src/main/scala/org/apache/spark/ErrorClassesJSONReader.scala#L26 https://github.com/apache/spark/blob/a6bed5e9bcc54dac51421263d5ef73c0b6e0b12c/common/utils/src/main/scala/org/apache/spark/ErrorClassesJSONReader.scala#L25 https://github.com/apache/spark/blob/a6bed5e9bcc54dac51421263d5ef73c0b6e0b12c/common/utils/src/main/scala/org/apache/spark/util/JsonUtils.scala#L23
How was this patch tested?
Pass the CIs.
Was this patch authored or co-authored using generative AI tooling?
No.
Please note that we usually fix the wrong dependency like the following situation in order to unblock other PRs. If you have a similar issue, please describe in the PR description, @William1104 .
Previously, it was provided by some unused
htmlunit-driver
's transitive dependency accidentally. This causes a weird situation whichkvstore
module starts to fail to compile when we upgradehtmlunit-driver
. We need to fix this first.
BTW, @William1104 , please don't borrow someone-else name like this.
Hi @dongjoon-hyun
I am sorry about that. I created the JIRA via cloning. I don't know how to update the assignee back to myself. Would you mind to change it back to me, or let me know how I can update the assignee.
Best regards, William
No problem. I just wanted to inform you. I fixed the field by removing my name. :)
Hi @dongjoon-hyun,
Thank you for providing the script. If I understand correctly, the script is designed to check if any dependencies have been updated in an unexpected manner for every profile. This functionality is extremely valuable as it helps us avoid making careless mistakes and provides the reviewer with a clearer understanding of how dependencies will be updated.
It appears that several Spark modules are using libraries without proper declaration, or they have declared certain libraries but are not actually using them. To generate a report highlighting these issues, I ran the following command:
./build/mvn dependency:analyze | sed -n '/<<< dependency:.*:analyze/,/>>> dependency:.*:analyze/p' > dependency-analyze.txt
In the dependency-analyze.txt report, let's take the module "spark-common-utils" as an example. It currently has a compile scope dependency on "commons-io," which could be changed to the test scope to avoid unnecessary transitive dependencies. Here are the relevant excerpts from the report:
[INFO] --- dependency:3.6.1:analyze (default-cli) @ spark-common-utils_2.13 ---
[WARNING] Used undeclared dependencies found:
[WARNING] com.fasterxml.jackson.core:jackson-annotations:jar:2.16.1:compile
[WARNING] org.apache.commons:commons-lang3:jar:3.14.0:compile
[WARNING] com.fasterxml.jackson.core:jackson-core:jar:2.16.1:compile
[WARNING] org.scala-lang:scala-library:jar:2.13.12:compile
[WARNING] org.scalatest:scalatest-funsuite_2.13:jar:3.2.17:test
[WARNING] org.scalactic:scalactic_2.13:jar:3.2.17:test
[WARNING] org.scalatest:scalatest-compatible:jar:3.2.17:test
[WARNING] org.scalatest:scalatest-core_2.13:jar:3.2.17:test
[WARNING] Unused declared dependencies found:
[WARNING] com.fasterxml.jackson.module:jackson-module-scala_2.13:jar:2.16.1:compile
[WARNING] oro:oro:jar:2.0.8:compile
[WARNING] org.slf4j:jul-to-slf4j:jar:2.0.11:compile
[WARNING] org.slf4j:jcl-over-slf4j:jar:2.0.11:compile
[WARNING] org.apache.logging.log4j:log4j-slf4j2-impl:jar:2.22.1:compile
[WARNING] org.apache.logging.log4j:log4j-1.2-api:jar:2.22.1:compile
[WARNING] org.spark-project.spark:unused:jar:1.0.0:compile
[WARNING] org.scalatest:scalatest_2.13:jar:3.2.17:test
[WARNING] org.scalatestplus:scalacheck-1-17_2.13:jar:3.2.17.0:test
[WARNING] org.scalatestplus:mockito-4-11_2.13:jar:3.2.17.0:test
[WARNING] org.scalatestplus:selenium-4-12_2.13:jar:3.2.17.0:test
[WARNING] org.junit.jupiter:junit-jupiter:jar:5.9.3:test
[WARNING] net.aichler:jupiter-interface:jar:0.11.1:test
[WARNING] Non-test scoped test only dependencies found:
[WARNING] commons-io:commons-io:jar:2.15.1:compile
[INFO]
This information suggests that there are both used undeclared dependencies and unused declared dependencies. Additionally, there are non-test scoped test-only dependencies, such as "commons-io:commons-io:jar:2.15.1:compile." These findings can help us identify areas where we can optimize and refine the dependency management within the "spark-common-utils" module.
I would like to create PR to fix the dependency scope.
Thanks and regards, William
Unlike other communities like Apache ORC, Apache Spark community doesn't take advantage of maven-dependency-plugin
to enforce those Used undeclared
and Unused declared
and Non-test scoped test only dependencies
. The main reason is because it would be quite inconvenient in a large and dynamic community like Apache Spark community.
Apache ORC Example: https://github.com/apache/orc/blob/0a02e4cde165b81fb93fc99456a130da4625ef30/java/core/pom.xml#L160-L172
In addition, if you want to propose it, it should be applied systematically like Apache ORC community for all modules, not like many small PRs like this PR and #45101 .
However, if you have a concern about this area and want to contribute that area, I'd like to recommend you to send an email to [email protected] . We can discuss more and try to build a consensus on the necessity of your proposal, @William1104 . You can use your PRs' links in your email.
Hi @dongjoon-hyun ,
Thanks for your explanation. Yes, the spark is a large and very dynamic community. It could be challenging to enforce this, especially since some modules already have more than required dependencies. I would propose fixing dependencies issue module by module, and then enforcing it in a systematic way (via maven dependency plugin).
Let me send an email to [email protected] on this topic. Thanks a lot.
Best regards, William
I think this change itself is fine, and indeed technically correct. Not sure if we want to make lots of changes like this, but the occasional targeted one seems OK.
Did you send an email to dev, @William1104 ? It seems that I missed it.
Let me send an email to [email protected] on this topic. Thanks a lot.
Ya, I totally agree with you and want to get some further consensus in dev mailing list because @William1104 seems to be interested in this area. (Already 2 opened and I guess more to come).
Not sure if we want to make lots of changes like this, but the occasional targeted one seems OK.
We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!