cassandra
cassandra copied to clipboard
CASSANDRA-17750: Remove Maven Ant Tasks
Jira: https://issues.apache.org/jira/browse/CASSANDRA-17750 CircleCI: https://app.circleci.com/pipelines/github/aratno/cassandra?branch=CASSANDRA-17750-remote-maven-ant-tasks
This replaces our existing dependency management, found in one place here, with three explicit pom files.
This makes dependency management harder for the committers and contributors, and it will introduce confusion having the pom.xml file present in the project. I'm not sure it is an approach we want to take (i think needs further clarification needed on dev@)
Is there another way to immediately and quickly address CVE-2017-1000487 ?
Which is only exposed to someone building Cassandra, if I understand it correctly. Furthermore, I'm not sure how we define or pin the plexus dependency that is the source of the CVE here, for example we only define the MAT jar file to use here.
@michaelsembwever - thanks for taking a look.
Furthermore, I'm not sure how we define or pin the plexus dependency that is the source of the CVE here
The plexus-utils2 dependency is a transitive dependency included in the last released version of Maven Ant Tasks, which we pull in as a JAR here: https://github.com/apache/cassandra/blob/889ca60edc8afd85b4a594442883c1b5efdf3c6b/build.xml#L472
As an alternative to address CVE-2017-1000487, I considered authoring a forked Maven Ant Tasks and bumping the plexus-utils2 dependency to bypass the vulnerable version. I figured that it was worse to maintain a fork of a long-retired project, even with a very simple fork. It's unfortunate that Maven Artifact Resolver Ant Tasks doesn't support an alternative to the writepom task - in my opinion, that would be ideal. And because Maven Ant Tasks is required so early in our build, we can't use exclusion rules to re-write the plexus-utils2 dependency since Maven Ant Tasks is what resolves those exclusions in the first place.
It might be possible to simplify to two POMs, by removing apache-cassandra-build-deps.pom. I'm not clear on why that's necessary when Maven already supports different classpath scopes for dependencies (compile, provided, test, etc), and we could ideally include all the build dependencies in apache-cassandra.pom with the compile scope.
I agree that this adds some pain to dependency management for contributors. I tried to simplify this as much as possible with my changes in f80da9b, by centralizing all dependency versions and scopes in the parent POM, and making the child POMs reference those.
Do you have any other thoughts on this approach before I share this discussion with the dev list?
The plexus-utils2 dependency is a transitive dependency included in the last released version of Maven Ant Tasks
Yes, plexus is inside the jar file.
I considered authoring a forked Maven Ant Tasks
We define where we download the jar file from, so we can download a patched jar file from any other location (though we want the source open and verifiable).
Do you have any other thoughts on this approach before I share this discussion with the dev list?
Can you explain how one takes advantage of this CVE here? (surface and vector)
Can you explain how one takes advantage of this CVE here? (surface and vector)
From what I can tell, Maven Ant Tasks doesn't directly use the vulnerable components of plexus-utils2 (Commandline, BourneShell, and Shell from https://github.com/codehaus-plexus/plexus-utils/commit/b38a1b3a4352303e4312b2bb601a0d7ec6e28f41), but the dependency includes these as it's a utilities collection. A fork of Maven Ant Tasks could exclude these files from plexus-utils2, or bump the version of plexus-utils2.
Outside of this particular CVE, there's a separate issue (in my opinion) of depending on a build system that doesn't receive security updates, so I opted to follow the advice of Maven Ant Tasks and remove our dependency on the project.
We define where we download the jar file from, so we can download a patched jar file from any other location (though we want the source open and verifiable).
I am strongly against maintaining a patched jar for a project we do not maintain... that burden is much higher than working with pom files for dependencies in my option.
This makes dependency management harder for the committers and contributors, and it will introduce confusion having the pom.xml file present in the project. I'm not sure it is an approach we want to take (i think needs further clarification needed on dev@)
I wouldn't say it makes things harder... right now you have to find all pom files we generate in ant and make sure you update the right one; I mess this up 100% of the time I try to touch our dependencies... Maven pom files are well known and well understood, so easier to reason about than our current model. If you are new to this project, trying to reverse engineer how our dependencies work in our build is way more complex than just working with the pom files directly...
Now, can having the pom files confuse people? Sure, some may think the existence of a .pom means we can build with maven so anyone who tries may be confused (though our docs always and only say to use ant).
Is there another way to immediately and quickly address https://github.com/advisories/GHSA-8vhq-qq4p-grq3 ?
The only other thing I can think is to some how have a new ant task generate the pom files for us... we have a custom plugin in the code right now so a custom plugin to turn xml into... xml... might be reasonable?
there's a separate issue (in my opinion) of depending on a build system that doesn't receive security updates, so I opted to follow the advice of Maven Ant Tasks and remove our dependency on the project.
Dead project become a burden for us to maintain... one thing that bites us is when there is a new version of the JDK that requires our dependencies to update before we can update... when we rely on dead projects we then get forced to not support the latest LTS JDKs, or take the burden of replacing them... Ekaterina and I have both attempted to get Apache Cassandra to use JDK 17 and that itself is now a large burden due to dependencies...
Maven pom files are well known and well understood, so easier to reason about than our current model.
Ok, thanks for chiming in @dcapwell :)
Before we commit to using maven pom.xml files directly for dependencies, we should look at what our choices are. That is, is there another format that we would rather declare and write our dependencies in, that can generate in a supported way the pom files. (Personally I find directly using the pom files just for dependency management clumsy, but easy to go anyway.)
If we go for direct pom files, then can be put them in the .build/ folder? Could we also rename them, e.g. parent-pom.xml cassandra-deps.xml cassandra-build-deps.xml ?
is there another format that we would rather declare and write our dependencies in
Do you have any criteria for comparing formats, besides personal preference?
I went with Maven because there's already a good integration with Ant via Maven Artifact Resolver Ant Tasks, and translating from another format into POM is prone to quirks. I recognize that people will have their own preferences on this topic, but discussions to change dependency manager rarely progress well.
Do you know of the context for having Ant generate three separate Cassandra POMs before? Maybe this change would be more palatable if we consolidated into one cassandra-pom.xml?
I'm open to renaming and moving the files however you see fit. Moving .pom to .xml sounds good, especially to reduce confusion for new users that may otherwise think that the project build entrypoint is Maven.
Do you have any criteria for comparing formats, besides personal preference?
Sure, we need to stay away from "it's my favourite build tool" sentiments, and focus on what is the easiest format to add, audit, remove dependencies to. For example, i'd rather not work with multiple files, or the verbosity of xml, and ideally not with separate sections for dependency management and different scopes…
That is, is there another format that we would rather declare and write our dependencies in, that can generate in a supported way the pom files.
Any format has to translate properly to pom as that is our publish artifact to maven central, so no matter how we manage the dependencies we must make sure this artifact does not regress.
It seems that resolution also depends on pom (same as before as well), so I guess my question on "another format" are:
- how well documented is it?
- is there a community that supports it?
- is the translation to pom lossy in any way, and if we need to "extend" how do we validate we are safe?
- is it adding more cognitive load than direct pom? what about more cognitive load than current model?
- does it change the way we resolve dependencies today?
The only other format I see that has ok answers to my questions above is Apache Ivy https://ant.apache.org/ivy/, though never used in a project before so not sure if question 5 is ok or not, always remember ivy acting differently than maven when it came to resolution...
Now, if it comes to pom vs ivy... my personal preference is pom.
If we go for direct pom files, then can be put them in the .build/ folder? Could we also rename them, e.g. parent-pom.xml cassandra-deps.xml cassandra-build-deps.xml ?
Since new people are unlikely to know about .build I think this solves the confusion of "why is there a pom file if this can't be built by maven". Moving/renaming works for me
Do you know of the context for having Ant generate three separate Cassandra POMs before? Maybe this change would be more palatable if we consolidated into one cassandra-pom.xml?
merging into 1 pom could be a problem as we publish more than one: https://repo1.maven.org/maven2/org/apache/cassandra/, we publish cassandra-all and cassandra-parent currently...
Now, given the fact the other artifacts are gone (utils, thrift, etc.), the parent seems less useful... so just having a cassandra-all may make sense, but it might be seen as a regression, so that would be a conversation for dev@ mailing list.
Now, if we agree to drop parent, then we only need cassandra-all and we can merge the build dependencies into it by using pom resolution to make things optional or tests scope; this would drastically simplify our dependency management today...
To be clear, I think we can go from 3 to 2 files without a regression, but to go down to 1 might cause a regression in parent, so would need dev@ buy in...
audit
What do you mean by audit? To me I define this as CI tools to monitor our dependencies, if this is your thinking then I believe @aratno has experience leveraging this patch to enable such auditing... if you mean something else do let me know.
For example, i'd rather not work with multiple files, or the verbosity of xml, and ideally not with separate sections for dependency management and different scopes…
do you have something in mind? Our ant build currently violates what you are saying...
merging into 1 pom could be a problem as we publish more than one: https://repo1.maven.org/maven2/org/apache/cassandra/, we publish cassandra-all and cassandra-parent currently...
there's also the tmp-apache-cassandra-*-deps.pom which we generate and use but don't publish.
What do you mean by audit?
manual and CI. our full transitive dependency tree should always be easy to inspect. but automated tools and checkers are also part of this point (which i'm guessing favours the argument for a more popular format like poms)
do you have something in mind? Our ant build currently violates what you are saying...
we not discussing rewriting the ant build. this was only in the context of dependency management, which is quite verbose in maven, compared to ivy for example.
manual and CI. our full transitive dependency tree should always be easy to inspect.
How did you do this before? build.xml did not include transitive dependencies, so I presume you had to use ant _write-poms then mvn dependency:tree?
For automated dependency checking, I was using OWASP's DependencyCheck, which uses a combination of POM and JAR scanning. This was the process that flagged Maven Ant Tasks originally.
there's also the tmp-apache-cassandra-*-deps.pom which we generate and use but don't publish.
on trunk I see
<artifact:pom id="build-deps-pom"
artifactId="cassandra-build-deps">
<parent groupId="org.apache.cassandra"
artifactId="cassandra-parent"
version="${version}"
relativePath="${final.name}-parent.pom"/>
<dependency groupId="junit" artifactId="junit" scope="test"/>
<dependency groupId="commons-io" artifactId="commons-io" scope="test"/>
<dependency groupId="org.mockito" artifactId="mockito-core" scope="test"/>
<dependency groupId="org.ow2.asm" artifactId="asm" version="${asm.version}"/>
<dependency groupId="org.ow2.asm" artifactId="asm-tree" version="${asm.version}" scope="test"/>
<dependency groupId="org.ow2.asm" artifactId="asm-commons" version="${asm.version}" scope="test"/>
<dependency groupId="org.ow2.asm" artifactId="asm-util" version="${asm.version}" scope="test"/>
<dependency groupId="com.google.jimfs" artifactId="jimfs" version="1.1" scope="test"/>
<dependency groupId="com.puppycrawl.tools" artifactId="checkstyle" scope="test"/>
<dependency groupId="org.quicktheories" artifactId="quicktheories" scope="test"/>
<dependency groupId="org.reflections" artifactId="reflections" scope="test"/>
<dependency groupId="com.google.code.java-allocation-instrumenter" artifactId="java-allocation-instrumenter" version="${allocation-instrumenter.version}" scope="test"/>
<dependency groupId="org.apache.cassandra" artifactId="dtest-api" scope="test"/>
<dependency groupId="org.openjdk.jmh" artifactId="jmh-core" scope="test"/>
<dependency groupId="org.openjdk.jmh" artifactId="jmh-generator-annprocess" scope="test"/>
<dependency groupId="net.ju-n.compile-command-annotations" artifactId="compile-command-annotations" scope="test"/>
<dependency groupId="org.apache.ant" artifactId="ant-junit" scope="test"/>
<dependency groupId="org.apache.cassandra" artifactId="harry-core"/>
<!-- adding this dependency is necessary for assertj. When updating assertj, need to also update the version of
this that the new assertj's `assertj-parent-pom` depends on. -->
<dependency groupId="org.junit" artifactId="junit-bom" type="pom"/>
<dependency groupId="org.awaitility" artifactId="awaitility"/>
<dependency groupId="org.hamcrest" artifactId="hamcrest"/>
<!-- coverage debs -->
<dependency groupId="org.jacoco" artifactId="org.jacoco.agent"/>
<dependency groupId="org.jacoco" artifactId="org.jacoco.ant"/>
<dependency groupId="com.fasterxml.jackson.dataformat" artifactId="jackson-dataformat-yaml"/>
</artifact:pom>
this is adding test dependencies, so adding test dependencies with test scope to the single pom would let us get rid of this file; that leaves us with parent and all
this is adding test dependencies, so adding test dependencies with test scope to the single pom would let us get rid of this file; that leaves us with parent and all
We don't need (/want) them published in any pom.
The only reason tmp-apache-cassandra-*-deps.pom exists is to download/put the test dependencies into a separate directory.
In theory I see no reason we cannot move to one .build/dependencies.xml files, which is one unified pom file.
Which versions are we applying this too? (<4.1 and we need to honour the two separate pom names) And the maven vs ivy vs … decision should still be brought to call to the dev ML.
this is adding test dependencies, so adding test dependencies with test scope to the single pom would let us get rid of this file; that leaves us with parent and all
We don't need (/want) them published in any pom.
My understanding is that this is the whole intention of the test scope in Maven. These dependencies are included in the POM, but only installed on the classpath for test executions. Is there any reason you want these excluded from a potential unified POM rather than depending on this functionality of the test scope? What does the separate directory solve that the test scope does not?
Which versions are we applying this too?
I was planning to apply to trunk, 3.0, 3.11, 4.0, 4.1 - as a security fix, back-porting is necessary. Is this right?
And the maven vs ivy vs … decision should still be brought to call to the dev ML.
I'll start a conversation on the dev list today - will link here when it's posted.
Is there any reason you want these excluded from a potential unified POM rather than depending on this functionality of the test scope?
Encapsulation. Our test dependencies need not be made public. You cannot find our test classes anywhere in a maven repo, so why publish the test dependencies list?
The dependency list is use for two things: actual dependencies (build and test) , and publishing maven poms for those maven jars we publish. The latter is a smaller subset than the former.
We can break that encapsulation ofc, if the trade-off (to dependency maintenance cost) is worth it.
What does the separate directory solve that the test scope does not?
The separate directory is required. While it's valuable in practice to separate the application's classpath from the test classpath, it's important for us because the application's classpath (lib/) is bundled into our convenience (binary) artefacts. (Also note the difference between lib/ and build/lib/jars/.)
From what I can tell, Maven Ant Tasks doesn't directly use the vulnerable components of plexus-utils2
If this is accurate… I think it's important to stress we can ignore the CVE. That this then is an improvement to replace a deprecated and unmaintained dependency, but otherwise nothing urgent or broken today.
Mailing list discussion here: https://lists.apache.org/thread/57b2whys3ddt0fx81j47ofo5mr5qbt5n
If this is accurate… I think it's important to stress: that is that we can ignore the CVE. That this is an improvement to replace a deprecated and unmaintained dependency, but otherwise nothing urgent or broken today.
I just want to point out that there are end users with compliance requirements that may preclude use of software with known vulnerabilities, even in test dependencies. Based on my own experience with compliance audits it may not be problematic for the majority of end users, but I don't think we can authoritatively say the current state is not broken for some.
As for the patch itself, if we're already dealing with POM files as build artifacts, using a couple of well-defined POM files to generate others doesn't feel all that clunky if it means removing a known unsupported library with open CVEs. Or at least, no more clunky than the rest of the ANT-based build system. That's not a dig at ANT, just an admission that a sufficiently complex project will generally have a similar complexity in its build system. If we were using Maven, Gradle, or whatever else there would still be a lot of complex custom code to handle some of the existing build logic, it would just be in Maven XML or Groovy/Java/Kotlin.
In regards to alternate formats, I've used Ivy in the past but only via SBT. Are there tools that we can use to generate the necessary POMs for publishing from an Ivy input? Are there any differences in Ivy behavior vs Maven's resolver that could be problematic?
about future of MAARAT https://issues.apache.org/jira/browse/MRESOLVER-192 Remove support for reading POMs
for me, this PR is good, even as an intermediate step to fully move to Maven because it opens the possibility to clean up dependencies with tools like Dependabot (be prepared to even more CVEs from depedencies). The next step could be to add poms to test modules - that could also limit the set of used dependencies in the main module.
Based on my own experience with compliance audits it may not be problematic for the majority of end users, but I don't think we can authoritatively say the current state is not broken for some.
This is also my experience, but we do have .build/dependency-check-suppressions.xml, which compliance audits should (hopefully) consider in their evaluation.
about future of MAARAT https://issues.apache.org/jira/browse/MRESOLVER-192 Remove support for reading POMs
Thanks for mentioning this @slachiewicz -- it's useful to know that this is coming in the future. Since there's no clear timeline, and it's part of a major version bump, I don't consider this a big problem for this ticket.
Regarding @dchenbecker's questions about Ivy - I'm not an Ivy expert, and the reasons for migrating from Ivy in 2011 (CASSANDRA-2017) may or may not still be valid. Ivy would be a more significant change (would need to reconfigure Maven scopes to Ivy configurations, assemble compatible POM generation and publishing, understand how to audit transitive dependencies, and that's just off the top of my head). The approach here is closer to what we're used to, and doesn't prevent any improvements in the future.
it opens the possibility to clean up dependencies with tools like Dependabot (be prepared to even more CVEs from dependencies)
Dependency (software supply chain security) analysis against the difference choices should be investigated. Our current manual auditing and use of mvn dependency:tree for SBOM isn't up to scratch post log4shell.
The following is also being made explicit for the next revision of our latest code style guide
Adding new dependencies requires community consensus via a [DISCUSS] thread on the [email protected] mailing list.
https://docs.google.com/document/d/1sjw0crb0clQin2tMgZLt_ob4hYfLJYaU4lRX722htTo/edit
Given all discussion I am in favor of moving forward with this patch. I recommend we leave the 3 poms and can tackle improving into a different ticket (and maybe scope only to trunk)
Our current manual auditing and use of mvn dependency:tree for SBOM isn't up to scratch post log4shell.
This patch doesn't materially change any factors with respect to supply chain analysis. I'm open to more discussion on how we can improve, but that's outside the scope of this issue.
@michaelsembwever I'm looking to resolve remaining blockers and move future items into their own ticket.
Blockers for this PR:
- [x] Rename POMs to clarify that they are used by Ant, and not Maven directly
- [x] Clean up POMs based on @slachiewicz's feedback
For a separate ticket:
- Evaluate build systems with better dependency audit functionality (what is required here?)
Given all discussion I am in favor of moving forward with this patch.
For me, so long as any of the unanswered questions are ok to be left as so. Ideally I would leave the dev ML open for a little longer.
I was planning to apply to trunk, 3.0, 3.11, 4.0, 4.1 - as a security fix, back-porting is necessary. Is this right?
We need patches against all branches. We start with 3.0 (and forward merge, not back-port)
@michaelsembwever I'm happy to leave the ML thread open over the weekend, and in the meantime I can work on preparing patches for active branches.
@aratno I can create a fork for MAT in one of our Maven's team repo (mojohaus) with updated plexus lib Please check the version here: https://oss.sonatype.org/content/repositories/snapshots/org/codehaus/mojo/maven-ant-tasks/2.2.0-SNAPSHOT/ I've updated the version to 2.2.0 because the new plexus-utils requires Java 7
-- build.xml
@slachiewicz we've had some discussion about maintaining a fork:
I am strongly against maintaining a patched jar for a project we do not maintain... that burden is much higher than working with pom files for dependencies in my option.
If we're going to depend on a fork maintained by someone else, @michaelsembwever's comment about adding dependencies requiring community consensus applies:
The following is also being made explicit for the next revision of our latest code style guide
Adding new dependencies requires community consensus via a [DISCUSS] thread on the [email protected] mailing list.
I wouldn't want Cassandra to be the only project depending on the fork, since we benefit from "safety in numbers" when it comes to dependencies. If MAT has been retired for nearly a decade, then I assume most other projects have either migrated away or are not maintained themselves.
We need patches against all branches. We start with 3.0 (and forward merge, not back-port)
3.0, 3.11 , 4.0 will need to honour the two existing published POMs.
In 4.1 and trunk we can look at publishing only one unified POM.
Need to confirm install and publish are updated correctly
Please also test that the jar files placed into lib/, build/lib/jars/, and build/test/lib/jars/ remain the same.