maven icon indicating copy to clipboard operation
maven copied to clipboard

MNG-5899 Reactor should use reduced dependency pom

Open trask opened this issue 8 years ago • 15 comments

trask avatar Oct 03 '15 00:10 trask

-1

The old behaviour allowed inconsistency between dependencies used to calculate project build order and project dependencies used during the build. It also resulted in reparsing reactor project pom.xml files multiple times during the build, which affected build performance for larger projects.

ifedorenko avatar Oct 06 '15 02:10 ifedorenko

@ifedorenko, thanks for reviewing this. Any thoughts what we should do with the downstream implication of this change? https://issues.apache.org/jira/browse/MSHADE-206

trask avatar Oct 06 '15 02:10 trask

I am not sure I fully understand the problem, but maven generally expects project dependencies to stay the same during the build. If you need to suppress certain storm-core dependencies from "leaking" into uber-jar projects, I think you should be able to simple mark those dependencies as optional=true (which really means "non-transitive").

ifedorenko avatar Oct 06 '15 02:10 ifedorenko

The nice thing about how maven-shade-plugin's "dependency reduced pom" used to work, is that you could run mvn compile or mvn test from your parent module (and so maven shade plugin wouldn't kick in since it is bound to package phase), and transitive dependencies would work as normal. And you could run mvn package or mvn install from your parent module, and maven shade plugin would kick in and expose the shaded jar and dependency reduced pom to downstream modules.

trask avatar Oct 06 '15 03:10 trask

@ifedorenko Without this kind of functionality using the shade plugin with multi-module projects is a huge pain, and would require us to use an external build system to coordinate the maven builds. You have to fully build/install each shaded package independent of all other packages that depend on it.

I understand that it is ugly to have the dependencies change as the build progresses. I agree it really confuses things to run dependency:tree and see it be completely different from what is actually pulled in. But it is not as bad as getting the wrong uber-jar that is missing things it needs.

revans2 avatar Oct 06 '15 13:10 revans2

In addition to the problems mentioned above, that hack also breaks the transitive dependency collection functionality in general, because it employs a Reactor-wide cache of parsed POMs that fails to take into account the RepositorySystemSession contexts in which the cached POMs were parsed.

A cache with a broken key scheme is not a cache, it's a source of unexpected behavior: if any RepositorySystemSession is at play other than the one used by the Reactor, collecting transitive dependencies now results in a dependency tree that is practically random, as it depends on:

  • what POMs have already been parsed by the Reactor during the build (this is why splitting the build into multiple phases works around the problems caused by this bug),
  • what other RepositorySystemSession instances have been used to parse and cache POMs so far,
  • what command line arguments were used to invoke Maven.

I have a Maven plugin that computes transitive dependencies of the host project by cloning the RepositorySystemSession used by the reactor to set properties that activate build profiles, which in turn contain dependencies to be taken into account when computing a custom dependency tree.

The API to do so is public so one expects the functionality work, which it did until Maven 3.3 came along. Now because of that bug, the functionality stopped working reliably.

Speed is important, sure, but correct functionality is probably even more so. So please remove that hack, and if its function is indeed necessary, then it needs to be replaced with a proper cache, one that takes the active RepositorySystemSession into account.

aqueance avatar May 03 '16 16:05 aqueance

Why the hack removal has not been merged yet?

madisparn avatar Apr 26 '17 19:04 madisparn

I know @jvanzyl claimed he had an alternative solution that allowed the caching but I have not seen that in over a year.

I am inclined to agree with removing this hack given the side-effects on existing use-cases.

The goal of a read-only model built once is noble but I fear not something that this hack implements effectively... at least without seeing evidence showing the benefits of the "hack"

stephenc avatar Apr 26 '17 20:04 stephenc

I still vote -1 on this change.

While I appreciate "dependency reduced pom" usecase, I'd like the following two concerns addressed first:

  • Mutated pom.xml files must not invalidate original reactor ProjectDependencyGraph. More specifically, if the original graph allowed certain build order, the new graph must still allow the same order. In practice this means the new graph must not have any new dependencies, which is rather tricky to guarantee when we consider dependency <excludes> and dependency management.

  • Implementation must scale well to 5K+ modules and 5K managed external dependencies. In practice this requires some sort of caching to avoid repeated reparsing/reinterpollation of reactor pom.xml files.

Additionally, I'd like to understand expected behaviour when projects with dependency reduced pom files are excluded from the build using --projects command line argument or pom-reducing mojo is not part of selected build phase. I have not analyzed this in details, but I believe mutable pom.xml files can lead to odd/unexpected build results that will be difficult to explain and hence require extra support effort (I happen to help maintain builds for a reasonably large developer community, so supportability is an important concern for me).

ifedorenko avatar Apr 27 '17 15:04 ifedorenko

Implementation must scale well to 5K+ modules and 5K managed external dependencies.

Are you serious? Such a project will never run, because of transitive dependency version conflicts -- precisely that issue we're all trying to use the shade plugin to work around. It's admirable to want your project to "scale', but let's be honest, a single project with more than a fews 10's of modules is already a monolith that should be decomposed.

Can you elaborate on such a use-case? I'm really quite surprised to hear of this as an explicit objective.

ndimiduk avatar Apr 27 '17 16:04 ndimiduk

I am dead serious. I am not at liberty to disclose exact numbers, but lets say our main codebase is in the same ballpark if we allow some room for growth. So 5K modules is not "aspirational pipe dream" kind of goal but very much a real-life requirement we have. (if you are interested, I can provide some details about how we use Maven for a project of this size, but maven dev list is probably a better place to talk about it).

Even in opensource some projects apparently have ~700 modules.

ifedorenko avatar Apr 27 '17 17:04 ifedorenko

Mutated pom.xml files must not invalidate original reactor ProjectDependencyGraph. More specifically, if the original graph allowed certain build order, the new graph must still allow the same order.

That might be a practical limitation right now, but I wouldn't mind having a dynamic build order. The two things that should matter are that builds complete, and have the same final outcome. Dealing with the issues of being able "pull up" a dependency in the reactor, and knowing what can be built / is waiting on something else to be built might actually benefit scalability with parallel executions.

Seems like there is a more important design question here - should a project, when built and installed to a repository, then depended on by a completely separate build behave the same when it is included in a reactor?

If you can create an artifact, and a custom pom for install / deployment to a repository that differs from the project pom, then to my mind that should be what is seen by any module including it as a dependency, even in the same reactor.

The concern is about adding new dependencies, and whilst that is technically possible, I'm not sure that it needs to be supported - it could have just been made a dependency of the project anyway.

The real issue - particularly with the shade plugin - is that you want to embed things in an artifact, and not have other projects having to try and pull them in as dependencies. To be honest, it would actually be better if this was native to the pom, rather than part of the shade plugin, because then you could express what dependencies you want to embed, and this information would then be communicated to other projects depending on it. Then they would not only not pull in the transitive dependency for the embedded code, they would also be able to determine if the embedded version is compatible with their requirements.

grahamtriggs avatar Apr 27 '17 17:04 grahamtriggs

"Richard Sand" on [email protected] replies: If the shaded artifact is built as a submodule with a different artifact= =20 ID, then the shade plugin can specify its DRP as its main POM. It lets=20 other projects depend on the shaded artifact with only its remaining=20 (unshaded) dependencies.

I had tried to use the shade plugin along with the install & deploy=20 plugins so that I could build the main project, shaded artifact, and DRP= =20 within a single POM, but was told that such practice was contrary to the= =20 maven philosophy of all artifacts produced by a given POM have that POM=20 as its parent. So moving shade into a submodule with settings like this=20 was the only solution I could find that works:

<createDependencyReducedPom>true</createDependencyReducedPom> =20 <keepDependenciesWithProvidedScope>false</keepDependenciesWithProvidedScope= <shadedArtifactAttached>false</shadedArtifactAttached>

Not sure if this adds anything to the thread but it worked for me

-Richard

------ Original Message ------ From: "grahamtriggs" [email protected] To: [email protected] Sent: 4/27/2017 1:30:24 PM Subject: [GitHub] maven issue #69: MNG-5899 Reactor should use reduced=20 dependency pom =20 =20 =20 =20 =20 =20 =20 =20

asfbot avatar Apr 30 '17 15:04 asfbot

If the shaded artifact is built as a submodule with a different artifact ID, then the shade plugin can specify its DRP as its main POM. It lets other projects depend on the shaded artifact with only its remaining (unshaded) dependencies.

There are issues with that:

  1. There can be issues with a clean build with an empty repository for some reactors (because there is no information to let maven know the shadedArtifactId will be produced by artifactId so in some cases Maven will fail the build early because it "knows" there is no such artifact. Typically this happens in larger reactors where the build sequence can be more "unpredictable" and Maven decides that there is a "short-cut"

  2. There can be issues with a build from a non-empty repository, especially when using -rf, as you can end up using the -SNAPSHOT from a previous execution (which may retain bugs).

My original proposal for a solution was to allow a plugin to advertise dependency reduction of a pom (or additional GAV production).

For dependency reduction, Maven would enforce specific rules such as "no new dependencies" and the build plan would be unmodified.

For additional GAV production, there would have to be strict rules on the new GAV (such as being evaluated based entirely on properties available to general project GAV computation - i.e. nothing injected into the build by another plugin) and similar rules on the new GAV pom (i.e. only ever a subset of the GAV it is being generated from)

That would allow us to create the build plan reflecting the potential behaviours of the shade plugin (as well as the flatten plugin) before any plugin executions but allow plugins to do what is necessary... but my suggestion was shot down last time

stephenc avatar Apr 30 '17 19:04 stephenc

Is there any update on this topic?

repolevedavaj avatar Jul 02 '19 16:07 repolevedavaj