Use Mill's BOM support in Mill's own build

Status: Open · lihaoyi opened this issue 8 months ago · 18 comments

As part of https://github.com/com-lihaoyi/mill/issues/4872, I hit a ton of issues due to duplicate jars on the classpath, where incidental changes in classpath ordering resulted in obscure, hard-to-debug issues whenever a different version of a third-party dependency was picked. We should make use of Mill's new BOM support to enforce the same dependency versions across the Mill project where possible.

Separately, we should also try to fix the issues where we had multiple incompatible jars on the same classpath. This seems to happen most in .local mode integration tests, where the mill/local-test-overrides files list each module's transitive classpath, resolved separately, resulting in conflicting jars when multiple local-test-overrides are included in the same classpath. I'm not sure what can be done here, but maybe Coursier could be made to treat the local-test-overrides as proper Maven modules, rather than as pre-resolved lists of jar files on disk.
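
A minimal Scala sketch of the check this paragraph implies: merge several pre-resolved classpath lists and report every artifact that appears with more than one version. It assumes, hypothetically, that each list has already been turned into group:artifact:version coordinates (the real local-test-overrides files list jar files on disk); Coord, parseCoord and conflicts are illustrative names, not Mill APIs.

```scala
// Hypothetical input format: one "group:artifact:version" string per classpath entry.
final case class Coord(group: String, artifact: String, version: String)

def parseCoord(s: String): Coord = s.split(':') match {
  case Array(g, a, v) => Coord(g, a, v)
  case _ => throw new IllegalArgumentException(s"expected group:artifact:version, got $s")
}

// Merge several pre-resolved classpaths (one per module) and keep only the
// (group, artifact) pairs that show up with more than one distinct version.
def conflicts(classpaths: Seq[Seq[String]]): Map[(String, String), Set[String]] =
  classpaths.flatten
    .map(parseCoord)
    .groupBy(c => (c.group, c.artifact))
    .map { case (key, coords) => key -> coords.map(_.version).toSet }
    .filter { case (_, versions) => versions.size > 1 }

// Two hypothetical per-module lists that disagree on sourcecode.
val moduleA = Seq("com.lihaoyi:sourcecode_3:0.4.2", "com.lihaoyi:fansi_3:0.5.0")
val moduleB = Seq("com.lihaoyi:sourcecode_3:0.3.1", "com.lihaoyi:fansi_3:0.5.0")
val found = conflicts(Seq(moduleA, moduleB))
// found == Map(("com.lihaoyi", "sourcecode_3") -> Set("0.4.2", "0.3.1"))
```

Detecting the clash only makes it visible; deciding which version should win is the separate question the rest of the thread discusses.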

lihaoyi, Apr 07 '25 15:04

CC @alexarchambault, do you think you could pick this up? You seem like the person with the most expertise here.

lihaoyi, Apr 07 '25 15:04

Sure, I can look into that. I might try something different from using a BOM for that, like preventing Mill's direct dependencies from being bumped.

alexarchambault, Apr 07 '25 16:04

Why wouldn't we use a BOM, though? It seems like this kind of project-wide dependency version consistency is exactly what BOMs are meant to provide.

lihaoyi, Apr 07 '25 16:04

Just using a BOM might solve the issues you ran into, but it might cause other issues. A BOM blindly forces versions, so it might downgrade some dependencies, which can create runtime issues too (missing new classes or methods).

A BOM is, in a way, a substitute for an object Deps (it centralizes versions). But it also forces versions, which can cause runtime issues in its own way.

To really solve the problem you're seeing, whether or not we use a BOM, I think we should detect when several versions of a dependency are depended on (for example, Mill depends on one version, and one of its dependencies depends on another). sbt has done some work on this problem with its evicted task.
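
The kind of detection described here can be sketched as a small eviction report in Scala, loosely in the spirit of sbt's evicted task rather than a copy of it. It assumes purely numeric, dot-separated versions and a flat list of hypothetical Request edges; the highest requested version is treated as the winner, as coursier does for plain versions.

```scala
// Assumption: versions are dot-separated integers (no qualifiers such as -RC1).
def compareVersions(a: String, b: String): Int = {
  val xs = a.split('.').map(_.toInt)
  val ys = b.split('.').map(_.toInt)
  val n = xs.length max ys.length
  xs.padTo(n, 0).zip(ys.padTo(n, 0))
    .collectFirst { case (x, y) if x != y => x compare y }
    .getOrElse(0)
}
val versionOrdering: Ordering[String] =
  Ordering.fromLessThan[String]((a, b) => compareVersions(a, b) < 0)

// One edge of the dependency graph: `requester` asks for `module` at `version`.
final case class Request(requester: String, module: String, version: String)
// For one module: the winning version and, per losing version, who asked for it.
final case class Eviction(module: String, winner: String, evicted: Map[String, Seq[String]])

// Highest requested version wins; every other requested version is reported,
// so that conflicts become visible instead of being silently reconciled.
def evictions(requests: Seq[Request]): Seq[Eviction] =
  requests.groupBy(_.module).toSeq.sortBy(_._1).flatMap { case (module, rs) =>
    val winner = rs.map(_.version).max(versionOrdering)
    val losers = rs
      .filter(r => compareVersions(r.version, winner) != 0)
      .groupBy(_.version)
      .map { case (v, losing) => v -> losing.map(_.requester).distinct }
    if (losers.isEmpty) None else Some(Eviction(module, winner, losers))
  }

val report = evictions(Seq(
  Request("mill-main", "sourcecode", "0.4.2"),
  Request("some-lib", "sourcecode", "0.3.1"),
  Request("mill-main", "fansi", "0.5.0")
))
// report == Seq(Eviction("sourcecode", "0.4.2", Map("0.3.1" -> Seq("some-lib"))))
```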

We can use a BOM (to eat our own dog food, say), but that alone wouldn't be enough.

alexarchambault, Apr 07 '25 17:04

I think the first step should be detecting conflicting dependencies. Instead of a BOM, we should start adding version constraints (e.g. all versions compatible with our own API) in addition to concrete versions. I don't know if this is possible, or if a dependency can only have either a version or a constraint. But as soon as I add the constraint at a different level (e.g. a (BOM?) dependency), it should be possible to get both the preferred version and a constraint covering all acceptable versions in the same tree.
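
One way to picture "a preferred version plus a constraint" is the Scala sketch below. It is a hypothetical model, not an existing Mill or coursier feature: preferred versions are reconciled as usual (highest wins), and the winner is then checked against every constraint in the tree. Versions are assumed to be numeric and dot-separated; Constraint, Wanted and resolve are illustrative names.

```scala
// Assumption: versions are dot-separated integers.
def compareVersions(a: String, b: String): Int = {
  val xs = a.split('.').map(_.toInt)
  val ys = b.split('.').map(_.toInt)
  val n = xs.length max ys.length
  xs.padTo(n, 0).zip(ys.padTo(n, 0))
    .collectFirst { case (x, y) if x != y => x compare y }
    .getOrElse(0)
}
val versionOrdering: Ordering[String] =
  Ordering.fromLessThan[String]((a, b) => compareVersions(a, b) < 0)

// Hypothetical: a constraint [min, maxExclusive), open-ended when maxExclusive is None.
final case class Constraint(min: String, maxExclusive: Option[String]) {
  def contains(v: String): Boolean =
    compareVersions(v, min) >= 0 && maxExclusive.forall(m => compareVersions(v, m) < 0)
}
final case class Wanted(requester: String, preferred: String, constraint: Constraint)

// Reconcile preferred versions (highest wins), then report every requester
// whose constraint the winning version falls outside of.
def resolve(wanted: Seq[Wanted]): Either[String, String] = {
  val winner = wanted.map(_.preferred).max(versionOrdering)
  val violated = wanted.filterNot(_.constraint.contains(winner)).map(_.requester)
  if (violated.isEmpty) Right(winner)
  else Left(s"$winner violates constraints of: ${violated.mkString(", ")}")
}

val ok = resolve(Seq(
  Wanted("mill", "2.3.0", Constraint("2.0.0", Some("3.0.0"))),
  Wanted("plugin", "2.1.0", Constraint("2.1.0", None))
))
// ok == Right("2.3.0")
val clash = resolve(Seq(
  Wanted("mill", "2.3.0", Constraint("2.0.0", Some("3.0.0"))),
  Wanted("plugin", "3.1.0", Constraint("3.0.0", None))
))
// clash == Left("3.1.0 violates constraints of: mill")
```

The design choice here is to keep the preferred version deterministic and use the constraint only as a check, which turns a silent incompatible pick into a reported error.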

lefou, Apr 07 '25 19:04

If we use a BOM containing all of Mill's direct dependencies, will it resolve the transitive dependencies and then come up with a single consistent version for each artifact? Or do we need to list out transitive dependencies as well?

I think the problem I'm seeing can be solved as long as each dependency has a single version, presumably the highest transitively-depended-on version, since that's what Mill normally does. E.g. I don't want some module to pull in an old version of com.lihaoyi::sourcecode transitively through some library, while another module depends on a newer version of com.lihaoyi::sourcecode directly through ivyDeps. I'd like all the dependencies used in Mill to be resolved to a single version, and that single version to be used throughout (excluding things like workers, which are by definition on separate classpaths).
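
The single-version policy described here can be sketched in Scala under stated assumptions: each module's own resolution result is given as a plain artifact-to-version map (workers are simply left out of the input), versions are numeric, and alignAcrossModules and Bump are hypothetical names. The sketch picks the highest version of each artifact across all modules and lists the modules that would need to move.

```scala
// Assumption: versions are dot-separated integers.
def compareVersions(a: String, b: String): Int = {
  val xs = a.split('.').map(_.toInt)
  val ys = b.split('.').map(_.toInt)
  val n = xs.length max ys.length
  xs.padTo(n, 0).zip(ys.padTo(n, 0))
    .collectFirst { case (x, y) if x != y => x compare y }
    .getOrElse(0)
}
val versionOrdering: Ordering[String] =
  Ordering.fromLessThan[String]((a, b) => compareVersions(a, b) < 0)

// A module whose own resolution disagrees with the project-wide pick.
final case class Bump(module: String, artifact: String, from: String, to: String)

// Input: module -> (artifact -> version as that module resolves it on its own).
def alignAcrossModules(
    resolved: Map[String, Map[String, String]]
): (Map[String, String], Seq[Bump]) = {
  // Project-wide pick: the highest version of each artifact seen in any module.
  val unified: Map[String, String] =
    resolved.values.toSeq
      .flatMap(_.toSeq)
      .groupBy(_._1)
      .map { case (artifact, pairs) => artifact -> pairs.map(_._2).max(versionOrdering) }
  // Every module whose own version differs from the project-wide pick.
  val bumps = for {
    (module, deps) <- resolved.toSeq.sortBy(_._1)
    (artifact, version) <- deps.toSeq.sortBy(_._1)
    target = unified(artifact)
    if compareVersions(version, target) != 0
  } yield Bump(module, artifact, version, target)
  (unified, bumps)
}

val (projectVersions, neededBumps) = alignAcrossModules(Map(
  "main" -> Map("sourcecode" -> "0.4.2", "fansi" -> "0.5.0"),
  "scalalib" -> Map("sourcecode" -> "0.3.1")
))
// neededBumps == Seq(Bump("scalalib", "sourcecode", "0.3.1", "0.4.2"))
```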

lihaoyi, Apr 09 '25 12:04

As far as I understand it, there are roughly three ways to specify a version:

A dependency with

  1. a simple version
  2. no version
  3. a version range

Each has its own meaning and downsides.

  1. A version is what we're all used to. If coursier finds multiple versions, it needs to resolve the conflict, a process called reconciliation. Effectively, this means you might end up with a different version in the tree. In that sense, this is an "unforced" version. In contrast to Maven, coursier will pick the newest of the requested versions.

  2. If a dependency has no version, a version needs to be given in a dependencyManagement section, which can also be distributed via BOMs. If I understood @alexarchambault correctly, these versions are then also enforced, but I don't know exactly what that means for all possible combinations. Maybe it means this version is picked even when a version conflict contains higher versions?

  3. A version range or version constraint is technically the best way to specify a dependency, since you can have open and closed intervals. Also, coursier is able to properly error out if no version matching all present version constraints can be found. The downside of a version range, AFAIU, is that you cannot pick your preferred version as part of the constraint, and there is no specified lockfile mechanism, so it is nondeterministic. Old Mavens (2.x) had great trouble handling projects with version ranges, and that's probably the reason why nobody used ranges (probably because there was no reliable way to query a repository for which versions exist; I think coursier has found some heuristics to infer that info). But if you know some dependency is already in the tree, it should be possible to add a version constraint, and you won't end up with a version outside that constraint.
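
The three cases can be contrasted in a simplified Scala model. It is an illustration under assumptions, not coursier's algorithm: managed versions only fill empty versions here (their effect on transitive versions is not modelled), a mix of plain versions and ranges is handled by checking the reconciled version against the ranges, versions are numeric, and all names are hypothetical.

```scala
// Assumption: versions are dot-separated integers.
def compareVersions(a: String, b: String): Int = {
  val xs = a.split('.').map(_.toInt)
  val ys = b.split('.').map(_.toInt)
  val n = xs.length max ys.length
  xs.padTo(n, 0).zip(ys.padTo(n, 0))
    .collectFirst { case (x, y) if x != y => x compare y }
    .getOrElse(0)
}
val versionOrdering: Ordering[String] =
  Ordering.fromLessThan[String]((a, b) => compareVersions(a, b) < 0)

sealed trait VersionSpec
final case class Plain(version: String) extends VersionSpec        // case 1
case object Unversioned extends VersionSpec                        // case 2
final case class VersionRange(min: String, maxExclusive: String) extends VersionSpec { // case 3
  def contains(v: String): Boolean =
    compareVersions(v, min) >= 0 && compareVersions(v, maxExclusive) < 0
}

// managed: version from dependencyManagement / a BOM, if any.
// available: versions the repository lists (only needed when only ranges are given).
def resolveModule(specs: Seq[VersionSpec], managed: Option[String], available: Seq[String]): Either[String, String] = {
  // Case 2: an empty version is filled from dependency management.
  val filled: Seq[VersionSpec] = specs.map {
    case Unversioned => managed match {
      case Some(v) => Plain(v)
      case None => Unversioned
    }
    case other => other
  }
  if (filled.contains(Unversioned)) Left("no version given and none managed")
  else {
    val plain = filled.collect { case Plain(v) => v }
    val ranges = filled.collect { case r: VersionRange => r }
    def inRanges(v: String): Boolean = ranges.forall(_.contains(v))
    if (plain.nonEmpty) {
      // Case 1: reconciliation picks the highest plain version;
      // case 3: ranges can turn an unacceptable pick into an error.
      val winner = plain.max(versionOrdering)
      if (inRanges(winner)) Right(winner) else Left(s"$winner is outside the requested ranges")
    } else {
      // Only ranges: take the highest listed version they all accept.
      val candidates = available.filter(inRanges)
      if (candidates.isEmpty) Left("no listed version satisfies the ranges")
      else Right(candidates.max(versionOrdering))
    }
  }
}

val repo = Seq("1.0.0", "1.1.0", "1.2.0", "2.0.0")
val r1 = resolveModule(Seq(Plain("1.1.0"), Plain("1.2.0")), None, repo)                  // Right("1.2.0")
val r2 = resolveModule(Seq(Unversioned), Some("1.1.0"), repo)                             // Right("1.1.0")
val r3 = resolveModule(Seq(VersionRange("1.0.0", "2.0.0")), None, repo)                   // Right("1.2.0")
val r4 = resolveModule(Seq(Plain("2.0.0"), VersionRange("1.0.0", "2.0.0")), None, repo)   // Left(...)
```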

What is unclear to me is what @alexarchambault means by "And it also forces versions, which can cause runtime issues."

lefou, Apr 09 '25 15:04

So the main question is: Does a BOM version always win, or is it only used to fill dependencies with an empty version? Will coursier still pick the newest version in a conflict if a lower version was provided via a (BOM) managed dependency?

lefou, Apr 09 '25 16:04

This is the only ticket directly addressing Mill's BOM support, so I hope my comment is not out of place. After reading the docs, it's unclear why one may need depManagement, since everything it does can apparently be done in ivyDeps.

asarkar, Apr 23 '25 09:04

> After reading the docs, it's unclear why one may need depManagement, since everything it does can apparently be done in ivyDeps.

You are right. But we also have to consider the larger ecosystem, where not every project is built with Mill and where large frameworks (like Spring or Vaadin) heavily use dependency management and BOMs to deal with the complexity of an XML-based Maven world. We mostly need dependency management to interact with that world, either when consuming such frameworks or when producing artifacts for users who may need to manage them that way.
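
To illustrate the "consuming such frameworks" case, here is a minimal build.mill sketch. It assumes the Mill 0.12-era task names bomIvyDeps and ivyDeps, and that the Spring Boot BOM (spring-boot-dependencies) manages the spring-core version; the module name and the BOM version are illustrative only.

```scala
// build.mill (sketch; assumes Mill 0.12-era bomIvyDeps / ivyDeps)
package build
import mill._, javalib._

object app extends JavaModule {
  // The framework's BOM: a Maven POM whose <dependencyManagement> section
  // pins the versions of the framework's artifacts.
  def bomIvyDeps = Agg(ivy"org.springframework.boot:spring-boot-dependencies:3.4.0")
  // No version here: it is expected to be filled in from the BOM above.
  def ivyDeps = Agg(ivy"org.springframework:spring-core")
}
```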

lefou, Apr 23 '25 11:04

@lefou It's still unclear what you mean. Could the docs be improved to demonstrate the usefulness (not the usage) of depManagement?

asarkar, Apr 23 '25 12:04

I'm pretty sure docs can always be improved. Should we motivate it? I'm not so sure. We now have it in Mill for those users who need it, but I'm not sure whether Mill itself benefits from using dependency management. We need to find out, e.g. whether it can ease writing plugins. It all depends on a good understanding of how BOM-managed dependencies are handled downstream by coursier. That's my open question from comment https://github.com/com-lihaoyi/mill/issues/4883#issuecomment-2790183101, which hopefully @alexarchambault can answer.

lefou, Apr 23 '25 20:04

> So the main question is: Does a BOM version always win, or is it only used to fill dependencies with an empty version?

The BOM version:

  • fills the empty versions in the module that adds the BOM (so if a direct dependency version isn't empty, it wins over the BOM one)
  • overrides versions of transitive dependencies - in that case, the BOM version always wins

Support for the second case is somewhat recent (roughly since https://github.com/coursier/coursier/pull/3097 / coursier 2.1.17).
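
The two rules above can be modelled in a few lines of Scala. This is a hedged simplification, not coursier's implementation: applyBom is a hypothetical name, versions are numeric, and the final step that reconciles an explicit direct version with BOM-overridden transitive ones (highest wins) is an assumption of this sketch.

```scala
// Assumption: versions are dot-separated integers.
def compareVersions(a: String, b: String): Int = {
  val xs = a.split('.').map(_.toInt)
  val ys = b.split('.').map(_.toInt)
  val n = xs.length max ys.length
  xs.padTo(n, 0).zip(ys.padTo(n, 0))
    .collectFirst { case (x, y) if x != y => x compare y }
    .getOrElse(0)
}
val versionOrdering: Ordering[String] =
  Ordering.fromLessThan[String]((a, b) => compareVersions(a, b) < 0)

def applyBom(
    direct: Map[String, Option[String]],   // module -> version as written (None = empty)
    transitive: Map[String, String],       // module -> version pulled in transitively
    bom: Map[String, String]               // module -> BOM version
): Either[String, Map[String, String]] = {
  val missing = direct.toSeq.collect { case (m, None) if !bom.contains(m) => m }
  if (missing.nonEmpty) Left(s"no version for: ${missing.sorted.mkString(", ")}")
  else {
    // Rule 1: fill empty direct versions from the BOM; explicit ones are kept.
    val directVs = direct.map { case (m, v) => m -> v.getOrElse(bom(m)) }
    // Rule 2: the BOM version replaces whatever a transitive dependency asked for.
    val transitiveVs = transitive.map { case (m, v) => m -> bom.getOrElse(m, v) }
    // Simplification: anything left in conflict goes to the highest version.
    val modules = directVs.keySet ++ transitiveVs.keySet
    Right(modules.map { m =>
      m -> (directVs.get(m).toSeq ++ transitiveVs.get(m).toSeq).max(versionOrdering)
    }.toMap)
  }
}

val result = applyBom(
  direct = Map("a" -> None, "b" -> Some("2.0.0")),
  transitive = Map("c" -> "3.1.0"),
  bom = Map("a" -> "1.0.0", "b" -> "1.9.0", "c" -> "3.0.0")
)
// result == Right(Map("a" -> "1.0.0", "b" -> "2.0.0", "c" -> "3.0.0"))
```

Note that "c" ends up on 3.0.0 even though 3.1.0 was requested transitively: that is the downgrade a BOM can silently introduce.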

> Will coursier still pick the newest version in a conflict if a lower version was provided via a (BOM) managed dependency?

IIUC, that corresponds to the second case, so the BOM version takes over.

alexarchambault, Apr 24 '25 16:04

> 2. If a dependency has no version, a version needs to be given in a dependencyManagement section, which can also be distributed via BOMs. If I understood @alexarchambault correctly, these versions are then also enforced, but I don't know exactly what that means for all possible combinations. Maybe it means this version is picked even when a version conflict contains higher versions?

That's right: whatever version a transitive dependency brings, the BOM version overrides it.

> 3. A version range or version constraint is technically the best way to specify a dependency (...) Old Mavens (2.x) had great trouble handling projects with version ranges, and that's probably the reason why nobody used ranges (probably because there was no reliable way to query a repository for which versions exist; I think coursier has found some heuristics to infer that info).

About version listings: there are maven-metadata.xml files for that. They were already around when I started working on coursier.
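
For reference, in a Maven repository the file lives at <group path>/<artifactId>/maven-metadata.xml and lists the published versions under versioning/versions/version. A minimal Scala sketch that extracts that list with the JDK's built-in DOM parser (no network access: the XML is passed in as a string, and listedVersions is an illustrative name):

```scala
import java.io.ByteArrayInputStream
import java.nio.charset.StandardCharsets
import javax.xml.parsers.DocumentBuilderFactory
import org.w3c.dom.Element

// Returns the <version> entries found under <versions> elements,
// i.e. the versions the repository lists for one artifact.
def listedVersions(metadataXml: String): Seq[String] = {
  val doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
    .parse(new ByteArrayInputStream(metadataXml.getBytes(StandardCharsets.UTF_8)))
  val versionsElems = doc.getElementsByTagName("versions")
  (0 until versionsElems.getLength).flatMap { i =>
    val vs = versionsElems.item(i).asInstanceOf[Element].getElementsByTagName("version")
    (0 until vs.getLength).map(j => vs.item(j).getTextContent.trim)
  }
}

// A hypothetical, trimmed-down metadata file.
val sample =
  """<metadata>
    |  <groupId>com.example</groupId>
    |  <artifactId>lib</artifactId>
    |  <versioning>
    |    <latest>1.2.5</latest>
    |    <release>1.2.5</release>
    |    <versions>
    |      <version>1.2.0</version>
    |      <version>1.2.5</version>
    |    </versions>
    |  </versioning>
    |</metadata>""".stripMargin
// listedVersions(sample) == Seq("1.2.0", "1.2.5")
```

Looking only inside versions elements avoids picking up a top-level version element, which some metadata files also contain.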

> What is unclear to me is what @alexarchambault means by "And it also forces versions, which can cause runtime issues."

I'm thinking of a case where we depend on version 1.2.0 of a dependency, and another dependency pulls in version 1.2.5 of it. If we use a BOM for 1.2.0, then it's not only going to fill our empty versions, but it will also override the 1.2.5. That can be a problem even for a backward-compatible library, given that 1.2.5 might have newer classes or methods that the dependency pulling in 1.2.5 relies on. In that case, if we force 1.2.0, we'll get class-not-found / no-such-method exceptions at runtime.

alexarchambault, Apr 24 '25 17:04

> If we use a BOM for 1.2.0, then it's not only going to fill empty versions of ours, but it will also override the 1.2.5. (...) In that case, if we force 1.2.0, we'll get class-not-found / no-such-method exceptions at runtime.

Oh, that is indeed bad. It is also not clear to me why we have to force the version over already-defined versions. Wouldn't it be better to treat every explicit version as a right-open interval (1.2.0 meaning "1.2.0 or newer")? I understand that resolving such scenarios automatically will sometimes fail to find a resolution that satisfies everyone, but I think detecting and reporting (or even failing on) such situations is a must.
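
Reading every explicit version v as the right-open interval [v, infinity) makes the downgrade in the 1.2.0 / 1.2.5 example detectable. A minimal Scala sketch under that assumption (numeric versions; Req, checkForced and lowestSatisfying are hypothetical names): a forced version is reported whenever it falls below some requester's lower bound.

```scala
// Assumption: versions are dot-separated integers.
def compareVersions(a: String, b: String): Int = {
  val xs = a.split('.').map(_.toInt)
  val ys = b.split('.').map(_.toInt)
  val n = xs.length max ys.length
  xs.padTo(n, 0).zip(ys.padTo(n, 0))
    .collectFirst { case (x, y) if x != y => x compare y }
    .getOrElse(0)
}
val versionOrdering: Ordering[String] =
  Ordering.fromLessThan[String]((a, b) => compareVersions(a, b) < 0)

// An explicit request, read as the interval [version, infinity).
final case class Req(requester: String, version: String)

// Check a forced version (e.g. from a BOM) against every interval and
// describe each requester that would be downgraded.
def checkForced(forced: String, reqs: Seq[Req]): Either[Seq[String], String] = {
  val violations = reqs
    .filter(r => compareVersions(forced, r.version) < 0)
    .map(r => s"${r.requester} needs >= ${r.version} but $forced is forced")
  if (violations.isEmpty) Right(forced) else Left(violations)
}

// Without forcing, the intersection of the [v, infinity) intervals starts at the
// highest v, which is the lowest version that satisfies every requester.
def lowestSatisfying(reqs: Seq[Req]): String = reqs.map(_.version).max(versionOrdering)

val reqs = Seq(Req("mill", "1.2.0"), Req("other-lib", "1.2.5"))
// checkForced("1.2.0", reqs) == Left(Seq("other-lib needs >= 1.2.5 but 1.2.0 is forced"))
// lowestSatisfying(reqs) == "1.2.5"
```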

There was a recent announcement in the Gradle channels that they have added a way to constrain transitive versions without adding the dependency directly (I'll search for a link later; edit: it doesn't seem that new, link: https://www.raphael.li/blog/2025/04/handling-vulnerable-transitive-dependencies-gradle/). That is something I really want for Mill too. I even tried to simulate something similar before: https://github.com/com-lihaoyi/mill/blob/f7c380265ad64915a4761e5dfa732b92c9c98ecb/build.mill#L462-L475 (what's missing in my attempt is the detection of a downgrade).
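
A rough Scala model of such a transitive-only constraint, as a sketch rather than Gradle's actual algorithm: a constraint only affects a module that is already in the resolved graph (it never adds one) and acts as a minimum version. applyConstraints and the lib-* module names are hypothetical, and versions are assumed numeric.

```scala
// Assumption: versions are dot-separated integers.
def compareVersions(a: String, b: String): Int = {
  val xs = a.split('.').map(_.toInt)
  val ys = b.split('.').map(_.toInt)
  val n = xs.length max ys.length
  xs.padTo(n, 0).zip(ys.padTo(n, 0))
    .collectFirst { case (x, y) if x != y => x compare y }
    .getOrElse(0)
}

// resolved: module -> version after normal resolution.
// constraints: module -> minimum version; ignored for modules not in the graph.
def applyConstraints(resolved: Map[String, String], constraints: Map[String, String]): Map[String, String] =
  resolved.map { case (m, v) =>
    constraints.get(m) match {
      case Some(min) if compareVersions(v, min) < 0 => m -> min // raise to the minimum
      case _ => m -> v                                          // already satisfies it, or unconstrained
    }
  }

val graph = Map("lib-a" -> "2.15.0", "lib-b" -> "33.0.0")
val out = applyConstraints(graph, Map("lib-a" -> "2.15.4", "lib-c" -> "2.17.1"))
// out == Map("lib-a" -> "2.15.4", "lib-b" -> "33.0.0"): lib-c is not added
```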

lefou, Apr 25 '25 07:04

The equivalent of BOMs for Gradle Modules, "platforms", might allow what you describe. The documentation is unclear, but Gradle Module platforms, such as this one, contain version constraints with "requires", which is supposed to be looser than "strictly". Outside of BOMs / platforms, "requires" is meant to be interpreted as a right-open interval.

alexarchambault, Apr 25 '25 13:04

I think coursier still interprets those versions in platforms as strict versions, but this could be relaxed for "requires" ones.
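
The difference between the current strict handling and a relaxed "requires" handling can be sketched in Scala. This is a hypothetical model of the idea, not coursier's code: Strictly always yields the platform version, while Requires behaves as a lower bound, so a higher transitive version survives. Versions are assumed numeric.

```scala
// Assumption: versions are dot-separated integers.
def compareVersions(a: String, b: String): Int = {
  val xs = a.split('.').map(_.toInt)
  val ys = b.split('.').map(_.toInt)
  val n = xs.length max ys.length
  xs.padTo(n, 0).zip(ys.padTo(n, 0))
    .collectFirst { case (x, y) if x != y => x compare y }
    .getOrElse(0)
}
val versionOrdering: Ordering[String] =
  Ordering.fromLessThan[String]((a, b) => compareVersions(a, b) < 0)

sealed trait PlatformVersion { def version: String }
final case class Strictly(version: String) extends PlatformVersion
final case class Requires(version: String) extends PlatformVersion

def platformResolve(platform: PlatformVersion, transitive: Seq[String]): String =
  platform match {
    // Strict handling: the platform version always wins, even if that is a downgrade.
    case Strictly(v) => v
    // Relaxed "requires": a lower bound, so the highest of all requests wins.
    case Requires(v) => (v +: transitive).max(versionOrdering)
  }

// platformResolve(Strictly("1.2.0"), Seq("1.2.5")) == "1.2.0"
// platformResolve(Requires("1.2.0"), Seq("1.2.5")) == "1.2.5"
```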

alexarchambault, Apr 25 '25 13:04

I'm experimenting with this in https://github.com/coursier/coursier/pull/3393. It seems that, in the coursier tests, platforms never cause a version downgrade, which leaves the door open to handling differently the cases where a downgrade would have happened.

alexarchambault, Apr 25 '25 15:04