maven-resolver
[MRESOLVER-268] Allow for checksum validation upon artifact resolution.
Artifacts are currently only checksum validated via a ProvidedChecksumsSource if they are downloaded from a remote repository. This disables any checksum validation if another project already downloaded a corrupted artifact without validating a checksum.
This should not be default for two reasons:
- It will apply a significant computational overhead
- Artifacts which have been locally installed don't have checksums at all
ProvidedChecksumsSource was done to be aligned with other checksum stuff (external, inlined and provided) and the whole resolver was done in a way that only remote transport checks checksums. Before, it was envisioned to have the local repo "nuked" (hence, CI starts from a clean slate), and whatever your build ends up with in the local repo afterwards is either downloaded (will have its checksum checked) or was produced during the build. Naturally, this implies some sort of artifact cache (repo manager of any kind).
These days OTOH CIs like GH Actions can and do cache local repository, so this change would make sense IMHO.
Still, the whole resolver was originally implemented in this spirit (as @michael-o explained above), hence this may/will inevitably lead to breakage, especially if the local repo is shared across projects (so, where a "suddenly" locally built/installed artifact becomes a dependency).
For me, it "smells" like split repository could come into play, but this is still just a hunch/feeling/faint-idea but unsure how and where.
After reviewing the intent from https://issues.apache.org/jira/browse/MRESOLVER-268 ("to retain the integrity of a project also when sharing a local Maven repository with other, unsecured projects"), I'd call this a somewhat too "edgy" use case, partially due to the already mentioned fact that installed artifacts have no checksums installed, only downloaded ones have them. Hence, if two or more (unrelated) projects share the same local repository, the checksum-less artifact installed by one may become a "foreign" dependency in another. Also, as this PR checksums ALL resolved files (over and over again), it is too much overhead IMHO.
Am putting this PR "on hold", to have a discussion about this. Personally, am on side to keep "provided checksums" what they are meant to be: third type of transport checksums. This seems to me like some sort of misuse of them.
As a user, you do not normally have control over the state of your/the build server's local repository. A build server might share some mounted folder with many nodes to reduce traffic; especially in cloud environments this might be implemented to avoid additional traffic. If other builds do not validate the checksums and, as a consequence, my build does not either, then the principle that any used artifact can be validated based on locally available information is broken. I could still emulate this by using a setting with a fresh repository location, but ideally I would want to treat the repository as an untrusted, remote server, just like a remote repository, only with the convenience that the artifacts are already downloaded.
For this reason I was hoping for an option to evaluate this. If not by default, maybe by setting a system property, as allowing for a zero trust approach is quite handy. Maybe this system property could also allow for generating these checksum files.
My stance with this feature is somewhat aligned with this new feature https://issues.apache.org/jira/browse/MRESOLVER-274 when it comes to local repository. Simply put, if you share your local repository across many (unrelated) builds, you cannot be sure about the state of it (not to mention possible information leakage as well but let's not mix that in). But the "quality" of it may become questionable as well.
For me, an approach like GitHub Actions is the correct one: you CAN cache the local repo, but that cache is reused only for that very same project and nothing else; it is not shared (for obvious reasons as well).
On CI side, reuse of local repository should really be handled per job or job group, as ultimately you have the "nuke it and let MRM serve it up", but yes, it may create nice (hopefully internal, as MRM should be internal) traffic.
Am on edge on this, but this PR is "too much". Maybe add some post-resolve hook (component) instead, so that whoever needs it may implement a component (and use it as a build extension) that performs the task you need at the cost of overhead (checksumming all resolved artifacts)?
Just an example of "post hook" that would allow you to do this in extension. But that very same extension may "wrap" DefaultArtifactResolver and just do whatever you want (checksum check). Still, implementation should be aware of container (sisu, vanilla Guice or SL)...
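To make the "post hook" idea concrete, here is a minimal sketch in plain Java. Everything in it is hypothetical: `PostResolveHook`, `ChecksumHook` and the file-name-keyed map of expected checksums are illustration-only names, not the resolver's API and not what the linked PR implements. It only shows the shape of a step that runs after resolution and fails the build on a mismatch, while skipping artifacts for which no checksum was provided (for example locally installed ones).

```java
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public final class PostResolveSketch {

    /** Hypothetical hook invoked once resolution has finished (not a resolver interface). */
    interface PostResolveHook {
        void afterResolve(List<Path> resolvedFiles);
    }

    /** Hypothetical hook implementation that compares files against provided checksums. */
    static final class ChecksumHook implements PostResolveHook {
        // Assumption: expected checksums are keyed by file name, e.g. "foo-1.0.jar".
        private final Map<String, String> expected;
        // The checksum function is injected so the hook stays algorithm-agnostic.
        private final Function<Path, String> checksum;

        ChecksumHook(Map<String, String> expected, Function<Path, String> checksum) {
            this.expected = expected;
            this.checksum = checksum;
        }

        @Override
        public void afterResolve(List<Path> resolvedFiles) {
            List<String> failures = new ArrayList<>();
            for (Path file : resolvedFiles) {
                String want = expected.get(file.getFileName().toString());
                if (want == null) {
                    continue; // no provided checksum (e.g. locally installed artifact): skip
                }
                String got = checksum.apply(file);
                if (!want.equalsIgnoreCase(got)) {
                    failures.add(file + " (expected " + want + ", got " + got + ")");
                }
            }
            if (!failures.isEmpty()) {
                throw new IllegalStateException("Checksum mismatch: " + failures);
            }
        }
    }
}
```

Injecting the checksum function keeps the overhead question separate: the hook only pays for files that actually have a provided checksum.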
https://github.com/apache/maven-resolver/pull/200
Note: I'd always go with sisu, as it is used in Maven as well (SL is being dropped, and am also on edge with vanilla guice; I mean, if someone HAS guice, then all they need to do is install the sisu module and done).
As for the overhead: I use a Maven extension today that does the same thing - that is evaluating the sha256 of each file - and it causes an overhead of about 300 milliseconds on a build time of about 1 minute and 10 seconds. I think this is defensible; assuming that people can choose to not provide checksums.
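For reference, the per-file work being discussed amounts to roughly the following. This is a minimal sketch using only the JDK, not the code of the extension mentioned above; the class and method names (`Sha256Check`, `sha256Hex`, `matches`) are made up for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public final class Sha256Check {

    /** Streams the file through SHA-256 and returns the lowercase hex digest. */
    static String sha256Hex(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("SHA-256");
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buffer = new byte[8192];
            int read;
            // Read in chunks so large artifacts are never held in memory at once.
            while ((read = in.read(buffer)) != -1) {
                digest.update(buffer, 0, read);
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    /** Returns true if the file's SHA-256 equals the expected hex string (case-insensitive). */
    static boolean matches(Path file, String expectedHex)
            throws IOException, NoSuchAlgorithmException {
        return sha256Hex(file).equalsIgnoreCase(expectedHex.trim());
    }
}
```

The cost is one sequential read of each resolved file plus the hash computation, so it scales with the total size of the resolved artifacts rather than with their number.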
As for the local repository: the easiest "hack" is to define a custom repository in a settings.xml. But this will trigger a new download on each build, and not everybody can rely on GitHub Actions, even though they get it right.
From a security perspective, the best model is one of zero trust. And the beauty of being able to evaluate checksums upon resolution is that you do not need to trust the build server to be configured correctly. All you need to do is to create a Maven project, and all code that is loaded from outside the project will be evaluated to be legitimate, independent of the build server's setup.
As for making this an extension: this is a bit of a chicken and egg problem. The extension needs to be downloaded, and normally is via Maven Central. If this extension is invalid, the security model is broken. This is why I would want it to be a part of Maven Resolver. If Maven Wrapper is validating the checksums of its downloaded artifacts, the validation chain would be complete and a zero trust model is established for any Maven build. (Gradle offers the same feature.)
Superseded by https://github.com/apache/maven-resolver/pull/200
Superseding PR https://github.com/apache/maven-resolver/pull/200 merged, closing this one out.