remove unsafe use of maven dependency caching, and cache deps explicitly
Fix unsafe usage of maven dependency caching in actions/setup-java,
and replace with explicit save/restore of maven dependencies only without relying on mvn install.
Using the the setup-java maven cache is not safe for steps calling mvn install,
since setup-java cache keys are only based on the pom hashfiles whereas
mvn install causes artifacts specific to that commit to end up into the local maven repository.
This disables the use of the setup-java maven cache for all steps.
This also removes references to polluted setup-java cache keys, which we were referencing as fall-back
cache using restore-keys: in some integration and unit tests.
Instead, to avoid pulling dependencies all the time, maven dependencies are first resolved explicitly
using maven compile / test-compile, and then cached/restored explicitly or used as fallback for
cache misses with explicit commit tags.
@xvrl - The static checks are failing.
could you please reference a failed build which have failed because of that caching?
I have experienced this locally where I had artifacts from a different build causing integration tests to fail. This change removes that possibility. See also further below in my comment for an example that happened as part of this change.
since the cache is labeled by the pom hashes I don't think its showstopper to have installed artifacts from the project being built
The pom hash is not sufficient, different code could have the same pom hash, causing incorrect artifacts in the maven repository from being used at runtime. Some tests may also accidentally pass due to artifacts being present instead of failing if they are not rebuilt.
Only those cases where we explicitly want to reuse artifacts from a previous build step for same commit should be restoring maven install artifacts. Those are already marked as such https://github.com/apache/druid/blob/master/.github/workflows/unit-and-integration-tests-unified.yml#L72-L80
my problem with using random maven commands to download artifacts is that they don't always do everything - for example: druid has some extra maven plugin stuff which downloads a nodejs.tgz into the local m2 repo
fair, although the goal of caching is to download as many dependencies as possible without affecting the integrity of the build. If we miss a few it should not be a dealbreaker. I'm trying to achieve this with as few commands a possible. We may need to resolve additional dependencies explicitly if they are only called during some specific phases of the build.
It looks like mvn dependency:resolve trips up trying to download druid artifacts in submodules so we may need a different approach than what I tried regardless. This is actually a good example of where it worked locally because my maven repo had druid artifacts from a previous mvn install, but failed in CI without those artifacts. Our default maven cache should only ever contain artifacts already in maven central.
@xvrl - The static checks are failing.
@abhishekagarwal87 yes, see my comment about bout dependency:resolve failing on druid artifacts, I'll have to look for an alternative way to pull maven dependencies.
could you please reference a failed build which have failed because of that caching?
I have experienced this locally where I had artifacts from a different build causing integration tests to fail. This change removes that possibility. See also further below in my comment for an example that happened as part of this change.
I don't understand how these caches affect your local builds - if you can't point to a build which have failed/misbehaved then I don't really understand why do we need to change this.
those artifacts could be there and cached and reloaded - but they will not affect things; as they will get overwritten by the local build because incremental compile could not kick in for a fresh checkout - so for the install target it will need to compile and create the artifacats from 0.
[...] the goal of caching is to download as many dependencies as possible without affecting the integrity [...]
I don't think it should be a best-effort thing...we should try to avoid as external (re)downloads as possible... let's say you miss 1 artifact; and you cache stuff and fan out to a 100 jobs; that will be cause 100 hits on that artifact
I think as an alternative we could probably purge the local druid artifacts (remove ~/.m2/repository/org/apache/druid/ ?) at the end of the job so that they are not placed in the caches
The pom hash is not sufficient, different code could have the same pom hash, causing incorrect artifacts in the maven repository from being used at runtime. Some tests may also accidentally pass due to artifacts being present instead of failing if they are not rebuilt.
That depends on the usecase:
- If it just builds the project from zero it should work - since the actual code should not affect the referenced dependencies.
- if its used during testing to avoid the rebuild of the project; like here then yes - all files of the project should be accounted for - as it will try to evade the build phase
Could you describe the scenario you have in mind?
Could you describe the scenario you have in mind?
Any phase where we restore the maven repository without explicitly restoring from a specific git commit sha should be safe to assume that the maven repository will only contain publicly available dependencies.
We cannot assume that every phase will always call install first, since that is already not the case today, e.g. our web console step does not https://github.com/apache/druid/blob/master/.github/workflows/static-checks.yml#L173-L177
The cache might be deleted for any reason, and in cases where we fall back to using the setup-java maven cache such as here https://github.com/apache/druid/blob/master/.github/workflows/unit-and-integration-tests-unified.yml#L80 it's possible the maven repo would contain artifacts that do not get built again.
For example, if a PR removes a submodule, but some code depending on that submodule still exists, it could pass if the cache contains that artifact, but fail if the cache does not.
@kgyrtkirk to answer your question in the other thread
I think instead we should change install to package so that mvn doesn't pollute the local .m2 caches - or that doesn't work?
package works in some cases, e.g. unit tests, but for integration tests the artifacts need to be in the local maven repo.
We cannot assume that every phase will always call install first, since that is already not the case today, e.g. our web console step does not https://github.com/apache/druid/blob/master/.github/workflows/static-checks.yml#L173-L177
that's pretty interesting :D a cache should never be taken for granted... for the record: it does run the tests with 0 downloaded artifacts...so:
- its either working incorrectly even in that case
- or doesn't care at all for what's in the cache
The cache might be deleted for any reason, and in cases where we fall back to using the setup-java maven cache such as here https://github.com/apache/druid/blob/master/.github/workflows/unit-and-integration-tests-unified.yml#L80 it's possible the maven repo would contain artifacts that do not get built again.
For example, if a PR removes a submodule, but some code depending on that submodule still exists, it could pass if the cache contains that artifact, but fail if the cache does not.
I agree that we should prefer to not keep artifacts produced from the current sources in the cache
I think that we would need to cache more than what the proposed PR tries (include the node stuff/etc) and possibly remove the attempts to avoid compilation in some jobs - because they use up too much cache space and cause churn. So instead of some random maven commands which supposed to work; consider:
- somehow configure maven to install artifacts into a secondary location (and also look them up there)
- last time I checked this was impossible ; didn't see it possible right now
- configure the cache to exclude a set of directories during save
- seems like
pathis glob; so it could possibly configured to include+exclude (didn't tried it yet) - this will need an extra action in every job to maintain the cache
- seems like
- purge the druid artifacts from the
.m2at the end (before save)- this could be compatible with the
setup-javawe are using; so it could be less intrusive - failing to add this step will pollute the cache with artifacts from the project...
- this could be compatible with the
- change the project to not use maven local
- not sure if this is possible...
I'm right now biased toward the configure the cache to not include stuff direction; for a couple reasons:
- it may not pick up garbage
- less likely to make it suck
some notes on current state:
- a standard build produces
1.2G+2G+1.5G = 4.7Gof "cache stuff" - it takes 2:30 to build the docker image ; and 30sec to restore it
- with some effort we could probably also cache the baseimage (possibly put it into the main cached image)
- I think this should be done on-demand; instead of utilizing a cache https://github.com/apache/druid/actions/runs/6568490107/job/17842978452?pr=15106 https://github.com/apache/druid/actions/runs/6568490107/job/17844540576?pr=15106
Any idea if this patch would help with https://github.com/apache/druid/pull/15276#issuecomment-1786003447? I just noticed some weird behavior where an IT repeatedly fails on an npm step even after multiple attempts to re-run the failed step. It might be some caching issue?
@gianm I don't think so, I would imagine the npm caching is separate, and we don't restore any of that today. This might be a different issue.
This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 4 weeks if no further activity occurs. If you think that's incorrect or this pull request should instead be reviewed, please simply write any comment. Even if closed, you can still revive the PR at any time or discuss it on the [email protected] list. Thank you for your contributions.
@kgyrtkirk I no longer have time to work on this, so if you or someone else would like to push it over the hill feel free. In the meantime I will close it.