setup-ocaml
setup-ocaml copied to clipboard
Remove non-essential parameters from cache keys
This partially solves #528. The non-essential parameters should not be part of cache keys. The problem of the current design is that it'd store many copies of same things but then never use the older copies unless you specifically rerun some workflow within 7 days. For example, if one is using dune-cache
, every (re)run will upload a new copy with a different key. I don't feel this is the desired behavior. We should just overwrite older copies with the latest one. When deciding what is essential and what is not, I believe repositories are uniquely determined by the URLs, and thus I removed the OCaml version from the download cache key as well.
One might argue that keeping older versions of the cache will facilitate recreating the same results when rerunning a workflow within 7 days. I am against this philosophy because the CI result should not depend on whether something is cached or not, and the scripts should always approximate "building from scratch, here and now" as much as possible. The purpose of cache should be only for efficiency. If the scripts would have succeeded now, then they should pass; otherwise, if the scripts would have failed, then they should fail when I rerun it.
By the way, this could probably also paves the way to cache the entire .opam
if people want to, because we will have much more space left. Building the package z3
, for example, usually takes forever on GitHub in my experience, and a full cache might be worth it.
~~(Let me make this a draft before fixing the yarn build issue.)~~ It's fixed now.
For xen-api I set up a daily opam build cache, which I warm up with a scheduled run in the morning.
Invalidating parts of a cache is a difficult problem, having an expiry dates limit the duration of the problems caused by it (for us they only last a day).
Could you try this solution instead before committing to the solution proposed by this PR?
Could you try this solution instead before committing to the solution proposed by this PR?
Thanks. I actually already "fixed" my immediate problem by using a different prefix to invalidate all cache. To clarify, this PR is not (directly) about invalidating cache, but about reducing duplicates of active copies. Having duplicates of stale cache would make invalidating all of them more difficult, but that is not the main point of this PR. Reducing duplicate copies in my opinion has its own merits even if we are not invalidating them. I wonder if you have comments on the principles or correctness of this PR?
I don't understand the current design choice to save several copies either. But I think removing the date from the cache keys is a very risky decision that should be decoupled from the removal of the duplication, for the reason stated above.
I don't understand the current design choice to save several copies either. But I think removing the date from the cache keys is a very risky decision that should be decoupled from the removal of the duplication, for the reason stated above.
I don't think having dates in the keys will help your situation if I read your CI script correctly, because you always try to access the most recent copy by calling /bin/date
. The current design of ocaml-setup will use an older copy if date fields do not match. In other words, having a cache labeled "yesterday" does not prevent it from being used "today", so there's actually no protection. You are probably advocating for skipping cache that does not have matching date keys, but that's not the current design. Moreover, there will be two cached entries after running the CI "today", one labeled "yesterday" (which is practically inaccessible) and one labeled "today". This PR is to overwrite the practically inaccessible copy by removing the date fields, so that the "today" entry will overwrite the "yesterday" entry directly. This should be safe unless you really want to use the "yesterday" entry.
Moreover, there will be two cached entries after running the CI "today", one labeled "yesterday" (which is practically inaccessible) and one labeled "today".
That's what xen-api uses and it's what I'm proposing.
This PR is to overwrite the practically inaccessible copy by removing the date fields, so that the "today" entry will overwrite the "yesterday" entry directly. This should be safe unless you really want to use the "yesterday" entry.
I understand but I disagree on the second part: I don't think it's safe, it causes problems when loading caches that are not in sync with what the newest opam metadata expects, at least this was the case in xen-api and why I opted to make "yesterday's" cache inacccessible.
@psafont I guess I am a bit confused:
- Before this PR: there will be multiple copies labeled with different dates, and you will load the latest copy if the "today" copy does not exist.
- After this PR: there will be only one copy. It is impossible to load any cache that is not the latest.
Observationally, there should be no difference, other than that the approach 1 is keeping more copies. If "it causes problems when loading caches that are not in sync with what the newest opam metadata expects", then it would already be causing problems now. You could load outdated cache now if and only if you could load outdated cache after the PR. Therefore, I believe your concerns are orthogonal to this PR---this PR is not helpful or hurtful in your case.
If "it causes problems when loading caches that are not in sync with what the newest opam metadata expects", then it would already be causing problems now.
It is... it is why I use dates in the cache keys in xen-api.
Therefore, I believe your concerns are orthogonal to this PR---this PR is not helpful or hurtful in your case.
No they are not, this is because Github doesn't provide an easy way to invalidate / clear caches. Removing the time limit on the cache, will mean a workflow will continue to use the broken cache until the compiler or the opam version, which are not usual events.
Removing the time limit on the cache, will mean a workflow will continue to use the broken cache until the compiler or the opam version, which are not usual events.
We might be talking about different things and/or have very different goals. If you meant that setup-ocaml
should implement how you protect yourself, I am not interested in changing the observational behavior of setup-ocaml
in this PR unless the maintainers say so. I thus consider this PR "safe" as long as it is not causing more problems. (The maintainers (e.g., @smorimoto) have not responded to this PR, and already marked another related issue of mine as "invalid", indicating that they think the current design is okay.)
Closing this because (1) I don't have time to keep maintaining this PR and (2) the people in charge don't seem to be interested in changing the current logic and (3) the current program is not incorrect, just inefficient, even if my assessment is accurate. Feel free to recycle it for a new PR.