bevy icon indicating copy to clipboard operation
bevy copied to clipboard

Use Leafwing-Studios/cargo-cache in CI

Open BD103 opened this issue 1 year ago • 14 comments

Objective

  • Our CI caches the ~/.cargo and target folders, to take advantage of faster, incremental, compilation.
  • The use of caches is quite inconsistent, with different formats for keys. It also does not take advantage of cache fallback (using an old cache if the current one does not exist).
  • Caches are being created in the merge queue, but can never be used.
    • To verify this, go to any validation job and check the actions/cache@v4 step. It will always say Cache not found for input keys: ....

Solution

  • Switch to using Leafwing-Studios/cargo-cache, which automatically handles cache keys, fallback, dependency updates, and toolchain updates.
  • Caches are no longer used in the merge queue.
    • Github Actions has something called cache isolation. Long story short, caches cannot be used if they were not created from the same branch or base branch.
    • For instance, a cache created on main can be used by any feature branch, but a cache created by feature-a branch cannot be used by the unrelated feature-b branch.

Changelog

BD103 avatar Apr 20 '24 02:04 BD103

You might be interested in this: https://github.com/Leafwing-Studios/cargo-cache

It uses a the hash of the updated lock file instead of the current date to calculate the hash key, also with several levels of fallbacks.

Honestly, debugging caching actions and figuring out why cargo keeps recompiling dependencies is super annoying, I hope that you'll get it to work!

TimJentzsch avatar Apr 20 '24 10:04 TimJentzsch

You might be interested in this: https://github.com/Leafwing-Studios/cargo-cache

Thank you! I'll definitely take inspiration from that. I chose not to hash the lockfile because I figured that the amount of dependency change in 1 day cannot be so great that it would hurt performance, but that assumption will definitely require testing. It's good to see an alternate approach, though!

BD103 avatar Apr 20 '24 15:04 BD103

So far things are working well. I'm having some issues with the cache not being found, so I may have to move my testing over to my fork, where I can actually delete caches and cancel runs.

Building on MacOS and docs are the two longest jobs. I'm going to see if I can shorten their duration, if possible.

BD103 avatar Apr 20 '24 16:04 BD103

Following @TimJentzsch's interesting link, I've decided to switch this PR to use Leafwing-Studios/cargo-cache for everything. I think it's a lot more maintainable than my custom solution (and it's easy to use, too!).

Performance seems roughly the same, except for run-examples-macos-metal which didn't use caching before. It's hard to tell, since Github has two different MacOS runners with 3 and 4 CPU cores.

BD103 avatar Apr 22 '24 02:04 BD103

Previously attempted in #7627, but the action has also improved since then

TimJentzsch avatar Apr 24 '24 14:04 TimJentzsch

Previously attempted in #7627, but the action has also improved since then

Is there a reason the original PR was closed?

BD103 avatar Apr 24 '24 17:04 BD103

Previously attempted in #7627, but the action has also improved since then

Is there a reason the original PR was closed?

Mainly because it didn't get any more reviews and I was closing out stale PRs that I had still open. I also found it hard to measure the impact of the change, it's hard to debug if a recompile is needed or unnecessary IMO

TimJentzsch avatar Apr 24 '24 17:04 TimJentzsch

But I'm quite confident that the current caching solution is flawed, causing the cache to be outdated quite quickly, and that this PR will be an improvement.

I'm not entirely sure how well the action will work with as many different workflows and branches as we have in Bevy, the maximum cache side could be an issue. But at least in smaller projects (e.g. Emergence) we have used this action successfully and reduced CI times significantly.

TimJentzsch avatar Apr 24 '24 18:04 TimJentzsch

I feel like the risk is acceptable. Leafwing Studios is a trusted third-party, and the argument against using it is the same as any other Github Action we depend on. What if dtolnay/rust-toolchain, Super Linter, Typos, cargo-release, or any other action we use started downloading all tokens and force-pushing to main?

While we cannot guarantee that our dependencies will never run untrusted code, we can ensure that that untrusted code will not be able to do anything harmful by hardening our security and restricting permissions.

This is not an argument to remove those dependencies. I think they provide too much value to our build system to remove them entirely. If security is a worry we can fork the actions, use them instead in CI, and occasionally merge upstream changes.

BD103 avatar Apr 25 '24 13:04 BD103

the maximum cache side could be an issue.

This maximum? Screenshot 2024-04-27 at 10 49 55

We almost never go below 50GB, sometimes over 100GB, GitHub seems... ok with that? That means we never have a cache that is more than one day old.

I think I would prefer an approach where we create cache only from the main branch, those should be available to every job

mockersf avatar Apr 27 '24 08:04 mockersf

I think I would prefer an approach where we create cache only from the main branch, those should be available to every job

What would you think about handling this in a separate issue/PR? I think it could be tackled independently of the way we generate the cache and has its own up- and downsides that we could discuss seperately

TimJentzsch avatar May 02 '24 06:05 TimJentzsch

What would you think about handling this in a separate issue/PR? I think it could be tackled independently of the way we generate the cache and has its own up- and downsides that we could discuss seperately

I agree. Let's get this merged, see the benefits / downsides, then work on an alternative if necessary. I think the cache should be purged directly after this is merged too, just to start fresh.

BD103 avatar May 02 '24 12:05 BD103

If / when this gets merged, it would be good to wipe all of the existing caches here. This will make the migration a little quicker, and hopefully show the performance improvements sooner.

BD103 avatar Jun 25 '24 22:06 BD103

Swapping to Controversial; I want to get Francois's sign-off here.

alice-i-cecile avatar Jul 08 '24 00:07 alice-i-cecile

Update to the newly released v2 before merging.

janhohenheim avatar Jul 17 '24 13:07 janhohenheim

I think I would prefer an approach where we create cache only from the main branch, those should be available to every job

We can now do this via the save-if option, like this:

- uses: Leafwing-Studios/cargo-cache@v2
  with:
    save-if: ${{ github.ref == format('refs/heads/{0}', github.event.repository.default_branch) }}

TimJentzsch avatar Jul 20 '24 21:07 TimJentzsch

Done!

@mockersf, could you please review this? You're the last step before this can be merged :)

BD103 avatar Jul 20 '24 21:07 BD103

@mockersf, could you please review this? You're the last step before this can be merged :)

I was waiting for my previous review to be addressed! Now unless there's an urgent need I'll do it after the Jam 🙂

mockersf avatar Jul 21 '24 10:07 mockersf