Provide ability to restore from cache but NOT save cache
Hello 👋
I am hoping to discuss whether you'd consider adding inputs that would allow a caller to indicate that they'd like to restore bundler cache but NOT save it at the end of the workflow.
Rationale: In some scenarios, it does not make sense to save the cache. One such scenario is when PRs are running in the merge queue. The branch associated with the PRs being checked looks like gh-readonly-queue/main/pr-15190-f5aefc512c3719594eb22d601d2824ad519d07a0. This is scoped to a PR and a commit SHA. Due to the way cache restoration via actions/cache works, this cache will only be used by subsequent runs on the same PR and the same sha only in the queue (which is extremely unlikely, if not impossible--I'm not sure if that SHA is the result of several PRs being combined or what). Anyway, I'd love to avoid adding a bunch of these useless cached objects to the cache and increasing the likelihood of cache thrashing, if at all possible.
Request: I am wondering if it would be as simple as adding a parameter save-cache: true|false to the action:
- name: Setup Ruby and Bundler
if: ${{ inputs.install_ruby == 'true'}}
uses: ruby/setup-ruby@44511735964dcb71245e7e55f72539531f7bc0eb # v1.257.0
with:
ruby-version: ${{ steps.grab-versions.outputs.ruby_version }}
bundler-cache: true
working-directory: ${{ inputs.bundler_working_directory }}
# Don't cache in the merge queue, it won't be used!
save-cache: ${{ github.event_name != "merge_group" }}
For ultimate clarity, parameters restore-bundler-cache and save-bundler-cache would be better.
This is akin to the approach you can take when using actions/cache/restore and actions/cache/save directly. We do this in other places within our workflows.
Here's an example screenshot of a cache item in our cache that is using 100MB of space that'll never be used. Depending on the throughput of your repo, this may be significant if you have many PRs:
After doing some research, it may be worthwhile for me to look into whether the queue is even getting cache hits. If they're not (which it sound like maybe they're not), it may make sense to just disable caching altogether in the queue.
OK, I just looked at one of our jobs and DO see cache hits of the Flutter SDK, so I think this issue is still relevant.
Cache restored successfully
Cache restored from key: flutter-linux-stable-3.29.2-x64-c23637390482d4cf9598c3ce3f2be31aa7332daf
Run actions/cache@v4
with:
path: /home/runner/.pub-cache
key: flutter-pub-linux-stable-3.29.2-x64-c23637390482d4cf9598c3ce3f2be31aa7332daf-bfc7f4aacbcf80ddd0adb01ff1c07d0b79b2781df10ca0f5d21e39ce6f77e63c
enableCrossOsArchive: false
fail-on-cache-miss: false
lookup-only: false
save-always: false
Cache hit for: flutter-pub-linux-stable-3.29.2-x64-c23637390482d4cf9598c3ce3f2be31aa7332daf-bfc7f4aacbcf80ddd0adb01ff1c07d0b79b2781df10ca0f5d21e39ce6f77e63c
Received 33554432 of 136985095 (24.5%), 32.0 MBs/sec
Received 136985095 of 136985095 (100.0%), 69.0 MBs/sec
Cache Size: ~131 MB (136985095 B)
/usr/bin/tar -xf /home/runner/work/_temp/5154d90d-ef9c-48e7-b5d1-663f0eb789a2/cache.tzst -P -C /home/runner/work/mobile/mobile --use-compress-program unzstd
Cache restored successfully
Cache restored from key: flutter-pub-linux-stable-3.29.2-x64-c23637390482d4cf9598c3ce3f2be31aa7332daf-bfc7f4aacbcf80ddd0adb01ff1c07d0b79b2781df10ca0f5d21e39ce6f77e63c
GitHub Action cache by design is per branch, and cache enabled for any PR branch is also by design, so that if a user push multiple times to the same PR brach it will be able to use the cache. Space used for cache usually isn't a concern because GitHub will automatically purge older cache.
If you really don't like this behavior, you can just do bundler-cache: false and manually run bundle install, then you have full control yourself with actions/cache.
GitHub Action cache by design is per branch, and cache enabled for any PR branch is also by design, so that if a user push multiple times to the same PR brach it will be able to use the cache. Space used for cache usually isn't a concern because GitHub will automatically purge older cache.
If you really don't like this behavior, you can just do
bundler-cache: falseand manually runbundle install, then you have full control yourself withactions/cache.
I don't think this really touches on the point I'm making. I understand how the caching works, and I understand why it's beneficial. The problem is that merge-queue branches are single-use, and therefore it does not make sense to persist to the cache from them.
Space used for cache usually isn't a concern because GitHub will automatically purge older cache.
This is exactly the problem--if your cache is being bloated by merge-queue runs (ours is), cached items will be purged sooner than necessary. There are times where our cache bloats to 20GB (I've seen it as high as 60!) due to the throughput on our repo. In other cases where we have control over the caching strategy (i.e. where we're caching ourselves via actions/cache/save, we do not save cache in the queue, and this has been helpful in alleviating cache bloat).
In that case the best way to do it would be detect if a run is from merge queue, which can be don by check github.event_name, and then disable writing cache. PR would be welcome.
Also for now maybe you can do something like this as a workaround:
- uses: ruby/setup-ruby@v1
with:
ruby-version: '3.4'
bundler-cache: ${{ github.event_name != 'merge_group' }}
- if: github.event_name == 'merge_group'
run: bundle install
I wonder if it might make sense to file a similar issue to https://github.com/actions/toolkit/, i.e. if it never makes sense to save cache for merge_group, then maybe cache.saveCache() should noop in that case.