How to do fine-grained caching: bulk APIs?
We're trying to turbo-charge our builds via fine-grained caching with the Pants build system. Pants recently gained experimental support for using the GitHub Actions Cache as a fine-grained "remote cache" (for the benefits discussed in https://dev.to/benjyw/better-cicd-caching-with-new-gen-build-systems-3aem): we can reuse test runs and build artefacts from previous runs, while downloading only exactly what's required.
However, we find it doesn't work well in practice for us, even on a moderately sized repository, because fine-grained caching quickly hits rate limits (it has to upload and/or download thousands of small "files" via individual requests). https://github.com/pantsbuild/pants/issues/20133
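For context, the obvious client-side workaround is to coarsen the granularity ourselves by packing many small cache blobs into a single archive stored as one cache entry. That trades away exactly the fine-grained reuse we're after, but it illustrates the request-count arithmetic. This is an illustrative sketch only, not what Pants actually does:

```python
import io
import tarfile

# Sketch of a coarse-graining workaround: pack many small cache blobs into
# one gzipped tar so they can be stored as a single cache entry, turning
# thousands of API requests into one (at the cost of cache granularity).

def pack(blobs: dict) -> bytes:
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, data in blobs.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()

def unpack(archive: bytes) -> dict:
    blobs = {}
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r:gz") as tar:
        for member in tar.getmembers():
            blobs[member.name] = tar.extractfile(member).read()
    return blobs

blobs = {f"entry-{i}": f"result {i}".encode() for i in range(1000)}
archive = pack(blobs)  # one upload instead of 1000
assert unpack(archive) == blobs
```

The downside is that any single changed entry invalidates (or forces re-upload of) the whole archive, which is why a real bulk API would be preferable.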
Are there any bulk APIs or other recommendations for how to best do the following:
- Check whether several cache entries exist
- Upload several new cache entries
- Download several cache entries
Alternatively, some other way to use the cache for many small requests.
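To make the request concrete, here's a rough sketch of what a bulk existence-check might look like from the client side. Everything here is hypothetical: GitHub's Actions Cache API has no such bulk endpoint or payload shape today (which is the point of this issue); the sketch just shows how batching collapses the request count:

```python
import json

# Hypothetical bulk existence-check payloads: the field names and batch
# limit below are invented for illustration. The idea is that N per-key
# GET requests collapse into ceil(N / batch_size) POST requests.

def build_bulk_check_payloads(keys, version, max_keys_per_request=500):
    """Batch many cache keys into a few request bodies instead of one
    request per key."""
    payloads = []
    for i in range(0, len(keys), max_keys_per_request):
        payloads.append(json.dumps({
            "version": version,
            "keys": keys[i:i + max_keys_per_request],
        }))
    return payloads

keys = [f"pants-test-{n}" for n in range(1200)]
payloads = build_bulk_check_payloads(keys, version="sha256:abc123")
assert len(payloads) == 3  # 1200 individual requests become 3
```

Analogous batched shapes for upload and download (or a multi-key variant of the existing endpoints) would address the rate-limit problem the same way.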
This might benefit more than just Pants: for example, https://github.com/mozilla/sccache also has a GHA cache backend and hits similar errors (https://github.com/mozilla/sccache/issues/1485).
(I asked this question of support (#2409822), and they told me to ask here instead, even though it's not directly related to the code in this repo.)
Thanks!
This issue is stale because it has been open for 200 days with no activity. Leave a comment to avoid closing this issue in 5 days.
Pants still sees users affected by this, e.g.: https://chat.pantsbuild.org/t/18821099/for-the-experimental-gha-remote-caching-new-in-2-20-https-ww#97b93a14-6ea3-4234-9ecb-35a68e1a70f2
Looking forward to seeing improvements here, as I'm also planning to adopt fine-grained GHA caching for pantsbuild: we continuously hit the cache size limit (10 GB per repo), and LFS caching degrades too often and too quickly, resulting in additional costs for data packs.
We are also interested in this for Bazel implementations.
I was hoping to use the GHA cache as a backend for the new Go 1.24 GOCACHEPROG feature. Any such implementation will face the same issues and limitations. It looks like a fine-grained caching API would benefit all of us.
Why is there no interest from GitHub in actually making the platform better? There are definitely many build systems out there that could use this feature.
Interested in this for https://github.com/bazel-contrib/setup-bazel/issues/18
Can we get some attention here?