GitHub Actions Cache proxy
Hey,
first of all: thank you for this great project!
I came across this repository while looking for a Bazel-'native' caching proxy to use the GitHub Actions Cache as a Bazel remote cache.
I only found a GitHub Action that sets up some proxy, but bazel-remote is far more advanced than anything else I could find.
Unfortunately, you don't (yet) support the GitHub Actions Cache as a proxy backend, so I implemented a PoC based on the new V2 API, and I was curious whether you'd be interested in a contribution of a (proper) implementation.
The V2 API for the GitHub Actions Cache is based on twirp-rpc, and I reverse-engineered the protobuf spec (with a little help from Claude) from the generated TypeScript client in the official GitHub Actions SDK.
If you're interested in this proxy implementation, there are some follow-up questions, because the API is protobuf-based.
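For context, the reverse-engineered service surface looks roughly like this. Every identifier below is an assumption recovered from the generated TypeScript client, not an official spec, and the upload-path message fields are elided:

```protobuf
syntax = "proto3";

// Package and method names as they appear in the twirp routes of the
// generated TypeScript client; treat all of this as reverse-engineered.
package github.actions.results.api.v1;

service CacheService {
  // Reserve an entry and obtain a signed URL to upload the blob to.
  rpc CreateCacheEntry(CreateCacheEntryRequest) returns (CreateCacheEntryResponse);
  // Mark the upload as complete.
  rpc FinalizeCacheEntryUpload(FinalizeCacheEntryUploadRequest) returns (FinalizeCacheEntryUploadResponse);
  // Look up an entry and obtain a signed URL to download the blob from.
  rpc GetCacheEntryDownloadURL(GetCacheEntryDownloadURLRequest) returns (GetCacheEntryDownloadURLResponse);
}

message CreateCacheEntryRequest {}           // fields elided in this sketch
message CreateCacheEntryResponse {}          // fields elided in this sketch
message FinalizeCacheEntryUploadRequest {}   // fields elided in this sketch
message FinalizeCacheEntryUploadResponse {}  // fields elided in this sketch

message GetCacheEntryDownloadURLRequest {
  string key = 1;
  repeated string restore_keys = 2;
  string version = 3;
}

message GetCacheEntryDownloadURLResponse {
  bool ok = 1;
  string signed_download_url = 2;
  string matched_key = 3;
}
```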
I'm fairly new to Bazel, so I appreciated the opportunity to get to know Bazel's protobuf integration, but of course that breaks the `go install` setup.
If you want to keep that intact, I'd have to update my code to generate the client code in advance and disable the protobuf integration.
I'm absolutely fine with doing so; I just wanted to get your opinion before going in one direction or the other.
Thanks in advance, Cheers
Peter
Hi, could you explain the use case a bit? IIUC, GitHub Actions caches store directories. If that's true, you could potentially use that as a regular "disk" cache with bazel-remote, without needing any modifications.
You probably could, but it comes with some overhead:
when you cache a directory, the cache action creates a tar archive and uploads it (with streaming compression) to Azure Blob Storage. When restoring a cached directory, it downloads that blob from the Azure Storage Account again and extracts it as the last step.
With a direct integration against the Actions Cache API, you skip the whole tar archive creation ... and you can (theoretically) already consume cached items from parallel running workflows.
In my PoC, I basically copied and adapted the existing Azure Storage Account proxy integration and replaced its client with the generated twirp client for the Actions Cache API, uploading blobs much like the other cloud storage backends already do.
For the download it is basically:
- check if there's a matching cache entry and retrieve its content URL
- download the blob with the Azure Storage blob client (for improved performance)
- profit 😄
If you think the benefit isn't worth the extra maintenance, I'm also fine with closing this.
Feel free to open a PR so I can take a look. I am not sure if it will be accepted, but it sounds interesting at least :)