Alternative storage backend
The cache is currently stored on GitHub, which does not work well with self-hosted runners or in security-sensitive cases. Two issues come up here: security, and the fact that self-hosted runners always have faster options for storing the cache (locally, or in blob storage) rather than uploading/downloading gigabytes of files to and from GitHub.
This would help our on-prem self-hosted runners a lot. Maybe we can create a plugin-style system for different types of storage backends.
In my opinion, this certainly makes sense. It also has the advantage that you don't have to worry about GitHub's cache-size limits. The difficulty of doing this is not that high, so I can work on it if necessary. What do you think? @dhadka That said, I understand there are higher-priority issues at the moment, so I would like to complete those first.
@zen0wu There is a workaround by @shonansurvivors that currently only works with S3 and S3-compatible services: https://github.com/shonansurvivors/actions-s3-cache
But I don't think that action will be much faster than this one; its main advantage is probably just avoiding the cache-size limits.
@smorimoto Yeah, I was thinking of working on a generic "bring your own storage" Action during an upcoming company hackathon. My idea was to have plugins for the various storage providers (Azure, S3, Minio, etc.) with some parameters to let you control the lifetime of the file. For example, if we had a time-to-live (TTL) parameter, we could essentially implement caching with a short TTL and the upload-artifact and download-artifact actions with a longer TTL. A scheduled job could then be set up to run daily or weekly to scan for and remove old/unused files. This would give users much more control over the content and also eliminate many of the current restrictions (size limits, sharing between branches and repos, etc.).
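To make the plugin idea a bit more concrete, here is a rough TypeScript sketch of what such a provider interface could look like; the `CacheProvider` name, the `ttlDays` parameter, and the example values are purely illustrative and not an existing API:

```ts
// Hypothetical "bring your own storage" plugin interface; the names below
// (CacheProvider, ttlDays, etc.) are illustrative and not an existing API.
export interface CacheProvider {
  // Upload the archive at archivePath under key; entries older than ttlDays
  // become eligible for cleanup by a scheduled job.
  save(key: string, archivePath: string, ttlDays: number): Promise<void>;
  // Return a local path to the restored archive, or undefined on a cache miss.
  restore(key: string, restoreKeys: string[]): Promise<string | undefined>;
  // Delete entries whose age exceeds their TTL; returns how many were removed.
  evictExpired(): Promise<number>;
}

// A short TTL gives cache-like behaviour, while a long TTL behaves more like
// upload-artifact / download-artifact.
export function defaultTtlDays(kind: "cache" | "artifact"): number {
  return kind === "cache" ? 7 : 90; // arbitrary example values
}
```

Each storage provider (Azure, S3, Minio, ...) would then just be one implementation of that interface, and a scheduled workflow would call `evictExpired()` daily or weekly.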
@dhadka Oh! I think that's a pretty good idea. Everyone has probably thought about bringing the plugin concept to Actions, but no one has done it yet. Having these well-known Actions do it would have a good effect on the community.
A similar concern was brought up before: https://github.com/actions/cache/issues/279
Would love to see either option done (local or a general repository-style API for setting your own cache backends) - storing cache on GitHub is convenient but the limitations can be a deal-breaker if you're trying to use the cache as a way to introduce persistent directories between action runs, particularly when there might be 7 days between cache pulls.
If we could use self-managed disks, that would be awesome! For our developers, we want the CI/CD experience to be the same both in the cloud and on self-hosted runners. Right now, caching on self-hosted runners would need our own solution.
I would love the ability to use blob storage. I have a Rust repository and each matrix build has > 1GB of compiled files, so we can't complete a single build without causing cache eviction.
Would love to see this too. Video game development with Unity creates huge Library folders, even for small-scale games (ours is >3GB), and each platform needs a separate cache. I'll have to take a look at https://github.com/shonansurvivors/actions-s3-cache, because we basically can't use GitHub caching for now (I would also like to cache LFS files to decrease bandwidth costs).
I ended up doing something really simple with a custom composite action in our repo, in case others find it useful. It does not support restore-keys yet, but it shouldn't be too hard to add.
https://gist.github.com/zen0wu/a0a7cd95fe3f2f550467c4428ef0f87c
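For anyone wanting to add restore-keys on top of an S3-based approach like this, it essentially boils down to a prefix search for the newest matching object. A rough, untested sketch using @aws-sdk/client-s3 (the bucket name and key layout are just placeholders, not part of the gist):

```ts
import { S3Client, ListObjectsV2Command } from "@aws-sdk/client-s3";

const s3 = new S3Client({});

// Emulate actions/cache restore-keys against S3: for each prefix, pick the
// most recently modified object whose key starts with that prefix.
// (Pagination is omitted for brevity.)
async function resolveRestoreKey(
  bucket: string,
  restoreKeys: string[]
): Promise<string | undefined> {
  for (const prefix of restoreKeys) {
    const res = await s3.send(
      new ListObjectsV2Command({ Bucket: bucket, Prefix: prefix })
    );
    const newest = (res.Contents ?? []).sort(
      (a, b) => (b.LastModified?.getTime() ?? 0) - (a.LastModified?.getTime() ?? 0)
    )[0];
    if (newest?.Key) {
      return newest.Key; // hit on this prefix; otherwise fall through to the next
    }
  }
  return undefined;
}
```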
I ended up with this: https://github.com/marketplace/actions/s3-cache
Note that GH is working on increased storage sizes: https://github.com/github/roadmap/issues/66
Although this issue is planned for 'Future', so there's no ETA for when it'll be completed.
@judge2020 https://github.com/actions/cache/discussions/497
I don't think increasing the cache storage size will be enough, because you will always find someone for whom the defaults don't work.
As a case study, I am working on a Rust project which also compiles TensorFlow from source. In a single run, our Linux build generated a 4417MB cache while the macOS build generated 2115MB, roughly 6.5GB per run, so two runs already exceed a 10GB limit. With the cache capped at 10GB, only one PR can use the cache at a time and all others get evicted. CI runs take only 10 minutes with a cache hit, but having to rebuild from scratch blows build times out to 50+ minutes.
Multiply that by several developers actively working on the project at the same time, all evicting each other's caches, and you have a severe developer-experience problem.
It would be much more convenient if my company could use our own S3 bucket for caching because then we can set the limit as high as we want and the only impact is a slightly larger AWS bill at the end of the month.
Is there any plan to make this official? It would be very beneficial for the community.
@Sytten I don't think anything is being planned in this direction for GitHub.com right now, as manually maintaining regions so that the cache can work effectively would be very difficult in such cases.
Although we do have an option for custom storage in GHES (GitHub Enterprise Server), the possibility of us supporting this on GitHub.com is very low.
Why not just allow specifying an Azure Blob Storage or similar on an org / enterprise level?
@meiswjn We already use Azure Blob Storage to save Actions caches. How would allowing a custom Azure Blob Storage account help you further? Can you please elaborate?
Sure. For legal reasons we are not allowed to use GitHub runners and GitHub storage: export control, data privacy, internal security guidelines - you name it. Allowing a custom blob storage would give us many possibilities: host it in our own country (no more export-control and data-privacy issues), connect it via private endpoint (confidential data / internal processes), use our own level of encryption, manage billing ourselves, etc.
Hi, my employer uses self-hosted runners on AWS (for security reasons). We'd like to use Actions caching for intermediate Rust build artifacts. It would be nice not to have to maintain an internal fork of this action in order to do so.
Just reviving this thread a little. We have been using https://github.com/tespkg/actions-cache for a while, but it really would not be that big a change to port it to this action. I am willing to do the work if someone from the GitHub team can confirm this is something they are interested in maintaining. I feel the arguments made here are pretty convincing, but I don't want to work for nothing.
@vsvipul Any change in GitHub's policy around this topic?
Is there any update on this? We're using self-hosted runners, and waiting to restore/save the cache to GitHub is a crime. We need local disk storage.
@gaspo53 Check out a fork of the above action at https://github.com/everpcpc/actions-cache, which uses Apache OpenDAL for its storage backend, allowing use of S3 and comparable Azure/GCP services for the cache.
You could also have a look at https://github.com/runs-on/cache, which is a drop-in replacement for actions/cache@v4 but supports S3 as a backend. It is especially useful if your runners are in AWS, since you can get unlimited cache and at least 300MiB/s throughput.