
Add option to specify a custom cache server

Open sindrig opened this issue 3 years ago • 7 comments

Our organization uses custom GitHub runners for most of our GHA workflows. These runners run in EKS on AWS, so we want to be able to leverage AWS S3 for caching, both to limit network ingress/egress and to raise the 5GB limit.

We also want switching between our own cache and GHA's cache to be seamless and hidden from our developers. After looking through the source code for actions/cache and actions/toolkit, I figured something like this had the least impact.

I started by trying to override ACTIONS_CACHE_URL in the workflow, but ACTIONS_* environment variables seem to be overwritten before JavaScript actions run.
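
For reference, the toolkit resolves the cache endpoint roughly like this (a paraphrased sketch of cacheHttpClient.ts, not the exact source; details vary between versions):

```typescript
// Paraphrased sketch of how @actions/cache builds its endpoint URL.
// ACTIONS_CACHE_URL is injected by the runner before the action starts,
// so a value set via `env:` in the workflow gets clobbered.
function getCacheApiUrl(resource: string): string {
  const baseUrl: string = process.env['ACTIONS_CACHE_URL'] ?? ''
  if (!baseUrl) {
    throw new Error('Cache Service Url not found, unable to restore cache.')
  }
  return `${baseUrl}_apis/artifactcache/${resource}`
}
```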

sindrig avatar Nov 16 '21 12:11 sindrig

Just my 2 cents: I think most of your changes should be done in https://github.com/actions/toolkit/tree/main/packages/cache. Once @actions/cache has been published with those modifications, only the call sites would need to change here.

rockandska avatar Nov 25 '21 12:11 rockandska

I can create a PR for actions/toolkit that takes something like GITHUB_ACTIONS_CACHE_URL into account when determining the cache URL in https://github.com/actions/toolkit/blob/main/packages/cache/src/internal/cacheHttpClient.ts#L33. Is that what you're suggesting, rather than this PR?
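
Roughly what I have in mind (a minimal sketch, assuming the override variable is named GITHUB_ACTIONS_CACHE_URL; that name is a proposal, not an existing variable):

```typescript
// Sketch of the proposed change in cacheHttpClient.ts: prefer a
// user-supplied override, which the runner does not reset, and fall
// back to the runner-injected ACTIONS_CACHE_URL.
// GITHUB_ACTIONS_CACHE_URL is the proposed name, not an existing variable.
function getCacheApiUrl(resource: string): string {
  const baseUrl: string =
    process.env['GITHUB_ACTIONS_CACHE_URL'] ??
    process.env['ACTIONS_CACHE_URL'] ??
    ''
  if (!baseUrl) {
    throw new Error('Cache Service Url not found, unable to restore cache.')
  }
  return `${baseUrl}_apis/artifactcache/${resource}`
}
```

The assumption here is that a differently-prefixed variable set via `env:` survives to the action, unlike ACTIONS_CACHE_URL.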

I can see benefits of both. I'll create the PR in actions/toolkit as well and we can compare them.

EDIT: See https://github.com/actions/toolkit/pull/947

sindrig avatar Nov 25 '21 12:11 sindrig

I can see benefits of both. I'll create the PR in actions/toolkit as well and we can compare them.

I think all the logic should live in toolkit/cache, and the update in actions/cache should then just use the additional parameters exposed by toolkit/cache, rather than duplicating the logic in both places.

Anyway, that's just my opinion and you'll see what the maintainers think of it :)

rockandska avatar Nov 25 '21 12:11 rockandska

  • @bishal-pdMSFT

kmkumaran avatar Jan 17 '22 04:01 kmkumaran

@sindrig we recently updated the cache limit to 10GB per repo. Is that going to help you?

Using a custom cache server is something I feel needs more discussion. There are two problems I can think of right now:

  1. We plan to build a cache-management experience that will expose metadata about the cache in a repo. With multiple cache servers, such management would become much more complicated.
  2. From a security point of view, how is the custom cache server endpoint secured? And how can the runner trust that endpoint?

Also, it may need a better design for how the custom server URL is managed. It looks like it should be a repo/org-level setting, not a workflow setting. Today, GitHub Enterprise Server has a similar setting for specifying the cache's storage provider, and that seems like a better design for a custom server as well. Unfortunately, building such a feature is not a priority for our team right now, and to be honest we haven't heard much demand from other customers.

I would suggest you create an issue for this and let's go through a few ideas there. We may get good comments from the community as well.

bishal-pdMSFT avatar Jan 17 '22 04:01 bishal-pdMSFT

@bishal-pdMSFT Thanks for getting back to me. Unfortunately I no longer work for island.is, but I'll try to answer your questions to the best of my knowledge. @dabbeg or @pshomov will probably want to take this over from here.

The 10GB limit would help, but probably not enough. island.is has >50 developers at any given time, of whom maybe 10-20 commit code each day. That means developer branches alone number in the tens each day, and on top of those there are the release branches (current and next) and main to think of. Each branch has >2GB of node_modules and Docker caches, so as soon as the fifth branch gets pushed, the oldest branch's cache is evicted. This was happening so often for us that branches which saved their cache in a prepare step already had it evicted by the time subsequent steps ran, sometimes as little as 5 minutes after the cache was committed.

IIRC the custom cache server endpoints are IP-restricted to our custom runner IPs. I seem to remember some JWT being passed from GHA, but I don't recall whether we do any validation on it.
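
For anyone wanting to go further than IP restrictions, a minimal sketch of validating that token server-side could look like this (assuming the jsonwebtoken package; the signing key and claims GHA actually uses are not something I verified, so treat both as assumptions):

```typescript
import * as jwt from 'jsonwebtoken'

// Hypothetical check on a custom cache server for the bearer token the
// runner forwards. The signing key is an assumption; we never confirmed
// what GHA signs the token with or which claims it carries.
function isAuthorized(authorizationHeader: string, signingKey: string): boolean {
  const token = authorizationHeader.replace(/^Bearer\s+/i, '')
  try {
    jwt.verify(token, signingKey) // throws on bad signature or expiry
    return true
  } catch {
    return false
  }
}
```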

Providing your own cache storage provider (e.g. an AWS S3 bucket, Azure Blob Storage, or a Google Cloud Storage bucket) would be best TBH, and if you're going to put any time or effort into generalizing this problem, my vote would be for that.

For the time being, approving this (or https://github.com/actions/toolkit/pull/947 as a less intrusive option?) would give people with the same problems island.is was having a way to keep using actions/cache, since the alternative for them would probably be rewriting the action.

sindrig avatar Jan 17 '22 14:01 sindrig

@bishal-pdMSFT another reason is to reduce bandwidth and egress/ingress costs when using custom runners. Given how many different S3 cache actions already exist, I think your customers are interested in this.

I tried https://github.com/whywaita/actions-cache-s3/pull/1 by @whywaita, which works nicely, and it only changes the action code, not the toolkit.

When running in EKS you can use pod identity / IRSA to secure access to the bucket, with TLS to authenticate the bucket endpoint. The bucket/folder must not be writable by other/untrusted pipelines, to avoid cache poisoning. I don't think most users would expect a cache-management tool to keep working when they opt out of the built-in caching.
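
To illustrate: with IRSA, the AWS SDK's default credential chain resolves the pod's IAM role automatically, so an S3-backed save step could be as small as this sketch (bucket name, region, and key scheme are made up):

```typescript
import { createReadStream } from 'fs'
import { S3Client, PutObjectCommand } from '@aws-sdk/client-s3'

// With pod identity / IRSA, the default credential provider chain picks
// up the pod's IAM role; no static keys are stored in the workflow.
const s3 = new S3Client({ region: 'eu-west-1' }) // region is an example

// Hypothetical cache-save step: scope the object key to the repo and
// cache key so an untrusted pipeline cannot overwrite another's cache.
async function saveCache(archivePath: string, repo: string, cacheKey: string): Promise<void> {
  await s3.send(
    new PutObjectCommand({
      Bucket: 'my-org-gha-cache', // assumed bucket name
      Key: `${repo}/${cacheKey}.tgz`,
      Body: createReadStream(archivePath)
    })
  )
}
```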

chlunde avatar Mar 07 '22 20:03 chlunde

Hi 👋🏽 We are unable to take this request at this time. As @bishal-pdMSFT explained above, there are a few concerns stopping us from taking this change. Please open an issue so that we can hear from the community about the need for such a feature. Closing the PR for now.

aparna-ravindra avatar Oct 07 '22 08:10 aparna-ravindra

+1 for this feature

bmarick avatar Dec 20 '22 17:12 bmarick

idk why this feature is not being pushed for

instead of moving the cache over the internet, an option to keep it internal would save you storage and save us bandwidth

halradaideh avatar Mar 03 '24 19:03 halradaideh

+1 this

@bishal-pdMSFT this feature would be really useful for us as well. 10GB per repo is still not enough. By the way, I didn't understand why you'd worry about security when teams can use their own private cloud storage (Blob Storage, S3, etc.).

semihural-tomtom avatar Apr 11 '24 14:04 semihural-tomtom