actions-runner-controller
Ephemeral runner dependency caching
What would you like added?
I would like to be able to use something like the actions/cache action to cache dependencies installed with e.g. npm install (node_modules) in my workflow. I want to do this with ephemeral runners and store the cache on a persistent volume or similar, not on GitHub (as that would be slow).
Why is this needed?
Currently, steps like actions/setup-node and npm install make a lot of unnecessary requests and end up taking a long time. I would like to cache these things just as I do on GitHub-hosted runners with the actions/cache action so that the time spent is reduced.
I see that we can mount volumes into ephemeral runners today, but I'm not sure how, or whether it's even possible, to get ARC to write to and load from the cache.
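For reference, the pattern being described looks roughly like the sketch below (action versions, runner labels, and paths are illustrative, not taken from this issue); the ask is that the data behind actions/cache live on a local persistent volume rather than GitHub's cloud storage.

# Sketch of the usual hosted-runner caching pattern: actions/cache keys the
# ~/.npm download cache on the lockfile hash so npm ci avoids refetching packages.
name: ci
on: push
jobs:
  build:
    runs-on: [self-hosted, ubuntu-latest]   # assumed ARC runner labels
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-node@v3
        with:
          node-version: 18
      - uses: actions/cache@v3
        with:
          path: ~/.npm
          key: ${{ runner.os }}-node-${{ hashFiles('**/package-lock.json') }}
      - run: npm ci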
Hello! Thank you for filing an issue.
The maintainers will triage your issue shortly.
In the meantime, please take a look at the troubleshooting guide for bug reports.
If this is a feature request, please review our contribution guidelines.
Any news on this? I would also really benefit from something like this, but I use Maven.
I've solved a similar problem for node modules with the help of https://verdaccio.org/. Deploy it as a DaemonSet on the worker nodes where the runners execute, and point your jobs to pull dependencies through it. This setup worked pretty well.
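A rough sketch of that setup, assuming the upstream verdaccio/verdaccio image and its default port 4873 (names, tags, and the hostPort approach are illustrative, not a verified manifest); jobs would then point npm at the node-local instance, e.g. npm config set registry http://<node-ip>:4873.

# Sketch: Verdaccio as a DaemonSet so every node runs a local npm proxy/cache.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: verdaccio
spec:
  selector:
    matchLabels:
      app: verdaccio
  template:
    metadata:
      labels:
        app: verdaccio
    spec:
      containers:
        - name: verdaccio
          image: verdaccio/verdaccio:5   # upstream image; tag is illustrative
          ports:
            - containerPort: 4873        # Verdaccio's default port
              hostPort: 4873             # exposed on each node for runner pods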
Yes, this helps for some cases. I've accomplished the same with Artifactory, but it's unnecessarily complex and you still have to deal with dependency resolution, which in itself can take more than a minute. Also, you are limited in what you can cache, unlike with the official cache action.
I am interested in a solution similar to the official cache action (preferably with the same API) so we can cache anything (just as we can on GitHub), without additional 3rd party software.
I solved this by using a ReadWriteMany
PVC that every ephemeral runner attaches to upon startup:
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerSet
metadata:
  name: enterprise-runnerset-large
spec:
  replicas: 4
  image: $IMAGE
  dockerdWithinRunnerContainer: true
  enterprise: $ENTERPRISE
  labels:
    - ubuntu-latest
  selector:
    matchLabels:
      app: runnerset-large
  serviceName: runnerset-large
  template:
    metadata:
      labels:
        app: runnerset-large
    spec:
      securityContext:
        fsGroup: 1001
        fsGroupChangePolicy: "Always"
      terminationGracePeriodSeconds: 110
      containers:
        - name: runner
          env:
            - name: RUNNER_GRACEFUL_STOP_TIMEOUT
              value: "90"
            - name: ARC_DOCKER_MTU_PROPAGATION
              value: "true"
          resources:
            limits:
              memory: "8Gi"
            requests:
              cpu: "2"
              memory: "8Gi"
          volumeMounts:
            - mountPath: /opt/hostedtoolcache
              name: tool-cache
            - mountPath: /runner/_work
              name: work
      volumes:
        - name: tool-cache
          persistentVolumeClaim:
            claimName: tool-cache-enterprise-runnerset-large-0
        - name: work
          ephemeral:
            volumeClaimTemplate:
              spec:
                accessModes: [ "ReadWriteOnce" ]
                storageClassName: "csi-ceph-cephfs"
                resources:
                  requests:
                    storage: 5Gi
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: tool-cache-enterprise-runnerset-large-0
  finalizers:
    - kubernetes.io/pvc-protection
  labels:
    app: runnerset-large
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 25Gi
  storageClassName: csi-ceph-cephfs
  volumeMode: Filesystem
View of the cache:
runner@enterprise-runnerset-medium-zj5wn-0:/$ ls -la /opt/hostedtoolcache/
total 1
drwxrwsr-x 22 root runner 22 Sep 19 17:38 .
drwxr-xr-x 1 root root 24 Sep 25 17:34 ..
drwxrwsr-x 3 runner runner 1 Aug 4 17:11 Java_Adopt_jdk
drwxrwsr-x 4 runner runner 2 Jul 25 15:49 Java_Corretto_jdk
drwxrwsr-x 3 runner runner 1 Aug 4 17:11 Java_IBM_Semeru_jdk
drwxrwsr-x 3 runner runner 1 Jul 14 15:45 Java_Oracle_jdk
drwxrwsr-x 5 runner runner 3 Aug 18 16:46 Java_Temurin-Hotspot_jdk
drwxrwsr-x 3 runner runner 1 Aug 4 17:13 Java_Zulu_jdk
drwxrwsr-x 3 runner runner 1 Jul 27 15:18 Miniconda3
drwxrwsr-x 5 runner runner 3 Sep 19 17:39 PyPy
drwxrwsr-x 9 runner runner 7 Sep 19 17:38 Python
drwxrwsr-x 6 runner runner 4 Sep 21 14:22 Ruby
drwxrwsr-x 3 runner runner 1 Jul 6 21:45 blobs
drwxrwsr-x 5 runner runner 3 Jul 26 20:49 buildx
drwxrwsr-x 4 runner runner 2 Jul 20 13:21 buildx-dl-bin
drwxrwsr-x 8 runner runner 9 Aug 4 20:46 dotnet
drwxrwsr-x 5 runner runner 3 Aug 30 19:33 go
drwxrwsr-x 3 runner runner 1 Jul 12 19:06 grype
-rw-rw-r-- 1 runner runner 244 Jul 6 21:46 index.json
drwxrwsr-x 2 runner runner 0 Jul 6 21:46 ingest
drwxrwsr-x 4 runner runner 2 Jul 18 21:21 maven
drwxrwsr-x 9 runner runner 7 Aug 17 21:22 node
-rw-rw-r-- 1 runner runner 30 Jul 6 21:46 oci-layout
drwxrwsr-x 3 runner runner 1 Jul 12 19:05 syft
runner@enterprise-runnerset-medium-zj5wn-0:/$ ls -la /opt/hostedtoolcache/node/
total 0
drwxrwsr-x 9 runner runner 7 Aug 17 21:22 .
drwxrwsr-x 22 root runner 22 Sep 19 17:38 ..
drwxrwsr-x 3 runner runner 2 Jul 25 22:12 14.18.2
drwxrwsr-x 3 runner runner 2 Jul 20 18:53 16.14.0
drwxrwsr-x 3 runner runner 2 Jul 26 16:58 16.20.0
drwxrwsr-x 3 runner runner 2 Jul 3 11:00 16.20.1
drwxrwsr-x 3 runner runner 2 Jul 14 15:45 18.16.0
drwxrwsr-x 3 runner runner 2 Jul 6 14:38 18.16.1
drwxrwsr-x 3 runner runner 2 Aug 17 21:22 6.17.1
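With the RWX volume mounted at /opt/hostedtoolcache, the standard setup-* actions resolve tools from the shared directory (via RUNNER_TOOL_CACHE) instead of re-downloading them on every ephemeral runner. A minimal sketch of such job steps (versions are illustrative):

# Sketch: these steps reuse /opt/hostedtoolcache entries such as node/<version>
# and Java_Temurin-Hotspot_jdk shown in the listing above, downloading only on a miss.
steps:
  - uses: actions/setup-node@v3
    with:
      node-version: 18
  - uses: actions/setup-java@v3
    with:
      distribution: temurin
      java-version: 17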
Hi @alec-drw - I set it up exactly like this, but I'm getting the error below in the GitHub Actions pipeline:
Download action repository 'actions/checkout@v3' (SHA:f43a0e5ff2bd294095638e18286ca9a3d1956744)
Error: Can't use 'tar -xzf' extract archive file: /runner/_work/_actions/_temp_cbd2a7be-cad1-4030-b170-e4737cdf2323/ca9fffe4-3f99-409f-a500-81e17f49794c.tar.gz. Action being checked out: actions/checkout@v3. return code: 2.
Did you face anything like this, or do you have any idea?
I ended up with something like this. It's a crude use of hostPath, but it should provide a fast cache. You will need a script to populate and manage the cache directory:
spec:
  securityContext:
    runAsUser: 1001
    runAsGroup: 1001
    fsGroup: 1001
    fsGroupChangePolicy: "OnRootMismatch"
  containers:
    - name: runner
      image: ghcr.io/actions/actions-runner:latest
      command: ["/home/runner/run.sh"]
      volumeMounts:
        - name: hostedtoolcache
          mountPath: /opt/hostedtoolcache
  volumes:
    - name: hostedtoolcache
      hostPath:
        # directory location on host
        path: /tmp/arc
        type: DirectoryOrCreate
So I guess we need some sort of actions/ARC-cache helper to store and restore the files from local volumes. Maybe actions/cache could be modified to use local storage instead of cloud storage.
There is this issue on actions/cache asking for alternative backends https://github.com/actions/cache/issues/354
@adiroiban how would that work with multiple nodes in the cluster? Each time a runner comes online it would have a different cache. My first pass used this approach, but having a persistent cache via a PV with RWX allows the same directory to be mounted every time.
If you already have a PV with RWX, then this is not needed. My workaround is only for simple hostPath storage, and only to provide super-fast persistent storage on the node; it is only for testing.
You can add more complex logic for storing/restoring the cache... but that ends up reimplementing actions/cache.
At the start and end of a workflow I have something like this (I am only caching "build" and "node_modules"):
- name: Restore cache
  shell: bash
  run: |
    if [ -f /opt/hostedtoolcache/pending ]; then
      echo "Waiting for pending cache to finalize..."
      sleep 10
    fi
    if [ -d /opt/hostedtoolcache/build ]; then
      echo "Restoring cache."
      cp -r /opt/hostedtoolcache/build build
      cp -r /opt/hostedtoolcache/node_modules node_modules
    else
      echo "No cache found."
    fi
- name: Store cache
  shell: bash
  run: |
    if [ ! -f /opt/hostedtoolcache/pending ]; then
      touch /opt/hostedtoolcache/pending
      echo "Saving cache..."
      rm -rf /opt/hostedtoolcache/build
      rm -rf /opt/hostedtoolcache/node_modules
      mv build /opt/hostedtoolcache/
      mv node_modules /opt/hostedtoolcache/
      rm /opt/hostedtoolcache/pending
    else
      echo "Not saving cache as there is a pending save."
    fi
Yes, this helps for some cases. I've accomplished the same with Artifactory, but it's unnecessarily complex and you still have to deal with dependency resolution, which in itself can take more than a minute. Also, you are limited in what you can cache, unlike with the official cache action.
Can you please share details on how you achieved this with Artifactory? I am trying to run setup-node and npm install on a GitHub Enterprise Server instance in China, and the difficulty is in the resolution of the dependencies.
One very hacky workaround that works in some scenarios (like a monorepo setup) is to use a custom runner image and bake the dependencies into it. Something like:
# Base image assumed for illustration; use your actual ARC runner image.
FROM ghcr.io/actions/actions-runner:latest
COPY --chown=runner:docker ./package.json /tmp/package.json
COPY --chown=runner:docker ./package-lock.json /tmp/package-lock.json
RUN cd /tmp && npm install
In our setup, this adds about 20-30 s to download the (now larger) image to a fresh node, but once it's prewarmed it's pretty fast.
Is a solution that replaces the cloud storage of actions/cache with local storage possible? We would like to switch to self-hosted runners, but they have a very low success rate at handling actions that use the cache.
Action:
- name: Build, tag, and push image to container registry
  uses: docker/build-push-action@v4
  with:
    push: ${{ env.ACT != 'true' }}
    provenance: false
    tags: |
      ${{ env.DOCKER_USERNAME }}/${{ inputs.image_name }}:${{ inputs.app_version }}
      ${{ env.DOCKER_USERNAME }}/${{ inputs.image_name }}:${{ github.sha }}
    platforms: ${{ inputs.platforms }}
    context: ${{ inputs.docker_context || '.' }}
    file: ${{ inputs.docker_file || './Dockerfile' }}
    cache-from: type=gha
    cache-to: type=gha,mode=max
Fails with
buildx failed with: error: failed to solve: Get "https://acghubeus1.actions.githubusercontent.com/8V4YnVoosB0k2Rieq40qvDIC8EkpoC2S2GIgTQFJs9ePMWvozj/_apis/artifactcache/cache?keys=buildkit-blob-1-sha256%3A2457c1c5bd028c46eab1f52756c9d700d6dc39a0f03443dd9fd2d739a38c1a89&version=693bb7016429d80366022f036f84856888c9f13e00145f5f6f4dce303a38d6f2": net/http: TLS handshake timeout
Reading the docs on https://docs.github.com/en/actions/using-workflows/caching-dependencies-to-speed-up-workflows
When using self-hosted runners, caches from workflow runs are stored on GitHub-owned cloud storage. A customer-owned storage solution is only available with GitHub Enterprise Server.
If we change the action like this, the error disappears:
- name: Build, tag, and push image to container registry
  uses: docker/build-push-action@v4
  with:
    push: ${{ env.ACT != 'true' }}
    provenance: false
    tags: |
      ${{ env.DOCKER_USERNAME }}/${{ inputs.image_name }}:${{ inputs.app_version }}
      ${{ env.DOCKER_USERNAME }}/${{ inputs.image_name }}:${{ github.sha }}
    platforms: ${{ inputs.platforms }}
    context: ${{ inputs.docker_context || '.' }}
    file: ${{ inputs.docker_file || './Dockerfile' }}
Unfortunately, most actions that we use are already relying on the GHA build cache.
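For image builds specifically, one possible middle ground is buildx's local cache backend writing to a directory on the shared persistent volume instead of type=gha. A hedged sketch, assuming /opt/hostedtoolcache/buildx-cache exists and is visible to the BuildKit daemon (typically a docker-container builder created with docker/setup-buildx-action):

# Sketch: replace the GHA cache backend with buildx's local cache backend.
- name: Build, tag, and push image to container registry
  uses: docker/build-push-action@v4
  with:
    push: ${{ env.ACT != 'true' }}
    tags: ${{ env.DOCKER_USERNAME }}/${{ inputs.image_name }}:${{ github.sha }}
    cache-from: type=local,src=/opt/hostedtoolcache/buildx-cache        # assumed mount path
    cache-to: type=local,dest=/opt/hostedtoolcache/buildx-cache,mode=max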
Implementing a cache on self-hosted runners is not that easy. You might have your own homelab/on-premises bare-metal servers, or AWS-, Azure-, or Google-operated Kubernetes clusters, each with a different storage solution.
I have an on-premises bare-metal k8s cluster, so I am using OpenEBS Local PV Hostpath for storage with a simple script to cache and restore.
Care to share?
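For context, the OpenEBS Local PV Hostpath piece would look roughly like the sketch below (class name, base path, and size are illustrative and follow the upstream hostpath example, not a tested config); the cache/restore script itself would be along the lines of the Restore cache / Store cache steps posted earlier.

# Sketch: an OpenEBS Local PV Hostpath StorageClass plus a PVC that the runner
# pod could mount at /opt/hostedtoolcache.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: openebs-hostpath-cache
  annotations:
    openebs.io/cas-type: local
    cas.openebs.io/config: |
      - name: StorageType
        value: hostpath
      - name: BasePath
        value: /var/openebs/local
provisioner: openebs.io/local
volumeBindingMode: WaitForFirstConsumer
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: runner-tool-cache
spec:
  storageClassName: openebs-hostpath-cache
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 25Gi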
@davidwincent there is an actively maintained project for exactly that which I came across quite recently; I have not tried it myself, however: https://github.com/falcondev-oss/github-actions-cache-server