action-docker-layer-caching
Can we improve slow download time?
Hello 👋
Firstly, thank you for work on this 💙 We're using it all over the place at Exercism and it's proving to be a brilliant tool.
One thing I'm noticing, though, is that the more it's used, the slower it gets to download things. On a repo I'm working on at the moment, it takes over 5 minutes to download the data and load it into Docker. This time seems to be increasing linearly with each usage, which scares me a little! I've tried experimenting with different concurrency levels, but to no avail.
I'm wondering if you know of any way to improve things, either for me as a user, or any ideas about how we could speed up/improve the action itself?
Could we maybe set expiries on the cached data, removing layers that haven't been used in a while? This could happen either in the clean up phase of the action, or as a stand-alone clean-up action that could run daily?
There's a few of us at Exercism that would happily contribute to making things better if you want us to submit a PR, etc, but I'm wondering if you had any ideas/thoughts/direction regarding how we could improve this?
Thank you! Jeremy
👍 for this issue. Here is my 0s build time, paired with a few minutes of caching operations.
I think it's related to https://github.com/actions/cache/issues/381 -- looks like the current version of the actions/cache that's being used in this project is @1
GitHub's naming is confusing... The action `actions/cache@2` uses the NPM package `@actions/[email protected]`, which lives here: https://github.com/actions/toolkit/tree/main/packages/cache.
This action is already using `@actions/[email protected]`, with the faster Azure SDK segmented downloads.
For the OP, I suggest checking how many images are getting loaded into Docker from the cache. Run `docker images -a` before and after the cache loads, and see if the size of images added from the cache is getting out of control.
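If it helps to quantify that, the reported sizes can be totted up with a little awk. This is only a sketch: the sample lines stand in for real output (in an actual check you'd pipe in `docker images --format '{{.Size}}'`), and the MB/GB-only unit handling is a simplification.

```shell
# Sum image sizes (in MB) from `docker images --format '{{.Size}}'`-style
# output. Handles MB and GB only; other units would need more rules.
printf '500MB\n1.2GB\n250MB\n' |
awk '/GB/ { sub("GB",""); total += $1 * 1024 }
     /MB/ { sub("MB",""); total += $1 }
     END  { printf "%.0f MB restored from cache\n", total }'
```

Running it before and after the cache-load step (against real `docker images` output) shows at a glance whether the restored set is growing build over build.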
We are seeing this happen and our build times are going 📈. Any recommendations on how to fix this?
Likewise, here, we are experiencing slow downloads and/or uploads to cache
> For the OP, I suggest checking how many images are getting loaded into Docker from the cache. Run `docker images -a` before and after the cache loads, and see if the size of images added from the cache is getting out of control.

> We are seeing this happen and our build times are going 📈. Any recommendations on how to fix this?
If you're not already using v0.0.9 or later, upgrading should help some.
Besides that, currently I think the only workaround is to change your cache keys periodically. That will empty the cache, discarding any images that are no longer used.
The slowdown is happening because all the restored images from the cache have to be carried over into the next cache. That's needed to guarantee that any cached images used by docker are still present for the next run to use. Unfortunately it means that unused images are carried over too.
This could be avoided if docker had a way to monitor cache hits, but it doesn't appear to.
It might help to add some new options to the action for discarding cached images. For example, users could specify how many tags to retain, and the action would keep the newest ones up to that limit. It may also be possible to infer which restored images were not used and discard them, but that's difficult for multistage builds.
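As a rough sketch of the "infer which restored images were not used" idea, the core operation is a set difference between image-ID snapshots. The lists below are stand-ins: in a real run they would come from `docker images -q` taken around the build step, and deciding what counts as "used" (especially for multistage builds) is exactly the hard part noted above.

```shell
# Snapshot of image IDs restored from the cache (stand-in data;
# really the output of `docker images -q` before the build).
printf 'sha-a\nsha-b\nsha-c\n' | sort > before.txt
# IDs considered still needed after the build (stand-in data).
printf 'sha-a\nsha-c\n' | sort > after.txt
# Lines only in before.txt: restored but unneeded, candidates to discard.
comm -23 before.txt after.txt
```

Here that prints `sha-b`, the one restored image the build no longer needs.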
> Besides that, currently I think the only workaround is to change your cache keys periodically. That will empty the cache, discarding any images that are no longer used.
For anyone looking at a way to do this automatically, we're using the month number as a rotating cache key variable, like so:
```yaml
- run: echo "MONTH=$(date +%m)" >> $GITHUB_ENV

- uses: satackey/[email protected]
  # Ignore the failure of a step and avoid terminating the job.
  continue-on-error: true
  with:
    key: ${{ github.workflow }}-${{ env.MONTH }}-{hash}
    restore-keys: |
      ${{ github.workflow }}-${{ env.MONTH }}-
```
For more active projects, you could use a weekly cache key (`date +%U`). I haven't found a better way yet, but I'm definitely open to suggestions.
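For reference, a minimal sketch of the weekly variant (assuming a `date` that supports `%U`, the zero-padded week-of-year, as GNU and BSD date both do):

```shell
# Rotating cache-key component: changes once per week.
# %U = week of year (00-53), so the cache is discarded and
# rebuilt from scratch at most once a week.
WEEK=$(date +%U)
echo "WEEK=${WEEK}"
# In a workflow step you'd persist it for later steps:
# echo "WEEK=${WEEK}" >> "$GITHUB_ENV"
```

The key in the snippet above would then use `${{ env.WEEK }}` in place of `${{ env.MONTH }}`.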
My approach to the problem:
- Pull dependent images before the cache action
- Build a hash from the major changers of the Docker image:

```sh
YARN=$(md5sum yarn.lock | awk '{ print $1 }')
PKG=$(md5sum package.json | awk '{ print $1 }')
API_PKG=$(md5sum apps/api/package.json | awk '{ print $1 }')
TYPES_PKG=$(md5sum packages/types/package.json | awk '{ print $1 }')
CLIENT_PKG=$(md5sum packages/client/package.json | awk '{ print $1 }')
echo "YARN_HASH=${YARN}_${PKG}_${API_PKG}_${TYPES_PKG}_${CLIENT_PKG}" >> $GITHUB_ENV
```

- Use the hash in the cache key
- Prune images before the cache upload:

```yaml
- run: |
    docker image prune -a --force --filter "label=tag!=${{ github.sha }}"
```
🤞
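A more compact variant of the same hashing idea is to hash all the manifests in one pass. The file names here are demo stand-ins for the real lockfiles in a checkout; note that the file order must stay fixed, since reordering would change the hash without a real change.

```shell
# Throwaway manifests for the demo; in CI these are the real
# yarn.lock / package.json files from the checkout.
printf 'demo-lock\n' > yarn.lock
printf '{"name":"demo"}\n' > package.json
# One hash over all manifests: it changes whenever any of them changes.
HASH=$(cat yarn.lock package.json | md5sum | awk '{ print $1 }')
echo "YARN_HASH=${HASH}"
```

The combined value can be exported to `$GITHUB_ENV` and used in the cache key exactly like the multi-variable version above.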
How is anybody even using this action if the cache continually grows with each build?
It seems this action will always make build times worse after the first handful of builds... Am I missing something?
@adambiggs you should have a build step that cleans up images -- at least that's what we do. We prune images older than three days so we can still leverage the cache without having it be astronomical in size.
Thanks @omacranger. For anyone who might find themselves here, the workaround I ended up with is adding this step at the end of my job:
```yaml
- run: docker image prune --all --force --filter "until=48h"
```
I think a note should really be added to the readme, because some flavour of this workaround seems to be a hard requirement for using this action.
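For anyone wiring this in, a minimal sketch of where that prune step sits in a job (the checkout and build steps are placeholders; the point is that the prune runs as the last regular step, before the action's post-job cache save):

```yaml
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - uses: satackey/[email protected]
        continue-on-error: true
      - run: docker build -t my-image .   # placeholder build step
      # Runs before the post-job cache save, so stale layers
      # are not carried into the next cache.
      - run: docker image prune --all --force --filter "until=48h"
```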
I've found it much quicker to download the most recently (or most similarly) built image and use `--cache-from`. I'm not sure if there are other cases where this layer-caching solution is cheaper.