
Can we improve slow download time?

iHiD opened this issue 3 years ago • 12 comments

Hello 👋

Firstly, thank you for work on this 💙 We're using it all over the place at Exercism and it's proving to be a brilliant tool.

One thing I'm noticing, though, is that the more it's used, the slower it gets to download things. On a repo I'm working on at the moment, it takes over 5 minutes to download the data and load it into Docker. This time seems to be increasing linearly with each usage, which scares me a little! I've tried experimenting with different concurrency levels, but to no avail.

I'm wondering if you know of any way to improve things, either for me as a user, or any ideas about how we could speed up/improve the action itself?

Could we maybe set expiries on the cached data, removing layers that haven't been used in a while? This could happen either in the clean-up phase of the action, or as a stand-alone clean-up action that could run daily?
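For reference, a stand-alone scheduled clean-up along those lines could be run outside this action. This is only a minimal sketch, assuming the GitHub CLI's `gh cache` subcommands (which only exist in newer `gh` releases, postdating this thread) and a token with `actions: write` permission; it simply deletes every Actions cache entry for the repository rather than expiring individual layers:

    name: Clear docker layer caches
    on:
      schedule:
        - cron: '0 3 * * 0'   # weekly, Sunday 03:00 UTC
    jobs:
      clear-caches:
        runs-on: ubuntu-latest
        permissions:
          actions: write   # required for deleting caches with GITHUB_TOKEN
        steps:
          # Delete all Actions cache entries for this repo; cached layers
          # are rebuilt and re-uploaded on the next run.
          - run: gh cache delete --all --repo "$GITHUB_REPOSITORY"
            env:
              GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}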

There are a few of us at Exercism who would happily contribute to making things better if you want us to submit a PR, etc., but I'm wondering if you have any ideas/thoughts/direction regarding how we could improve this?

Thank you! Jeremy

iHiD avatar Oct 11 '20 23:10 iHiD

👍 for this issue. Here is my 0s build time, preceded by a few minutes of caching operations: [screenshot]

mayli avatar Nov 24 '20 05:11 mayli

I think it's related to https://github.com/actions/cache/issues/381 -- looks like the current version of the actions/cache that's being used in this project is @1

cynicaljoy avatar Dec 05 '20 04:12 cynicaljoy

GitHub's naming is confusing... The action actions/cache@2 uses NPM package @actions/[email protected], which lives here: https://github.com/actions/toolkit/tree/main/packages/cache.

This action is already using @actions/[email protected], with the faster Azure SDK segmented downloads.

For the OP, I suggest checking how many images are getting loaded into docker from the cache. Run docker images -a before and after the cache loads, and see if the size of images added from cache is getting out of control.
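As a concrete illustration, that check could be wired into the workflow like this (only a sketch; the version pin is illustrative, and you compare the two listings by eye or by summing the size column):

    - run: docker images -a   # snapshot before the cache is restored

    - uses: satackey/action-docker-layer-caching@v0.0.9
      continue-on-error: true

    - run: docker images -a   # anything new at this point came from the cache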

rcowsill avatar Dec 07 '20 01:12 rcowsill

> For the OP, I suggest checking how many images are getting loaded into docker from the cache. Run docker images -a before and after the cache loads, and see if the size of images added from cache is getting out of control.

We are seeing this happen and our build times are going 📈. Any recommendations on how to fix this?

vpontis avatar Jan 01 '21 18:01 vpontis

Likewise here, we are experiencing slow downloads and/or uploads to the cache.

mo-mughrabi avatar Jan 03 '21 12:01 mo-mughrabi

> For the OP, I suggest checking how many images are getting loaded into docker from the cache. Run docker images -a before and after the cache loads, and see if the size of images added from cache is getting out of control.
>
> We are seeing this happen and our build times are going 📈. Any recommendations on how to fix this?

If you're not already using v0.0.9 or later, upgrading should help some.

Besides that, currently I think the only workaround is to change your cache keys periodically. That will empty the cache, discarding any images that are no longer used.

The slowdown is happening because all the restored images from the cache have to be carried over into the next cache. That's needed to guarantee that any cached images used by docker are still present for the next run to use. Unfortunately it means that unused images are carried over too.

This could be avoided if docker had a way to monitor cache hits, but it doesn't appear to.

It might help to add some new options to the action for discarding cached images. For example, users could specify how many tags to retain, and the action would keep the newest ones up to that limit. It may also be possible to infer which restored images were not used and discard them, but that's difficult for multistage builds.
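To make that concrete, such an option might be expressed like this in a workflow (purely hypothetical: neither `max-retained-tags` nor any equivalent input exists in the action today):

    - uses: satackey/action-docker-layer-caching@v0.0.9
      continue-on-error: true
      with:
        # Hypothetical input: keep only the N most recently used tags when
        # saving the cache, discarding older restored images.
        max-retained-tags: 5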

rcowsill avatar Jan 03 '21 18:01 rcowsill

> Besides that, currently I think the only workaround is to change your cache keys periodically. That will empty the cache, discarding any images that are no longer used.

For anyone looking at a way to do this automatically, we're using the month number as a rotating cache key variable, like so:

    - run: echo "MONTH=$(date +%m)" >> $GITHUB_ENV

    - uses: satackey/[email protected]
      # Ignore the failure of a step and avoid terminating the job.
      continue-on-error: true
      with:
        key: ${{ github.workflow }}-${{ env.MONTH }}-{hash}
        restore-keys: |
          ${{ github.workflow }}-${{ env.MONTH }}-

For more active projects, you could use a weekly cache key (date +%U). I haven't found a better way yet, but I'm definitely open to suggestions.
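That weekly variant only changes the environment step; the cache key above then uses `${{ env.WEEK }}` instead of `${{ env.MONTH }}`:

    - run: echo "WEEK=$(date +%U)" >> $GITHUB_ENV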

CalebAlbers avatar Jan 20 '21 08:01 CalebAlbers

My approach to the problem:

  1. Pull dependent images before the cache action.
  2. Build a hash from the files that drive most changes to the Docker image:

         YARN=$(md5sum yarn.lock | awk '{ print $1 }')
         PKG=$(md5sum package.json | awk '{ print $1 }')
         API_PKG=$(md5sum apps/api/package.json | awk '{ print $1 }')
         TYPES_PKG=$(md5sum packages/types/package.json | awk '{ print $1 }')
         CLIENT_PKG=$(md5sum packages/client/package.json | awk '{ print $1 }')

         echo "YARN_HASH=${YARN}_${PKG}_${API_PKG}_${TYPES_PKG}_${CLIENT_PKG}" >> $GITHUB_ENV

  3. Use that hash in the cache key (a hashFiles() alternative is sketched just below this list).
  4. Prune images before the cache upload:

         - run: |
             docker image prune -a --force --filter "label!=tag=${{ github.sha }}"
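A note on steps 2 and 3: the built-in `hashFiles()` expression can produce an equivalent key without the shell hashing step. A sketch under that assumption (the `docker-layers-` prefix is arbitrary, the version pin is illustrative, and `{hash}` is the same placeholder used in the earlier snippet):

    - uses: satackey/action-docker-layer-caching@v0.0.9
      continue-on-error: true
      with:
        # hashFiles() hashes yarn.lock plus every package.json in the repo.
        key: docker-layers-${{ hashFiles('yarn.lock', '**/package.json') }}-{hash}
        restore-keys: |
          docker-layers-${{ hashFiles('yarn.lock', '**/package.json') }}-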

🤞

patroza avatar Jun 27 '21 11:06 patroza

How is anybody even using this action if the cache continually grows with each build?

It seems this action will always make build times worse after the first handful of builds... Am I missing something?

adambiggs avatar Dec 01 '21 22:12 adambiggs

@adambiggs you should have a build step that cleans up images -- at least that's what we do. We prune images older than three days, so we can still leverage the cache without letting it grow to an astronomical size.

omacranger avatar Dec 01 '21 22:12 omacranger

Thanks @omacranger. For anyone who might find themselves here, the workaround I ended up with is adding this step at the end of my job:

    - run: docker image prune --all --force --filter "until=48h"

I think a note should really be added to the readme, because some flavour of this workaround seems to be a hard requirement for using this action.

adambiggs avatar Dec 02 '21 01:12 adambiggs

I've found it much quicker to download the most recent (or most similar) previously built image and use --cache-from. I'm not sure if there are other cases where this layer-caching solution is cheaper.
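For anyone wanting to try that route, a minimal sketch of the pattern (the ghcr.io/my-org/my-app image name is illustrative, and the push step assumes an earlier docker login):

    - run: |
        # Pull the most recently published image; tolerate failure on the very first build.
        docker pull ghcr.io/my-org/my-app:latest || true

        # Reuse its layers as a build cache. The BUILDKIT_INLINE_CACHE build arg is only
        # needed when building with BuildKit (DOCKER_BUILDKIT=1), so the pushed image
        # carries inline cache metadata for future --cache-from pulls.
        docker build \
          --cache-from ghcr.io/my-org/my-app:latest \
          --build-arg BUILDKIT_INLINE_CACHE=1 \
          -t ghcr.io/my-org/my-app:latest .

        docker push ghcr.io/my-org/my-app:latest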

jakeonfire avatar Dec 03 '21 05:12 jakeonfire