
RFC: Is it possible to cache the pulled docker images?

Open nostadt opened this issue 3 years ago • 7 comments

I don't necessarily want to cache the whole ddev setup, but to speed up the GH workflow it would be neat to have the docker images cached.

Do you think it's possible, and if so, is it something one actually wants? I am kinda new to CI workflows etc.

nostadt commented Nov 23 '20 20:11

It sure would help. CircleCI provides that capability. I'll bet Github has something up their sleeves now or later.

I should note, though, that having watched this a lot of times in the last few days, the download is super fast. The extraction, which I don't think caching would help with, is what takes a little more time.

rfay commented Nov 23 '20 20:11

Ah I see. Thanks for the input.

nostadt commented Nov 23 '20 20:11

I've added this to my Github Actions workflow:

      - uses: jonaseberle/github-action-setup-ddev@v1
        with:
          autostart: true

This step takes 1min 30s - 1min 50s to run. Has anyone found a caching solution for Github Actions?

tyler36 commented Oct 07 '21 01:10

A search does show some possibilities.

  • https://github.com/marketplace/actions/docker-layer-caching

rfay commented Oct 07 '21 02:10

Thank you for the update.

I added the following lines to my Github Actions workflow:

+      - uses: satackey/action-docker-layer-caching@<version>
+        continue-on-error: true

      - uses: jonaseberle/github-action-setup-ddev@v1
        with:
          autostart: true

That reduced the jonaseberle/github-action-setup-ddev@v1 step to around 40s; roughly a 50% improvement!

Unfortunately, the satackey/action-docker-layer-caching step itself takes around 40s to run, so it works out to be about the same.

I'm obviously missing something.

tyler36 commented Oct 07 '21 06:10

Nice, it uses actions/cache (https://github.com/satackey/action-docker-layer-caching/blob/main/src/LayerCache.ts).

I'm not sure whether it forfeits the docker pull logic by putting restored images in the local store (so newer images wouldn't be loaded automatically), or whether it caches docker's own cache so that docker pull can still do its magic. -> EDIT: On second thought: docker pull checks upstream for image hashes, so it should work just fine and fetch newer ones if available.

Also, I remember that in the early days Github's cache size was too limited to hold ddev's images, but we have 5GB now: https://docs.github.com/en/actions/advanced-guides/caching-dependencies-to-speed-up-workflows#usage-limits-and-eviction-policy.

Worth a try :)

jonaseberle commented Oct 07 '21 07:10

On the other hand, docker already stores images by layers in /var/lib/docker/image (by default; I haven't checked on the Github runner). Couldn't we just use actions/cache on that directory?
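A rough, untested sketch of that idea (the exact paths, the need to stop the daemon, and the cache key are all assumptions):

      - name: Stop the docker daemon so the files on disk are consistent
        run: sudo systemctl stop docker

      - name: Cache docker's local image store
        uses: actions/cache@v3
        with:
          # Layer data also lives under /var/lib/docker/overlay2, so both
          # directories would likely be needed (assumption, not verified).
          path: |
            /var/lib/docker/image
            /var/lib/docker/overlay2
          key: docker-store-${{ runner.os }}-${{ hashFiles('.ddev/config.yaml') }}

      - name: Start the docker daemon again
        run: sudo systemctl start docker

One caveat: actions/cache saves in a post-job step, when the daemon is running again, so the saved files may not be in a consistent state.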

jonaseberle commented Oct 07 '21 08:10

GitHub now supports up to 10GB of cache per repo.

I got this working with https://github.com/ScribeMD/docker-cache as a proof of concept. Unfortunately, it is actually significantly slower than without a cache. I didn't get the cache keys fully working, but this was enough to test performance.

      - name: Cache Docker images.
        uses: ScribeMD/docker-cache@<version>
        with:
          # key: docker-${{ runner.os }}-1.21.4-${{ hashFiles('.ddev/config.multisite.yaml', '.ddev/config.yaml', '.ddev/docker-compose.redis.yaml', '.ddev/docker-compose.selenium.yaml', '.ddev/docker-compose.zap.yaml', '.ddev/web-build') }}
          key: docker-${{ runner.os }}-1.21.4

      - name: Install ddev
        ...

Without the cache it takes around 3m20s to install ddev; with the cache, nearly 5 minutes. I think this is because the cache forces downloading the images as one step and then a docker load as a separate step, in serial.

I'm hesitant about the idea of caching /var/lib/docker/... as I think those contents are "internal" to docker, and at the least it would require us to shut down the daemon. As well, the above action handles the fact that runners come with pre-cached images that would need to be excluded.
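For reference, that exclusion could look roughly like this hypothetical sketch (not necessarily what the action actually does): record the images present before the pull, then save only the new ones.

      - name: Save only images added by ddev (hypothetical sketch)
        run: |
          # Record the runner's pre-cached images, pull, then diff.
          docker image ls --format '{{.Repository}}:{{.Tag}}' | sort -u > /tmp/pre.txt
          ddev start
          docker image ls --format '{{.Repository}}:{{.Tag}}' | sort -u > /tmp/post.txt
          # Save only the images that appeared (lines unique to post.txt).
          comm -13 /tmp/pre.txt /tmp/post.txt | xargs docker save -o /tmp/ddev-images.tar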

Two further ideas:

  1. Is it any faster to download from ghcr as compared to Docker hub?
  2. Are we maxing out network and IO when pulling images via ddev start? I think that pulls one image at a time; the runner could possibly pull multiple images at once for greater performance (see the sketch after this list).
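For idea 2, a minimal, untested sketch of pre-pulling in parallel before ddev start (the image names and tags are made-up examples; the real list depends on the ddev version and the configured services):

      - name: Pre-pull docker images in parallel
        run: |
          # Hypothetical image list; substitute whatever `ddev start` pulls.
          for image in \
            drud/ddev-webserver:v1.21.4 \
            drud/ddev-dbserver-mariadb-10.4:v1.21.4 \
            selenium/standalone-chrome:latest
          do
            docker pull "$image" &
          done
          wait  # block until all background pulls have finished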

In our case, we have several big images (Selenium, ZAP's full version) and I think we will end up focusing on reducing their size or eliminating them. For example, we don't need selenium on our static test job, and we only need ZAP on the security scanning job, so we'll likely convert them to optional services and enable them with ddev service enable....

deviantintegral commented Jan 10 '23 15:01

My own experience is that on a direct-internet-connected runner, the download is less costly than the extraction, so extracting a tarball of images doesn't help a lot, but it can prevent internet-related errors.

rfay commented Jan 10 '23 15:01

Hey, we have moved to https://github.com/ddev/github-action-setup-ddev. This repo will be archived (read-only). I am going to close all issues. Please feel free to open a new one in the new place if you think it makes sense.

jonaseberle commented Apr 26 '23 06:04