Strange cache miss/hit when using `gha` cache.

Open maxisme opened this issue 2 years ago • 4 comments

Behaviour

Steps to reproduce this issue

  1. Set up an action (see Configuration below)
  2. Create a Go Dockerfile like:
...

COPY src/go.mod ./
COPY src/go.sum ./
RUN go mod download


COPY src ./
RUN --mount=type=cache,target=/root/.cache/go-build \
    go build -tags=static -o /cli && ls -al /cli
...
  3. Run the action

Expected behaviour

The RUN go mod download cache should not be invalidated
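
The expectation above can be illustrated with a deliberately simplified model of layer caching. This is a sketch only, not BuildKit's actual key derivation: it assumes each step's cache key is a hash of its parent's key, the instruction text, and the contents of the files the instruction copies. The file contents and the `src/main.go` path are hypothetical.

```python
import hashlib


def digest(*parts: str) -> str:
    """Hash the given strings into a single hex digest."""
    h = hashlib.sha256()
    for p in parts:
        h.update(p.encode())
        h.update(b"\0")  # separator so ("ab", "c") differs from ("a", "bc")
    return h.hexdigest()


def cache_keys(steps, files):
    """Return one chained cache key per step (toy model, not BuildKit's).

    steps: list of (instruction, [build-context paths the instruction reads])
    files: dict mapping path -> file content
    """
    keys = []
    parent = ""
    for instruction, inputs in steps:
        content = digest(*(f"{p}={files[p]}" for p in sorted(inputs)))
        parent = digest(parent, instruction, content)
        keys.append(parent)
    return keys


STEPS = [
    ("COPY src/go.mod ./", ["src/go.mod"]),
    ("COPY src/go.sum ./", ["src/go.sum"]),
    ("RUN go mod download", []),
    ("COPY src ./", ["src/go.mod", "src/go.sum", "src/main.go"]),
    ("RUN go build -tags=static -o /cli", []),
]

# Hypothetical file contents; only src/main.go differs between the two builds.
before = {"src/go.mod": "module example", "src/go.sum": "", "src/main.go": "// v1"}
after = dict(before, **{"src/main.go": "// v2"})

changed = [a != b for a, b in zip(cache_keys(STEPS, before), cache_keys(STEPS, after))]
print(changed)  # [False, False, False, True, True]
```

Under this model, editing a source file only changes the keys from `COPY src ./` onward, which is why `RUN go mod download` is expected to stay cached.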

Actual behaviour

When editing a file inside src (and not go.sum or go.mod), RUN go mod download is invalidated. Well, at least I think it is:

#20 [base 22/26] COPY src/go.sum ./
#20 CACHED
#30 [base 23/26] RUN go mod download
#30 sha256:fccecca06c0e928afeb80a372e92178cfe09778a2d41c53d690328cd9350920f 43.01kB / 43.01kB 0.1s done
#30 sha256:46e25f3fb1d9354adea3ee0ec11523a2d4bdb24c795548575f2a8a180f8ef285 2.53kB / 2.53kB 0.1s done
...
#30 DONE 24.2s

I am not quite sure what these sha256 lines actually mean (is it pulling from cache? If so, why doesn't it say CACHED?).

However, when I do not edit a file inside src (e.g. .github/...), all lines are marked as CACHED (with no sha256 download lines).

Configuration

      - uses: docker/setup-buildx-action@v1
      - name: Build & Push
        uses: docker/build-push-action@v2
        with:
          cache-from: type=gha,scope=${{ matrix.ecr_repo }}
          cache-to: type=gha,mode=max,scope=${{ matrix.ecr_repo }}
          push: true
          tags: "${{ env.tags }}"

I have cross-posted on Stack Overflow.

maxisme avatar Jan 19 '23 17:01 maxisme

#30 [base 23/26] RUN go mod download
#30 sha256:fccecca06c0e928afeb80a372e92178cfe09778a2d41c53d690328cd9350920f 43.01kB / 43.01kB 0.1s done
#30 sha256:46e25f3fb1d9354adea3ee0ec11523a2d4bdb24c795548575f2a8a180f8ef285 2.53kB / 2.53kB 0.1s done

This looks confusing, I agree, but the following lines:

#30 sha256:fccecca06c0e928afeb80a372e92178cfe09778a2d41c53d690328cd9350920f 43.01kB / 43.01kB 0.1s done
#30 sha256:46e25f3fb1d9354adea3ee0ec11523a2d4bdb24c795548575f2a8a180f8ef285 2.53kB / 2.53kB 0.1s done

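are actually the cache being pulled, so it works fine. If you want to make sure the step is actually cached, you can run go mod download -x instead: the -x flag prints the commands that go mod download executes, so that output would show up in the log if the step really ran again.

I recall we discussed this, @jedevc @tonistiigi. I wonder if we could add a prefix when blobs are being downloaded?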
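
To make the explanation above easier to apply to a long build log, here is a minimal sketch that groups plain-progress log lines by step number and labels each step. It is a heuristic based only on the line shapes quoted in this thread: the function name `classify` and the status labels are made up for illustration, and sha256 lines under a `FROM` step would be base-image pulls rather than cache pulls.

```python
import re

# Line shapes as they appear in the logs quoted in this thread.
STEP_RE = re.compile(r"^#(\d+) \[[^\]]+\] (.+)$")
CACHED_RE = re.compile(r"^#(\d+) CACHED$")
BLOB_RE = re.compile(r"^#(\d+) (?:extracting )?sha256:[0-9a-f]+\b")


def classify(log: str) -> dict:
    """Map step id -> (instruction, status) using a simple heuristic."""
    names, status = {}, {}
    for raw in log.splitlines():
        line = raw.strip()
        if m := STEP_RE.match(line):
            names[m.group(1)] = m.group(2)
            status.setdefault(m.group(1), "no cache marker seen")
        elif m := CACHED_RE.match(line):
            status[m.group(1)] = "CACHED"
        elif m := BLOB_RE.match(line):
            # Blob transfer lines: per the explanation above, cache being pulled.
            if status.get(m.group(1)) != "CACHED":
                status[m.group(1)] = "blobs downloaded"
    return {i: (names.get(i, "?"), s) for i, s in status.items()}


LOG = """\
#20 [base 22/26] COPY src/go.sum ./
#20 CACHED
#30 [base 23/26] RUN go mod download
#30 sha256:fccecca06c0e928afeb80a372e92178cfe09778a2d41c53d690328cd9350920f 43.01kB / 43.01kB 0.1s done
#30 sha256:46e25f3fb1d9354adea3ee0ec11523a2d4bdb24c795548575f2a8a180f8ef285 2.53kB / 2.53kB 0.1s done
#30 DONE 24.2s
"""

for step_id, (name, state) in classify(LOG).items():
    print(f"#{step_id} {name}: {state}")
```

A step labelled "blobs downloaded" here was most likely served from the cache even though the log never prints CACHED for it.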

crazy-max avatar Jan 31 '23 19:01 crazy-max

Hi @crazy-max, thanks for clarifying the meaning of the output for gha cache hits. I have encountered a similar configuration and output as OP. I have one additional question regarding the output.

Here's my output:

#7 [3/5] RUN pip install -r requirements.txt
#7 CACHED

#8 [4/5] COPY lib lib
#8 sha256:3e96127b72ea3914cb5040ebd19f6f3460ee4bbbd8b689ab35e8d4d50943ffbc 7.34MB / 205.57MB 0.2s
...
#8 sha256:3e96127b72ea3914cb5040ebd19f6f3460ee4bbbd8b689ab35e8d4d50943ffbc 205.57MB / 205.57MB 4.5s done
#8 extracting sha256:3e96127b72ea3914cb5040ebd19f6f3460ee4bbbd8b689ab35e8d4d50943ffbc 5.2s done
#8 DONE 18.3s

#9 [5/5] COPY app_code/ .
#9 DONE 2.4s

I have modified the code in the app_code directory, invalidating step 5. The lib directory in step 4 is only a few MBs, so I was initially confused as to why a 200MB blob was being pulled during that step. My two hypotheses are that either 1) the pip dependencies from step 3 are being pulled from the cache as part of step 4; or 2) a cached layer containing steps 1 through 4 is pulled as part of step 4 in preparation for step 5, which is not cached. Can you confirm whether either of these ideas is correct? (I'm not a Docker expert.)

Further clarification would be great, but I really just wanted to add this context so that the output could potentially be made even clearer if/when a change is contributed. Thanks!

NoahTK7 avatar Feb 06 '23 14:02 NoahTK7

@NoahTK7 Did you ever understand what's going on with those huge blobs?

partounian avatar Jan 17 '24 21:01 partounian

My 2 hypotheses are that either 1) the pip dependencies from step 3 are being pulled from the cache as part of step 4

Yes, the parent layers for the step need to be pulled as well, e.g. your pip dependency files. The result of the COPY is not just the files in lib but also the files that were already at the destination before the lib files were added.
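
The answer above can be pictured with a toy model. This is an illustrative sketch, not BuildKit's real scheduling: it assumes the build has to materialize every cached layer below the first step that must actually run. The `Step` type, the `layers_to_pull` helper, and the layer sizes are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Step:
    name: str
    layer_mb: int  # size of the layer this step adds (made-up numbers)
    cached: bool


def layers_to_pull(steps):
    """Return the cached layers needed before the first uncached step can run."""
    needed = []
    for step in steps:
        if not step.cached:
            return needed  # this step must run on top of everything before it
        needed.append(step)
    return []  # every step is cached, so no step needs a filesystem to run on


steps = [
    Step("RUN pip install -r requirements.txt", 200, True),
    Step("COPY lib lib", 3, True),
    Step("COPY app_code/ .", 1, False),  # invalidated by the code change
]

pulled = layers_to_pull(steps)
print([s.name for s in pulled])
print(sum(s.layer_mb for s in pulled), "MB")
```

In this model the large pip layer is pulled even though its own step is reported as CACHED, because the uncached `COPY app_code/ .` step needs the full filesystem beneath it.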

tonistiigi avatar Jan 17 '24 22:01 tonistiigi