Caching issue with the combination of `COPY --from` and `RUN`
Actual behavior
kaniko uses the cached version of a `RUN chown ... /app/build` command even though the copied source changed in the preceding `COPY --from=builder /app/build /app/build` command, so the cache should not be used from this command onward.
As a result, the image contains the changed copied source, but it is then overwritten by the cached layer of the `RUN` cmd.
This behaviour has been confirmed using the tool dive.
Expected behavior
As stated in the kaniko README.md:
> Note that kaniko cannot read layers from the cache after a cache miss: once a layer has not been found in the cache, all subsequent layers are built locally without consulting the cache.
The command `COPY --from=builder /app/build /app/build` has detected changes.
Thus, the cache should have been invalidated for the subsequent commands, such as `RUN chown ... /app/build`.
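The expected behaviour can be sketched as a chained cache key: each layer's key folds in the previous layer's key, so a single miss invalidates everything after it. Below is a toy Python model of what should happen here (a simplified sketch, not kaniko's actual implementation):

```python
import hashlib

def build_with_cache(commands, cache):
    """Toy model of chained layer caching: each layer key folds in the
    previous key, and once one layer misses, all subsequent layers are
    rebuilt without consulting the cache."""
    key = "base-image"
    missed = False
    results = []
    for cmd, content_digest in commands:
        key = hashlib.sha256(f"{key}|{cmd}|{content_digest}".encode()).hexdigest()
        if not missed and key in cache:
            results.append((cmd, "cached"))
        else:
            missed = True  # cache miss: stop consulting the cache from here on
            cache.add(key)
            results.append((cmd, "built"))
    return results

cache = set()
# First build warms the cache.
build_with_cache([("COPY --from=builder /app/build /app/build", "digest-old"),
                  ("RUN chown -R 1001:1001 /app/build", "-")], cache)
# Second build: the copied source changed, so the COPY misses and the
# RUN chown must be rebuilt as well, never read from cache.
second = build_with_cache([("COPY --from=builder /app/build /app/build", "digest-new"),
                           ("RUN chown -R 1001:1001 /app/build", "-")], cache)
assert [status for _, status in second] == ["built", "built"]
```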
To Reproduce
Steps to reproduce the behavior:
- Have a Dockerfile with two stages (the 1st stage acts as a builder; in the 2nd stage, have a `COPY --from=builder /app/build /app/build` cmd for your built app, followed by a `RUN chown ... /app/build` cmd).
- Build the docker image with kaniko with `--cache=true` to create caches.
- Apply a small change in the app (ex: change a wording, "delete" => "deleteA").
- Build the docker image with kaniko again.
- The small change will not be applied. kaniko will detect changes in the `COPY --from`, but they will be overwritten by the cache of the `RUN chown` cmd.
Additional Information
Information about the build of the docker image:
We use gitlab-runner to launch kaniko. Our application is a front app (vue.js/typescript). We use webpack (v4) to build our app.
- Version of kaniko: 1.5.1
- Kaniko's image: `gcr.io/kaniko-project/executor:debug`
- Flags used: `--cache=true --context PROJECT_DIR --dockerfile PATH_DOCKERFILE --destination REGISTRY:TAG`
- The Docker image is pushed to our GitLab Container Registry.
The command:
```shell
/kaniko/executor --cache=true --context PROJECT_DIR --dockerfile PATH_DOCKERFILE --destination REGISTRY:TAG
```
Dockerfile:
Our Dockerfile with 2 stages:
```dockerfile
# Stage 1 - builder
FROM node:14.16-alpine as node-builder
WORKDIR /app
RUN apk add --no-cache bash=5.0.11-r1 python=2.7.18-r0 make=4.2.1-r2 g++=9.3.0-r0
COPY package.json yarn.lock ./
RUN yarn install
COPY .env.defaults .
COPY docker/create_env_list.sh docker/startup.sh.tpl ./docker/
RUN yarn build:startup:docker # <----- alias of this command "bash ./docker/create_env_list.sh"
# COPY our front app to /app
COPY . .
# Build our front app. It will create a dist directory with static css & js
RUN yarn build:docker # <---- alias of this command "webpack --config configuration/webpack/webpack.config.docker.js"

# Stage 2 - the final image
FROM bitnami/nginx:1.19.7
# Use root user to add packages or do other actions
USER root
RUN apt-get update && apt-get install --no-install-recommends -y \
    gettext-base=0.19.8.1-9 \
    && rm -rf /var/lib/apt/lists/*
# This command will copy the content of the dist folder (result of our built app).
COPY --from=node-builder /app/dist /usr/share/nginx/html
# Copy the script to be executed when running the image
COPY --from=node-builder /app/docker/startup.sh /app/startup.sh
# Copy the nginx configuration file
COPY /docker/default.conf /opt/bitnami/nginx/conf/server_blocks/default.conf
# Change owner to allow the script startup.sh to execute correctly
RUN chown -R 1001:1001 /usr/share/nginx/html # <---- HERE IS THE PROBLEM
# Change to bitnami/nginx's non-root user
USER 1001
EXPOSE 8080
# The startup.sh script will modify the js files by replacing the values of env variables. Then it will launch nginx -g "daemon off;"
CMD ["./startup.sh"]
```
Output:
Here is the kaniko's output of the strange behavior when we build the image:
```
INFO[0000] Resolved base name node:14.16-alpine to node-builder
INFO[0000] Using dockerignore file: /builds/GROUP/PROJECT/.dockerignore
INFO[0000] Retrieving image manifest node:14.16-alpine
INFO[0000] Retrieving image node:14.16-alpine from registry index.docker.io
INFO[0001] Retrieving image manifest node:14.16-alpine
INFO[0001] Returning cached image manifest
INFO[0001] Retrieving image manifest bitnami/nginx:1.19.7
INFO[0001] Retrieving image bitnami/nginx:1.19.7 from registry index.docker.io
INFO[0002] Retrieving image manifest bitnami/nginx:1.19.7
INFO[0002] Returning cached image manifest
INFO[0002] Built cross stage deps: map[0:[/app/dist /app/docker/startup.sh]]
INFO[0002] Retrieving image manifest node:14.16-alpine
INFO[0002] Returning cached image manifest
INFO[0002] Retrieving image manifest node:14.16-alpine
INFO[0002] Returning cached image manifest
INFO[0002] Executing 0 build triggers
INFO[0002] Checking for cached layer GITLAB_REGISTRY/cache:87cc5744e7ea93cd8ec36307bdcf28ce803ff475dc98f1fdfc0242d08eaadcb9...
INFO[0002] Using caching version of cmd: RUN apk add --no-cache bash=5.0.11-r1 python=2.7.18-r0 make=4.2.1-r2 g++=9.3.0-r0
INFO[0002] Checking for cached layer GITLAB_REGISTRY/cache:6eb87e79a3c23a83054f69e39e2a2f22d5f5046f8a87ae86255a6db912193492...
INFO[0002] Using caching version of cmd: RUN yarn install
INFO[0002] Checking for cached layer GITLAB_REGISTRY/cache:32efb1ddee11b3465dcafe4f487e09fb3b485968b78b89af4d33e56caca9069f...
INFO[0002] Using caching version of cmd: RUN yarn build:startup:docker
INFO[0002] Checking for cached layer GITLAB_REGISTRY/cache:6331910e7a49c38604451be13d38197aab830338d55b7db27598b41ed2545486...
INFO[0003] No cached layer found for cmd RUN yarn build:docker
INFO[0003] Unpacking rootfs as cmd COPY package.json yarn.lock ./ requires it.
INFO[0005] WORKDIR /app
INFO[0005] cmd: workdir
INFO[0005] Changed working directory to /app
INFO[0005] Creating directory /app
INFO[0005] Taking snapshot of files...
INFO[0005] RUN apk add --no-cache bash=5.0.11-r1 python=2.7.18-r0 make=4.2.1-r2 g++=9.3.0-r0
INFO[0005] Found cached layer, extracting to filesystem
INFO[0007] COPY package.json yarn.lock ./
INFO[0007] Taking snapshot of files...
INFO[0007] RUN yarn install
INFO[0007] Found cached layer, extracting to filesystem
INFO[0033] COPY .env.defaults .
INFO[0033] Taking snapshot of files...
INFO[0033] COPY docker/create_env_list.sh docker/startup.sh.tpl ./docker/
INFO[0033] Taking snapshot of files...
INFO[0033] RUN yarn build:startup:docker
INFO[0033] Found cached layer, extracting to filesystem
INFO[0034] COPY . .
INFO[0034] Taking snapshot of files...
INFO[0034] RUN yarn build:docker
INFO[0034] Taking snapshot of full filesystem...
INFO[0058] cmd: /bin/sh
INFO[0058] args: [-c yarn build:docker]
INFO[0058] Running: [/bin/sh -c yarn build:docker]
yarn run v1.22.5
$ webpack --config configuration/webpack/webpack.config.docker.js
Hash: 64ba52216d5e09ab18e6
Version: webpack 4.46.0
Time: 91865ms
Built at: 03/11/2021 1:55:12 PM
###
I REMOVED WEBPACK LOGS BECAUSE IT IS TOO LONG
###
INFO[0151] Taking snapshot of full filesystem...
Done in 92.83s.
INFO[0154] Pushing layer GITLAB_REGISTRY/cache:6331910e7a49c38604451be13d38197aab830338d55b7db27598b41ed2545486 to cache now
INFO[0154] Pushing image to GITLAB_REGISTRY/cache:6331910e7a49c38604451be13d38197aab830338d55b7db27598b41ed2545486
INFO[0156] Pushed image to 1 destinations
INFO[0156] Saving file app/dist for later use
INFO[0156] Saving file app/docker/startup.sh for later use
INFO[0156] Deleting filesystem...
INFO[0161] Retrieving image manifest bitnami/nginx:1.19.7
INFO[0161] Returning cached image manifest
INFO[0161] Retrieving image manifest bitnami/nginx:1.19.7
INFO[0161] Returning cached image manifest
INFO[0161] Executing 0 build triggers
INFO[0161] cmd: USER
INFO[0161] Checking for cached layer GITLAB_REGISTRY/cache:6b352e3e04060767598eec8d7bdfcb24f86c8df70a9a2156f25d8e0e4e48953b...
INFO[0161] Using caching version of cmd: RUN apt-get update && apt-get install --no-install-recommends -y gettext-base=0.19.8.1-9 && rm -rf /var/lib/apt/lists/*
INFO[0161] Checking for cached layer GITLAB_REGISTRY/cache:c1f577ccff31f67220be47e32061383e38d840bdc83e92e55632307b992ced5a...
INFO[0161] Using caching version of cmd: RUN chown -R 1001:1001 /usr/share/nginx/html
INFO[0161] cmd: USER
INFO[0161] cmd: EXPOSE
INFO[0161] Adding exposed port: 8080/tcp
INFO[0161] Unpacking rootfs as cmd COPY --from=node-builder /app/dist /usr/share/nginx/html requires it.
INFO[0164] USER root
INFO[0164] cmd: USER
INFO[0164] No files changed in this command, skipping snapshotting.
INFO[0164] RUN apt-get update && apt-get install --no-install-recommends -y gettext-base=0.19.8.1-9 && rm -rf /var/lib/apt/lists/*
INFO[0164] Found cached layer, extracting to filesystem
INFO[0164] COPY --from=node-builder /app/dist /usr/share/nginx/html
INFO[0164] Taking snapshot of files...
INFO[0164] COPY --from=node-builder /app/docker/startup.sh /app/startup.sh
INFO[0164] Taking snapshot of files...
INFO[0164] COPY /docker/default.conf /opt/bitnami/nginx/conf/server_blocks/default.conf
INFO[0164] Taking snapshot of files...
INFO[0164] RUN chown -R 1001:1001 /usr/share/nginx/html
INFO[0164] Found cached layer, extracting to filesystem
INFO[0164] USER 1001
INFO[0164] cmd: USER
INFO[0164] No files changed in this command, skipping snapshotting.
INFO[0164] EXPOSE 8080
INFO[0164] cmd: EXPOSE
INFO[0164] Adding exposed port: 8080/tcp
INFO[0164] No files changed in this command, skipping snapshotting.
INFO[0164] CMD ["./startup.sh"]
INFO[0164] No files changed in this command, skipping snapshotting.
INFO[0164] Pushing image to GITLAB_REGISTRY:ra-tech-kaniko-cache-90454622
INFO[0165] Pushed image to 1 destinations
```
Our Investigation:
My team and I investigated this issue to provide more information.
Step 1: Build a docker image with cache
On our pipeline, we first build the docker image with our current front app. Then we change a wording in our application, ex: "delete" => "deleteA". A new docker image is built, partially using the cache. Result: the wording "deleteA" is not applied when deploying our app. The logs are above in the Output part.
Step 2: Extraction of the build app folder "dist/"
In the image produced from step 1, we extracted the files in the folder "dist" (result of the build using webpack).
The structure of the folder was:
```
dist
├── assets
│   └── LOT OF PNG FILES...
├── static
│   ├── css
│   │   ├── app.38ec1601a779bebf975a.css
│   │   ├── app.344c9445d1ea65f1bb97.css
│   │   └── vendors~app.b77cc233f46195f15a9b.css
│   └── js
│       ├── app.38ec1601a779bebf975a.js
│       ├── app.344c9445d1ea65f1bb97.js
│       ├── runtime~app.770e0fbb17939f06c767.js
│       ├── vendors~app.b77cc233f46195f15a9b.js
│       └── vendors~app.b77cc233f46195f15a9b.js.LICENSE.txt
└── index.html
```
We noticed that there were two `app.*.css` files and two `app.*.js` files in static/css & static/js. That was not normal.
We started to look inside the two `app.*.js` files in static/js.
And we found something interesting: the two `app.*.js` files were two versions of our front app, one with the wording "delete" and the other with the wording "deleteA".
We then investigated the index.html and understood that the src used in this file was the `app.*.js` with "delete" (not the one we wanted).
Step 3: Using dive to see what was happening in our image
We then started to explore our image with dive.
We saw that all the layers in the 1st stage were normal.
In the 2nd stage, we saw that the cmd `COPY --from=node-builder /app/dist /usr/share/nginx/html` did its job correctly by not using the cache.
Then, in the layer of the cmd `RUN chown -R 1001:1001 /usr/share/nginx/html`, we saw that some files were overwritten (ex: index.html, runtime*.js, vendors*.js) and that the `app.*.js` of the previous version of our app (with the wording "delete") was added to our build directory.
Conclusion: the `RUN chown -R 1001:1001 /usr/share/nginx/html` cmd was using the cache.
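An image layer is essentially a tarball applied on top of the filesystem, which explains what dive showed: extracting a stale cached `RUN chown` layer on top of the fresh `COPY` result resurrects the old files. A minimal sketch with plain tar archives and hypothetical file names (not real OCI layers):

```python
import io
import os
import tarfile
import tempfile

def make_layer(files):
    """Build an in-memory tar 'layer' from a {path: content} dict."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for path, content in files.items():
            data = content.encode()
            info = tarfile.TarInfo(path)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    buf.seek(0)
    return buf

def apply_layer(layer, rootfs):
    """Extract a layer over the root filesystem, overwriting existing files."""
    with tarfile.open(fileobj=layer) as tar:
        tar.extractall(rootfs)

rootfs = tempfile.mkdtemp()
# Fresh COPY layer: the new build, whose index.html references app.NEW.js.
apply_layer(make_layer({"html/index.html": "src=app.NEW.js",
                        "html/app.NEW.js": "deleteA"}), rootfs)
# Stale cached "RUN chown" layer, snapshotted from the OLD build.
apply_layer(make_layer({"html/index.html": "src=app.OLD.js",
                        "html/app.OLD.js": "delete"}), rootfs)

with open(os.path.join(rootfs, "html", "index.html")) as f:
    assert f.read() == "src=app.OLD.js"  # the old index.html wins
# Both app files coexist, and index.html points at the old one --
# exactly the symptom observed in dist/.
assert os.path.exists(os.path.join(rootfs, "html", "app.NEW.js"))
assert os.path.exists(os.path.join(rootfs, "html", "app.OLD.js"))
```

The cached layer was snapshotted when `chown` ran against the old build, so replaying it re-adds those old files wholesale.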
What we did
After we discovered the `COPY --chown` option, we tried to use it, but it did not work, as stated in issue #1456 (I added a comment in that issue too).
Triage Notes for the Maintainers
| Description | Yes/No |
|---|---|
| Please check if this is a new feature you are proposing | |
| Please check if the build works in docker but not in kaniko | |
| Please check if this error is seen when you use the `--cache` flag | |
| Please check if your dockerfile is a multistage dockerfile | |
We faced exactly the same issue: `RUN chown -R ...` uses the cache even when it must not.
Luckily COPY --chown is fixed in kaniko https://github.com/GoogleContainerTools/kaniko/pull/1477 and seems to be working as expected! 🎉
So instead of
```dockerfile
COPY --from=builder /release/ $APP_HOME/
RUN chown -R $APP_USER:$APP_GROUP $APP_HOME/
```
we run
```dockerfile
COPY --chown=$APP_USER:$APP_GROUP --from=builder /release/ $APP_HOME/
```
I experienced a similar issue when downloading a file using `ADD` (which should run every build) followed by `RUN chmod +x file_name`. The cache was used for the `RUN` command, which caused a previous version of the downloaded file to be used in the image.
I'm not sure whether it's possible to use `ADD --chmod` with kaniko, but I'm going to try it to resolve my caching problem.
Looking into the codebase, it seems that the invalidation of the COPY command needs to be added at `pkg/commands/copy.go#L152`, which currently doesn't invalidate the COPY command unless a cache key miss is hit.
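The problem hinted at here can be illustrated with a toy cache-key computation (a hypothetical Python model, not the actual Go code in `pkg/commands/copy.go`): if the `COPY --from` key ignores the content of the copied files, the chained key of the following `RUN` never changes either, so folding a digest of the copied files into the key is what makes a content change propagate.

```python
import hashlib

def composite_key(prev_key, cmd, file_digest=None):
    """Toy cache-key computation: hash the previous key, the command
    string, and (optionally) a digest of the files the command touches."""
    parts = [prev_key, cmd]
    if file_digest is not None:
        parts.append(file_digest)
    return hashlib.sha256("|".join(parts).encode()).hexdigest()

base = "stage2-base"

# Buggy behaviour: the COPY --from key ignores the copied content, so it is
# identical across builds, and the chained key of the following RUN never
# changes either -- the RUN keeps hitting the cache.
copy_a = composite_key(base, "COPY --from=builder /app/build /app/build")
copy_b = composite_key(base, "COPY --from=builder /app/build /app/build")
assert copy_a == copy_b

# Fixed behaviour: fold a digest of the copied files into the COPY key, so a
# content change propagates through the chain to the RUN command's key.
run_a = composite_key(composite_key(base, "COPY ...", "sha256:aaa"),
                      "RUN chown -R 1001:1001 /app/build")
run_b = composite_key(composite_key(base, "COPY ...", "sha256:bbb"),
                      "RUN chown -R 1001:1001 /app/build")
assert run_a != run_b
```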
I had a chance to learn from @aaron-prindle that, based on previous design discussions and decisions, there is currently support only for `COPY --chown` and `ADD --chown`, but not `RUN --chown`. Please see #1 for more details.
https://github.com/GoogleContainerTools/kaniko/issues/3018 is raised as a new feature request and https://github.com/GoogleContainerTools/kaniko/pull/3019 for documentation clarification.
I will go ahead and close this issue for now and please redirect to #3018
Please feel free to reopen if necessary.
I think this issue should not have been closed.
Not invalidating the cache for a subsequent RUN command, after a previous COPY or ADD command resulted in changes to the filesystem, is a bug.
It is a bug regardless of whether a feature like https://github.com/GoogleContainerTools/kaniko/issues/3018 exists that allows to work around the bug in some cases.
> I think this issue should not have been closed.
> Not invalidating the cache for the subsequent RUN command after a previous COPY or ADD command resulted in changes to the filesystem is a bug. It is a bug regardless of whether a feature like #3018 exists that allows to work around the bug in some cases.
Thanks @felixhuttmann for chiming in. I reopened this for more discussion.
From the previous use cases, the bug of a stale cached layer being used happens where the file might not previously have had the expected permissions.
Could you confirm if the case still persists without RUN --chown?
@felixhuttmann would appreciate it if you don't mind following up on this with more evidence?
@JeromeJu After trying again, I was not able to reproduce this. Feel free to close this issue again. Sorry for the noise.
I was confused, because you wrote above
> Looking into the codebase, it seems that the invalidation of COPY command needs to be added at: pkg/commands/copy.go#L152 where it previously doesn't invalidate the COPY command unless a cache key miss is hit.
but I did not see that any code there recently changed, and this issue also does not link to any MR where a fix was performed. Perhaps this issue was fixed as part of something else in the meantime.
I think we just ran into this issue:
Stripped down Dockerfile:
```dockerfile
FROM buildpack-deps:bookworm as base
RUN useradd --user-group --create-home --shell /bin/bash user
SHELL ["/bin/bash", "-c"]
RUN mkdir -p /workdir
RUN mkdir -p /workdir && chown user:user /workdir
WORKDIR /workdir
USER user
# … (install NodeJS/npm/pnpm here)
COPY --chown=user:user ./pnpm-lock.yaml .
# ------------------------------------------------
FROM base AS e2e_browser_dependencies_prepare
WORKDIR /workdir
USER user
COPY --chown=user:user pnpm-lock.yaml .
RUN grep --extended-regexp ' playwright: \d+\.\d+\.\d+' pnpm-lock.yaml | cut -d ':' -f 2 | cut -d ' ' -f 2 > playwright_version
# ------------------------------------------------
FROM base AS e2e_browser_dependencies
WORKDIR /workdir
USER user
COPY --from=e2e_browser_dependencies_prepare /workdir/playwright_version .
# The following two RUN commands get retrieved from cache even when pnpm-lock.yaml and playwright_version have changed!
RUN VERSION_FROM_PACKAGE_JSON=$(cat playwright_version) bash -c 'npm install "playwright@${VERSION_FROM_PACKAGE_JSON}"'
RUN npx playwright install chromium
```
The Kaniko logs in our Gitlab job show that `RUN grep …` gets re-executed but `RUN VERSION_FROM_PACKAGE_JSON=…` and `RUN npx playwright install chromium` are still retrieved from cache, even though they depend on the outcome of `RUN grep …`:
```
INFO[0159] Checking for cached layer […]
INFO[0159] No cached layer found for cmd RUN grep --extended-regexp ' playwright: \d+\.\d+\.\d+' pnpm-lock.yaml | cut -d ':' -f 2 | cut -d ' ' -f 2 > playwright_version
INFO[0159] Unpacking rootfs as cmd COPY --chown=user:user pnpm-lock.yaml . requires it.
INFO[0175] USER user
INFO[0175] Cmd: USER
INFO[0175] No files changed in this command, skipping snapshotting.
INFO[0175] WORKDIR /workdir
INFO[0175] Cmd: workdir
INFO[0175] Changed working directory to /workdir
INFO[0175] No files changed in this command, skipping snapshotting.
INFO[0175] COPY --chown=user:user pnpm-lock.yaml .
INFO[0175] Taking snapshot of files...
INFO[0175] RUN grep --extended-regexp ' playwright: \d+\.\d+\.\d+' pnpm-lock.yaml | cut -d ':' -f 2 | cut -d ' ' -f 2 > playwright_version
INFO[0176] Cmd: /bin/bash
INFO[0176] Args: [-c grep --extended-regexp ' playwright: \d+\.\d+\.\d+' pnpm-lock.yaml | cut -d ':' -f 2 | cut -d ' ' -f 2 > playwright_version]
[…]
INFO[0197] COPY --from=e2e_browser_dependencies_prepare /workdir/playwright_version .
INFO[0197] Taking snapshot of files...
INFO[0197] RUN VERSION_FROM_PACKAGE_JSON=$(cat playwright_version) bash -c 'npm install "playwright@${VERSION_FROM_PACKAGE_JSON}"'
INFO[0197] Found cached layer, extracting to filesystem
INFO[0198] RUN npx playwright install chromium
INFO[0198] Found cached layer, extracting to filesystem
```
Thanks @codethief for your inputs.
From your Dockerfile, it looks like `COPY --chown` has worked as expected and does not retrieve from the cache. Could you confirm whether the issue was only related to the `COPY --from=e2e_browser_dependencies_prepare /workdir/playwright_version .` command and not to the previous ones?
@JeromeJu
> From your Dockerfile, it looks like `COPY --chown` has worked as expected
Indeed, good point.
> Could you confirm whether the issue was only related to the `COPY --from=e2e_browser_dependencies_prepare /workdir/playwright_version .` command and not to the previous ones?
I'm not sure I'm following. The playwright_version file did get re-generated in the previous build stage (`RUN grep … > playwright_version` got re-executed according to the logs), so, from what I am seeing, the issue here seems to be with `COPY --from=e2e_browser_dependencies_prepare /workdir/playwright_version` not behaving as expected.
> `RUN grep --extended-regexp ' playwright: \d+\.\d+\.\d+' pnpm-lock.yaml | cut -d ':' -f 2 | cut -d ' ' -f 2 > playwright_version`
Thanks for confirming that `COPY --chown` worked as expected. By that, I was trying to decouple the issues here in order to find the root cause for your use case, given that this issue was raised under the circumstance where the `COPY --chown` cache is not invalidated when used together with a `RUN chown`.
IIUC, in your use case would there be a workaround if you move RUN VERSION_FROM_PACKAGE_JSON=$(cat playwright_version) bash -c 'npm install "playwright@${VERSION_FROM_PACKAGE_JSON}"' after the RUN grep in the previous stage?
Thanks for looking into this, @JeromeJu – it's highly appreciated!
> By that I was trying to decouple the issues here in order to find the root cause for your use case given that this issue was raised under the circumstance where COPY --chown cache is not invalidated when used with RUN --chown together.
I see. Yeah, it's probably not only COPY --chown that causes issues, but also COPY --from=previous_stage.
> IIUC, in your use case would there be a workaround if you move `RUN VERSION_FROM_PACKAGE_JSON=$(cat playwright_version) bash -c 'npm install "playwright@${VERSION_FROM_PACKAGE_JSON}"'` after the `RUN grep` in the previous stage?
Unfortunately, this wouldn't work. I need the right NPM package to be installed in the last stage in order for the very last command npx playwright install chromium to work as expected.