Kaniko fails to execute multiple builds in the same container
Actual behavior
Kaniko fails to execute sequential builds in the same container. After the first build, the second build fails to start the kaniko command to build the image.
Expected behavior
Kaniko should not fail during multiple image builds in the same container.
To Reproduce
Steps to reproduce the behavior:
- Create a Kaniko Pod with image gcr.io/kaniko-project/executor:debug.
- Go inside the pod and create a simple Dockerfile:
FROM ubuntu
apt-get update -y
- Now run the Kaniko command to create an image from this Dockerfile:
/kaniko/executor \
-f ./Dockerfile -c . \
--dockerfile Dockerfile \
--destination=<YOUR IMAGE REGISTRY>:test_1.0
- The first execution works perfectly.
- Now, within the same container, run the same command again with a different tag:
/kaniko/executor \
-f ./Dockerfile -c . \
--dockerfile Dockerfile \
--destination=<YOUR IMAGE REGISTRY>:test_1.1
- The command fails this time with an error:
ERROR: Process exited immediately after creation. See output below
Additional Information
- Dockerfile
FROM ubuntu
apt-get update -y
- Kaniko Image
gcr.io/kaniko-project/executor:debug
Fix/Workaround
- Kaniko, it seems, is meant for a single execution, not for reusing the same container across multiple image builds.
- At the end of an execution, kaniko removes the /workspace directory, which prevents the next image build from running in the same container.
The workaround was to:
- Explicitly create the workspace directory at the end of each build execution, which makes the same container ready for the next build: mkdir -p /workspace
- Clean up the kaniko executor workspace by adding the --cleanup arg.
- Remove dependencies of the old build (symlinks) with rm -rf /kaniko/0
The resulting sequence for the next build:
/kaniko/executor \
-f ./Dockerfile -c . \
--dockerfile Dockerfile \
--destination=<YOUR IMAGE REGISTRY>:test_1.1
rm -rf /kaniko/0
mkdir -p /workspace
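For convenience, the whole workaround can be wrapped in a small shell helper like the sketch below; the function name kaniko_build and the registry placeholder are illustrative, and the reset commands are the ones listed above.
#!/bin/sh
# Sketch: run one kaniko build, then reset the container for the next build.
kaniko_build() {
  tag="$1"
  /kaniko/executor \
    --dockerfile Dockerfile \
    --context . \
    --destination="<YOUR IMAGE REGISTRY>:${tag}" \
    --cleanup
  # Undo the state kaniko leaves behind (see the workaround steps above).
  rm -rf /kaniko/0
  mkdir -p /workspace
}

kaniko_build test_1.0
kaniko_build test_1.1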
Expectation
- A flag could be introduced to avoid deleting the /workspace directory after the kaniko build command finishes, something like --reuse-executor=true.
- The rm -rf /kaniko/0 step could be handled by the --cleanup flag itself.
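For illustration only, an invocation with the proposed flag might look like the sketch below; --reuse-executor is the hypothetical flag suggested above and does not exist in kaniko today.
/kaniko/executor \
  -c . --dockerfile Dockerfile \
  --destination=<YOUR IMAGE REGISTRY>:test_1.2 \
  --cleanup \
  --reuse-executor=true   # hypothetical: flag proposed above, not an existing kaniko option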
Triage Notes for the Maintainers
| Description | Yes/No |
|---|---|
| Please check if this is a new feature you are proposing | |
| Please check if the build works in docker but not in kaniko | |
| Please check if this error is seen when you use --cache flag | |
| Please check if your dockerfile is a multistage dockerfile | |
I was not able to reproduce this. I used Kaniko version v1.16.0 from the debug image: docker run -it --entrypoint="" gcr.io/kaniko-project/executor:debug /bin/sh.
I used a slightly modified Dockerfile:
FROM ubuntu
RUN apt-get update -y
EDIT: Maybe it is caused by mem/disk space?
I'm experiencing the same problem, reproducible like this:
- Start the container with docker run -it --rm --entrypoint="" -v ./:/tmp gcr.io/kaniko-project/executor:debug /bin/sh
- Create a Dockerfile:
FROM node:18-bookworm
RUN apt-get update \
&& apt-get install -y wget gnupg1 ca-certificates procps libxss1 \
&& wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub > linux_signing_key.pub \
&& install -D -o root -g root -m 644 linux_signing_key.pub /etc/apt/keyrings/linux_signing_key.pub \
&& sh -c 'echo "deb [arch=amd64 signed-by=/etc/apt/keyrings/linux_signing_key.pub] http://dl.google.com/linux/chrome/deb/ stable main" > /etc/apt/sources.list.d/google-chrome.list' \
&& apt-get update \
&& apt-get install -y google-chrome-stable git curl unzip python3 python3-venv libnss3-dev \
&& rm -rf /var/lib/apt/lists/* \
&& wget --quiet https://raw.githubusercontent.com/vishnubob/wait-for-it/master/wait-for-it.sh -O /usr/sbin/wait-for-it.sh \
&& chmod +x /usr/sbin/wait-for-it.sh
- Start the image build with executor -f Dockerfile --destination test-img-1 --no-push. This works fine.
- Start the image build again with executor -f Dockerfile --destination test-img-2 --no-push. The second build fails with this error:
...
Need to get 1171 kB of archives.
After this operation, 5047 kB of additional disk space will be used.
Get:1 http://deb.debian.org/debian bookworm/main amd64 gnupg1 amd64 1.4.23-1.1+b1 [601 kB]
Get:2 http://deb.debian.org/debian bookworm/main amd64 gnupg1-l10n all 1.4.23-1.1 [553 kB]
Get:3 http://deb.debian.org/debian bookworm/main amd64 libxss1 amd64 1:1.2.3-1 [17.8 kB]
Fetched 1171 kB in 0s (7344 kB/s)
debconf: delaying package configuration, since apt-utils is not installed
dpkg: unrecoverable fatal error, aborting:
unknown system group 'messagebus' in statoverride file; the system group got removed
before the override, which is most probably a packaging bug, to recover you
can remove the override manually with dpkg-statoverride
E: Sub-process /usr/bin/dpkg returned an error code (2)
error building image: error building stage: failed to execute command: waiting for process to exit: exit status 100
@mama-wk I had a similar issue, so running sed -i '/messagebus/d' /var/lib/dpkg/statoverride before re-running the executor is necessary to solve this, but it is only part of the problem. What I recognized is that any installation done as part of a Dockerfile instruction gets executed directly in the kaniko container, so if a package already exists in the kaniko executor image, the build will fail. For example, I customized the executor to include the AWS CLI, but if I then use this executor to build an image from a Dockerfile that has an instruction installing the AWS CLI, it fails.
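Put together, the statoverride workaround described above amounts to something like this sketch before the second build (the executor invocation mirrors the reproduction steps earlier in the thread):
# Drop the stale 'messagebus' entry that the previous build left in dpkg's
# statoverride file, then re-run the executor.
sed -i '/messagebus/d' /var/lib/dpkg/statoverride
executor -f Dockerfile --destination test-img-2 --no-push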
If the instruction below is present in a Dockerfile
RUN cd /tmp && \
curl -sk "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip" && \
unzip awscliv2.zip && \
./aws/install
and I build it with the custom executor that has the AWS CLI pre-installed, it fails with:
./aws/install
Found preexisting AWS CLI installation: /usr/local/aws-cli/v2/current. Please rerun install script with --update flag
Is this the expected behavior? At least to me it seems very strange, and I couldn't find any explanation as to why this happens. :( I do not know whether I should raise a separate issue for this or not.
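As a narrow workaround for that particular failure, the installer's own error message above suggests passing --update; a sketch of the adjusted instruction, not a fix for the underlying kaniko behavior:
# --update lets the AWS CLI installer replace the copy already present in the
# executor image instead of aborting, as the error message above suggests.
RUN cd /tmp && \
    curl -sk "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip" && \
    unzip awscliv2.zip && \
    ./aws/install --update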
Hello, after finding this issue and trying a lot of things to get a generic fix for my case, I found what I think solves most of my error cases. In my case this fixed all the failing builds across the different Dockerfiles of around 15 projects (not all were failing, but the ones with more stages in their Dockerfiles were more prone to fail).
My use case for kaniko is inside a Jenkins pipeline that uses the Kubernetes plugin to run jobs inside Kubernetes agent pods. Those agents define a single kaniko container, and I need to build the image twice with it: once as a tar to scan it with Trivy (a container scanning tool), and then, after some quality checks pass, use the same kaniko container to build the image again and upload it to ECR.
My solution was to append this to my first call, the one that builds the image as a tar: && rm -rf /kaniko/*[0-9]* && rm -rf /kaniko/Dockerfile && mkdir -p /workspace
The call ends up looking like this:
/kaniko/executor -f `pwd`/docker/Dockerfile -c `pwd` --tar-path=`pwd`/image.tar --single-snapshot --no-push --destination=image --cleanup && rm -rf /kaniko/*[0-9]* && rm -rf /kaniko/Dockerfile && mkdir -p /workspace
I'm not a huge kaniko user myself, but I found that the /kaniko directory was filled with some files after the first execution, as some people in this thread mentioned, and those files were messing up the next execution. The commands appended after the first build remove those problematic files, and the second execution works like a charm.
Hope this helps other people that find this issue. Thanks.
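For context, the full scan-then-push sequence sketched from the description above might look roughly like this; the Trivy invocation and the ECR placeholder are illustrative and not part of the original pipeline:
# 1st build: produce only a tarball, then reset kaniko state for the next run.
/kaniko/executor -f `pwd`/docker/Dockerfile -c `pwd` --tar-path=`pwd`/image.tar \
  --single-snapshot --no-push --destination=image --cleanup \
  && rm -rf /kaniko/*[0-9]* && rm -rf /kaniko/Dockerfile && mkdir -p /workspace

# Scan the tarball (illustrative; in the real pipeline this runs as a separate step).
trivy image --input image.tar

# 2nd build: once the quality checks pass, build again and push to the registry.
/kaniko/executor -f `pwd`/docker/Dockerfile -c `pwd` \
  --destination=<YOUR ECR REPOSITORY>:tag --cleanup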
> My solution was adding this to my first call of building the image as a tar: && rm -rf /kaniko/*[0-9]* && rm -rf /kaniko/Dockerfile && mkdir -p /workspace
Thanks a lot for your feedback!
I added --cleanup && rm -rf /kaniko/*[0-9]* && rm -rf /kaniko/Dockerfile && mkdir -p /workspace to my Kaniko command and it finally fixed the issue for me too.
Thanks for the workaround! This was really helpful. In my case I had to add the commands needed for removing the workspace (rm), since I had only 'busybox/cat' as the command in my pod.yaml.
I would also appreciate it if there were a flag that does this.
@ricardllop sorry for tagging you directly, but we have a similar environment. Do you also stumble across inconsistencies? Like not every run being successful, and failing with:
/durable-13df746c/script.sh.copy: line 9: rm: not found