build hanging at 'Unpacking rootfs as cmd RUN mkdir ... requires it'
Actual behavior
When using gcr.io/kaniko-project/executor:debug and running /kaniko/executor --context $CI_PROJECT_DIR --dockerfile Dockerfile --destination ${CONTAINER_IMAGE} in a GitLab runner, the build hangs.
Expected behavior
The build finishes successfully and the image gets published to GCR.
To Reproduce
Steps to reproduce the behavior:
Dockerfile
FROM gcr.io/deeplearning-platform-release/tf2-cpu.2-1
RUN mkdir /tpu
COPY test.py /tpu/
.gitlab-ci.yml
stages:
  - publish

variables:
  CONTAINER_IMAGE: gcr.io/${GOOGLE_PROJECT_ID}/${CI_PROJECT_NAME}:${CI_COMMIT_SHORT_SHA}

publish:
  stage: publish
  image:
    name: gcr.io/kaniko-project/executor:debug
    entrypoint: [""]
  script:
    - /kaniko/executor --context $CI_PROJECT_DIR --dockerfile Dockerfile --destination ${CONTAINER_IMAGE} --verbosity=debug

# Custom Functions -------------------------------------------------------
.custom_functions: &custom_functions |
  function config_kubernetes() {
    kubectl config set-cluster $KUBE_NAME --server="$KUBE_URL" --insecure-skip-tls-verify=true
    kubectl config set-credentials cluster-admin --username="$KUBE_USER" --password="$KUBE_PASSWORD"
    kubectl config set-context default --cluster=$KUBE_NAME --user=cluster-admin
    kubectl config use-context default
    echo $GOOGLE_SERVICE_JSON > ./gcloud-service-key.json
    kubectl create secret generic kaniko-secret --from-file=./gcloud-service-key.json
  }

before_script:
  - *custom_functions
Triage Notes for the Maintainers
Description | Yes/No |
---|---|
Please check if this is a new feature you are proposing | |
Please check if the build works in docker but not in kaniko | |
Please check if this error is seen when you use the --cache flag | |
Please check if your dockerfile is a multistage dockerfile | |
It turned out it was not hanging; it just keeps iterating over files and whiting them out for a very long time. It has never managed to finish when run from the triggers.
DEBU[0002] Not adding /dev because it is whitelisted
DEBU[0002] Not adding /etc/hostname because it is whitelisted
DEBU[0002] Not adding /etc/hosts because it is whitelisted
DEBU[0002] Not adding /etc/resolv.conf because it is whitelisted
DEBU[0002] Not adding /proc because it is whitelisted
DEBU[0003] Not adding /sys because it is whitelisted
DEBU[0005] Not adding /var/run because it is whitelisted
DEBU[0006] Whiting out /etc/ImageMagick-6/.wh..wh..opq
DEBU[0006] not including whiteout files
DEBU[0006] Whiting out /etc/X11/.wh..wh..opq
DEBU[0006] not including whiteout files
DEBU[0006] Whiting out /etc/apache2/.wh..wh..opq
DEBU[0006] not including whiteout files
DEBU[0006] Whiting out /etc/bash_completion.d/.wh..wh..opq
DEBU[0006] not including whiteout files
DEBU[0006] Whiting out /etc/ca-certificates/.wh..wh..opq
DEBU[0006] not including whiteout files
DEBU[0006] Whiting out /etc/cron.d/.wh..wh..opq
DEBU[0006] not including whiteout files
DEBU[0006] Whiting out /etc/cron.hourly/.wh..wh..opq
Ah got it. Thanks.
Is the base image gcr.io/deeplearning-platform-release/tf2-cpu.2-1 expected to contain so many whiteout paths?
On latest master, I was able to build your Dockerfile in 4 mins. Can you remove the -v=debug flag and see?
/ # /busybox/time kaniko/executor -f dockerfiles/Dockerfile1 --context=dir://workspace --destination=gcr.io/tejal-test/test-ml-latest-master
INFO[0000] Resolved base name gcr.io/deeplearning-platform-release/tf2-cpu.2-1 to gcr.io/deeplearning-platform-release/tf2-cpu.2-1
INFO[0000] Using dockerignore file: /workspace/.dockerignore
INFO[0000] Resolved base name gcr.io/deeplearning-platform-release/tf2-cpu.2-1 to gcr.io/deeplearning-platform-release/tf2-cpu.2-1
INFO[0000] Retrieving image manifest gcr.io/deeplearning-platform-release/tf2-cpu.2-1
INFO[0001] Retrieving image manifest gcr.io/deeplearning-platform-release/tf2-cpu.2-1
INFO[0001] Built cross stage deps: map[]
INFO[0001] Retrieving image manifest gcr.io/deeplearning-platform-release/tf2-cpu.2-1
INFO[0002] Retrieving image manifest gcr.io/deeplearning-platform-release/tf2-cpu.2-1
INFO[0002] Unpacking rootfs as cmd RUN mkdir /tpu requires it.
INFO[0102] Taking snapshot of full filesystem...
INFO[0105] Resolving paths
INFO[0186] RUN mkdir /tpu
INFO[0186] cmd: /bin/sh
INFO[0186] args: [-c mkdir /tpu]
INFO[0186] Taking snapshot of full filesystem...
INFO[0188] Resolving paths
INFO[0236] COPY test.py /tpu/
INFO[0236] Resolving paths
INFO[0236] Taking snapshot of files...
real 4m 0.38s
user 2m 11.51s
sys 2m 10.03s
I did a small optimization which reduced the time to 3 mins.
Hello, I am getting this same error.
[build-and-push] INFO[0002] Retrieving image manifest openjdk:8-jdk-alpine
[build-and-push] INFO[0002] Retrieving image openjdk:8-jdk-alpine
[build-and-push] INFO[0003] Retrieving image manifest openjdk:8-jdk-alpine
[build-and-push] INFO[0003] Retrieving image openjdk:8-jdk-alpine
[build-and-push] INFO[0005] Built cross stage deps: map[]
[build-and-push] INFO[0005] Retrieving image manifest openjdk:8-jdk-alpine
[build-and-push] INFO[0005] Retrieving image openjdk:8-jdk-alpine
[build-and-push] INFO[0006] Retrieving image manifest openjdk:8-jdk-alpine
[build-and-push] INFO[0006] Retrieving image openjdk:8-jdk-alpine
[build-and-push] INFO[0008] Executing 0 build triggers
[build-and-push] INFO[0008] Unpacking rootfs as cmd ADD target/spring-webflux*.jar spring-webflux-demo.jar requires it.
[build-and-push] INFO[0053] ENV LANG C.UTF-8
[build-and-push] INFO[0053] Resolving srcs [target/spring-webflux*.jar]...
[build-and-push] error building image: error building stage: failed to get files used from context: copy failed: no source files specified
container step-build-and-push has failed : [{"key":"StartedAt","value":"2020-08-23T16:44:35.577Z","resourceRef":{}}]
I am not sure why this error is coming up. Any input?
Same error here:
INFO[0000] Retrieving image manifest openjdk:11-jdk-slim
INFO[0000] Retrieving image openjdk:11-jdk-slim from registry index.docker.io
INFO[0006] Built cross stage deps: map[]
INFO[0006] Retrieving image manifest openjdk:11-jdk-slim
INFO[0006] Returning cached image manifest
INFO[0006] Executing 0 build triggers
INFO[0006] Unpacking rootfs as cmd ADD target/dynamic-service-*.jar /app/app.jar requires it.
using gcr.io/kaniko-project/executor@sha256:19b934353e409c72b7e71ad9018ed7ba4505682b81da87fb99c7b9dffdb4372a
Any ideas?
Update: after a long time (about 20 minutes) it continues, but I don't know the reason.
Same here:
gcr.io/kaniko-project/executor v1.8.1 a2a981eb8745 2 weeks ago 63.4MB
Stuck on Unpacking. My code repo is tiny, only some Golang code.
INFO[0008] Unpacking rootfs as cmd COPY go.mod go.mod requires it.
DEBU[0008] Ignore list: [{/kaniko false} {/etc/mtab false} {/tmp/apt-key-gpghome true} {/var/run false} {/proc false} {/dev false} {/dev/pts false} {/sys false} {/sys/fs/cgroup false} {/sys/fs/cgroup/systemd false} {/sys/fs/cgroup/perf_event false} {/sys/fs/cgroup/pids false} {/sys/fs/cgroup/cpuset false} {/sys/fs/cgroup/devices false} {/sys/fs/cgroup/hugetlb false} {/sys/fs/cgroup/memory false} {/sys/fs/cgroup/net_cls,net_prio false} {/sys/fs/cgroup/blkio false} {/sys/fs/cgroup/cpu,cpuacct false} {/sys/fs/cgroup/freezer false} {/dev/mqueue false} {/workspace false} {/busybox false} {/kaniko/.docker false} {/dev/termination-log false} {/etc/resolv.conf false} {/etc/hostname false} {/etc/hosts false} {/dev/shm false} {/var/run/secrets/kubernetes.io/serviceaccount false} {/proc/bus false} {/proc/fs false} {/proc/irq false} {/proc/sys false} {/proc/sysrq-trigger false} {/proc/acpi false} {/proc/kcore false} {/proc/keys false} {/proc/timer_list false} {/proc/timer_stats false} {/proc/sched_debug false} {/proc/scsi false} {/sys/firmware false}]
DEBU[0015] Not adding /dev because it is ignored
DEBU[0016] Not adding /etc/hostname because it is ignored
DEBU[0016] Not adding /etc/resolv.conf because it is ignored
DEBU[0030] Not adding /proc because it is ignored
DEBU[0036] Not adding /sys because it is ignored
DEBU[0193] Not adding /var/run because it is ignored
DEBU[0194] Whiting out /etc/ca-certificates/.wh..wh..opq
DEBU[0194] not including whiteout files
DEBU[0194] Whiting out /etc/ssl/.wh..wh..opq
DEBU[0194] not including whiteout files
DEBU[0198] Whiting out /usr/lib/sasl2/.wh..wh..opq
DEBU[0198] not including whiteout files
DEBU[0198] Whiting out /usr/lib/ssl/.wh..wh..opq
DEBU[0198] not including whiteout files
............ # lots of similar logs...........................
DEBU[0228] not including whiteout files
DEBU[0232] Whiting out /usr/share/doc/dirmngr/.wh..wh..opq
DEBU[0232] not including whiteout files
DEBU[0233] Whiting out /usr/share/doc/gnupg/.wh..wh..opq
DEBU[0233] not including whiteout files
My pod YAML is below:
apiVersion: v1
kind: Pod
metadata:
  name: kaniko
spec:
  containers:
    - name: kaniko
      #image: gcr.io/kaniko-project/executor:debug
      image: gcr.io/kaniko-project/executor:v1.8.1
      args: ["--dockerfile=/workspace/docker/Dockerfile",
             "--context=dir:///workspace",
             "--cache=true",
             "--verbosity=debug",
             "--destination=***/****/****"]  # replace with your dockerhub account
      volumeMounts:
        - name: kaniko-secret
          mountPath: /kaniko/.docker
        - name: dockerfile-storage
          mountPath: /workspace
  restartPolicy: Never
  volumes:
    - name: kaniko-secret
      secret:
        secretName: regcred
        items:
          - key: .dockerconfigjson
            path: config.json
    - name: dockerfile-storage
      persistentVolumeClaim:
        claimName: dockerfile-claim
Rolled back the image to v1.8.0; the issue remains the same.
But sometimes it does not stay stuck forever and instead reports the error error building image: error building stage: failed to get filesystem from image: unexpected EOF sooner:
......
DEBU[0028] Not adding /sys because it is ignored
error building image: error building stage: failed to get filesystem from image: unexpected EOF
Now I can reproduce this issue by simply adding two "magic" lines to the tutorial.
(Step 1): add something to the context:
+ $ echo "Peter" > a.txt
$ echo 'FROM ubuntu' >> dockerfile
+ $ echo 'COPY a.txt a.txt' >> dockerfile
$ echo 'ENTRYPOINT ["/bin/bash", "-c", "echo hello"]' >> dockerfile
(Step 2): add "--cache=true" as a kaniko parameter. (Not sure that is the cause; even without cache there is still some chance it gets stuck or fails at the end with error building image: error building stage: failed to get filesystem from image: unexpected EOF. The issue is not 100% reproducible, but roughly 75%.)
Adding `--snapshotmode=redo/time` does not help either.
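For what it's worth, here is the same repro condensed into one sketch, run via docker rather than the pod above; the destination is a placeholder and the local paths and credential mount are assumptions, not from the report:
# Build context with the two extra "magic" lines (hypothetical local path)
$ mkdir -p /tmp/kaniko-repro && cd /tmp/kaniko-repro
$ echo "Peter" > a.txt
$ echo 'FROM ubuntu' >> dockerfile
$ echo 'COPY a.txt a.txt' >> dockerfile
$ echo 'ENTRYPOINT ["/bin/bash", "-c", "echo hello"]' >> dockerfile
# Run the executor against that context with caching enabled;
# replace the destination with a registry you can push to.
$ docker run --rm \
    -v "$PWD":/workspace \
    -v "$HOME/.docker/config.json":/kaniko/.docker/config.json:ro \
    gcr.io/kaniko-project/executor:v1.8.1 \
    --context dir:///workspace \
    --dockerfile /workspace/dockerfile \
    --cache=true \
    --destination=<registry>/<repo>:<tag>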
@tejal29 would you revisit this issue? Maybe P1 priority? It seems to make kaniko unusable...
Same here, it takes forever when it says "Unpacking rootfs as cmd COPY ... requires it". I'm using the --cache=true and --cache-copy-layers flags.
Similar reports are also in https://github.com/GoogleContainerTools/kaniko/issues/763 if that helps. I'm also seeing the unexpected EOF error, but it seems like it mostly happens in GitLab CI, not locally when trying to reproduce it running in a v1.8.1-debug container.
In my case it was with GitLab also; it seems that if you specify --compressed-caching=false things go much faster, in addition to --cache and --cache-copy-layers (a sketch follows below).
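A minimal sketch of how those flags could be combined in a GitLab CI job, assuming the executor:debug image and the ${CONTAINER_IMAGE} variable from the original report:
publish:
  stage: publish
  image:
    name: gcr.io/kaniko-project/executor:debug
    entrypoint: [""]
  script:
    # caching enabled, per-layer cache for COPY layers, and compressed caching disabled
    - >-
      /kaniko/executor
      --context "$CI_PROJECT_DIR"
      --dockerfile Dockerfile
      --destination "$CONTAINER_IMAGE"
      --cache=true
      --cache-copy-layers
      --compressed-caching=false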
I'm seeing the same issue building a python:3.8 based image in GitLab CI. The build fails at random intervals with
INFO[0004] Unpacking rootfs as cmd COPY ./ . requires it.
error building image: error building stage: failed to get filesystem from image: unexpected EOF
Running docker image prune -a on the host seems to resolve the issue temporarily.
Because this happens only to some of our projects that do COPY . ., I'm starting to wonder whether some files in the repo (maybe in combination with .dockerignore, not sure) are tripping kaniko up. Are other people seeing this on specific projects only as well?
And just a random question: do people who experience this use xfs for the backing filesystem? I ask because I can't reproduce the issue locally; it only happens in GitLab CI. (One way to check is sketched below.)
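If it helps anyone check their own runner, docker info on the runner host prints the storage driver and its backing filesystem; the grep window is just a convenience:
# On the runner host: show the storage driver and its backing filesystem
$ docker info | grep -A 3 "Storage Driver"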
Because this happens only to some of our projects that do COPY . ., I'm starting to wonder whether some files in the repo (maybe in combination with .dockerignore, not sure) are tripping kaniko up. Are other people seeing this on specific projects only as well?
I only have 2 projects using kaniko; both have a COPY command in the Dockerfile. However, one is copying a specific file (an SQL dump which is actually created by a previous job), the other is copying the entire repo (with exclusions in .dockerignore).
Getting the same issue with confluentinc/cp-kafka-connect-base:7.2.0 on the 3rd run.
I got this in debug:
INFO[0075] Unpacking rootfs as cmd COPY --from=galaxy /usr/share/ansible /usr/share/ansible requires it.
DEBU[0075] Ignore list: [{/kaniko false} {/etc/mtab false} {/tmp/apt-key-gpghome true} {/var/run false} {/proc false} {/dev false} {/dev/pts false} {/sys false} {/sys/fs/cgroup false} {/sys/fs/cgroup/systemd false} {/sys/fs/cgroup/cpu,cpuacct false} {/sys/fs/cgroup/memory false} {/sys/fs/cgroup/net_cls,net_prio false} {/sys/fs/cgroup/cpuset false} {/sys/fs/cgroup/perf_event false} {/sys/fs/cgroup/pids false} {/sys/fs/cgroup/hugetlb false} {/sys/fs/cgroup/freezer false} {/sys/fs/cgroup/devices false} {/sys/fs/cgroup/blkio false} {/dev/mqueue false} {/dev/shm false} {/cache false} {/builds false} {/busybox false} {/certs/client false} {/etc/resolv.conf false} {/etc/hostname false} {/etc/hosts false}]
DEBU[0075] Not adding /proc because it is ignored
DEBU[0075] Not adding /etc/hosts because it is ignored
DEBU[0075] Not adding /etc/mtab because it is ignored
DEBU[0075] Not adding /etc/resolv.conf because it is ignored
DEBU[0075] Not adding /etc/hostname because it is ignored
DEBU[0075] Not adding /dev because it is ignored
DEBU[0075] Not adding /sys because it is ignored
DEBU[0077] Not adding /var/run because it is ignored
error building image: error building stage: failed to get filesystem from image: unexpected EOF
After second 77 it took kaniko another 5 minutes or so to throw that error.
And just a random question, do people who experience this use xfs for the backing filesystem? I ask because I can't reproduce the issue locally, only happens in gitlab ci.
@nejch Our runner also has XFS
Storage Driver: overlay2
Backing Filesystem: xfs
Supports d_type: true
Native Overlay Diff: true
userxattr: false
This error still happens in some of our pipelines. There is no COPY command in the Dockerfile. Sometimes it works, sometimes it fails.
Facing the same problem...
The same problem with v1.9.1-debug and v1.9.0-debug on a GitLab pipeline.
- no cache or other flags
- using ext4
- the Dockerfile includes an ADD and a COPY in order; it gets stuck at the ADD
- randomly
- entrypoints are overridden
- randomly
- when stuck, it does not fail, it just takes a long time to finish
- GitLab runner with Kubernetes executor
Build fails with image v1.9.1-debug:
Counting objects: 100% (23/23), done.
Compressing objects: 100% (20/20), done.
Total 23 (delta 5), reused 21 (delta 3), pack-reused 0
INFO[0002] Retrieving image manifest golang:latest
INFO[0002] Retrieving image golang:latest from registry index.docker.io
INFO[0004] Built cross stage deps: map[]
INFO[0004] Retrieving image manifest golang:latest
INFO[0004] Returning cached image manifest
INFO[0004] Executing 0 build triggers
WARN[0004] maintainer is deprecated, skipping
INFO[0004] Building stage 'golang:latest' [idx: '0', base-idx: '-1']
INFO[0004] Unpacking rootfs as cmd RUN mkdir -p /www/webapp requires it.
error building image: error building stage: failed to get filesystem from image: unexpected EOF
Same here:
- kaniko image version: gcr.io/kaniko-project/executor:v1.12.0-debug
- command: /kaniko/executor --context dir://$PWD --dockerfile $PWD/Dockerfile --cache=true --cache-dir=/cache --destination=${image_name} --skip-unused-stages --use-new-run --snapshot-mode=redo --verbosity=info
- error log:
INFO[0007] Unpacking rootfs as cmd COPY nginx.conf /etc/nginx/http.d requires it.
error building image: error building stage: failed to get filesystem from image: http2: server sent GOAWAY and closed the connection; LastStreamID=7, ErrCode=NO_ERROR, debug=""
Maybe somebody can verify: does it also happen if you change the user?
# ./Dockerfile
USER someotheruser  # change from root to another user
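To make that experiment concrete, a hypothetical minimal Dockerfile along those lines (the image, user name, and copied file are made up, not from the reports above):
# Hypothetical sketch: same kind of trivial build, but switching away from root
FROM ubuntu
RUN useradd --create-home builder
USER builder
COPY a.txt /home/builder/a.txt
ENTRYPOINT ["/bin/bash", "-c", "echo hello"]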
Same here:
- kaniko image version: gcr.io/kaniko-project/executor:v1.12.0-debug
- env: GitLab CI
- Base image: amazonlinux:2022
- command: /kaniko/executor --cleanup --build-arg FROM_TAG=2.10.5 --build-arg TAG=2.10.5 --custom-platform=linux/amd64 --context=/builds/common/[MASKED]/dockerfiles --dockerfile=external/devonlab/jjobs/Dockerfile --destination=${DESTINATION}
- Dockerfile: COPY assets/certs/ /etc/pki/ca-trust/source/anchors/
- Time: approximately 30 minutes
I faced the same error.
GitLab CE 13.2.1
kaniko version: kaniko/executor:v1.13.0-debug
command:
/kaniko/executor --context $CI_PROJECT_DIR --insecure --insecure-pull --skip-tls-verify --compressed-caching=false --build-arg IMAGE=base_py/3.11.1/poppler --build-arg IMAGE_TAG=1.0.519 --build-arg DOCKER_REGISTRY=${DOCKER_REGISTRY}/ --build-arg PIP_REGISTRY=https://${PYPI_USERNAME}:${PYPI_PASSWORD}@${NEXUS_PYPI_REPO} --build-arg GIT_HOST=${GIT_HOST} --build-arg GIT_USER=${GIT_USER} --build-arg GIT_PASS=${GIT_PASS} --build-arg GIT_BRANCH=${FIRST_BRANCH} --build-arg GIT_FEATURE_BRANCH= --build-arg USER_UID=${USER_UID} --build-arg SYSTEM_PREFIX=${SYSTEM_PREFIX} --build-arg SYSTEM_NAME=${SYSTEM_NAME} --dockerfile $CI_PROJECT_DIR/Dockerfile --destination ${DOCKER_INTERNAL_REGISTRY}/${DOCKER_IMAGE}
pipeline log:
INFO[0000] Unpacking rootfs as cmd COPY ./Pipfile* /usr/app/ requires it.
error building image: error building stage: failed to get filesystem from image: unexpected EOF
Cleaning up project directory and file based variables
00:04
ERROR: Job failed: exit code 1
In my opinion, the error occurs when two runners installed on the same host simultaneously start building images, but the error still occurs randomly.
Same error in 2024.
It seems a race condition is occurring: when I try to observe it with strace the build finishes successfully, but when the build is stuck, attaching strace to the pid shows:
user@node-with-kubernetes-executor:~$ sudo strace -p 2871077
strace: Process 2871077 attached
futex(0x27a1148, FUTEX_WAIT_PRIVATE, 0, NULL
Building with the -race parameter and cgo enabled does the trick, but you know the Golang race detector's overhead.
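For context, a sketch of how such a build might look, assuming a local checkout of the kaniko repo and that the executor's main package lives under cmd/executor (the path and output name are assumptions, not from the comment above):
# -race requires cgo, so CGO_ENABLED=1 is set explicitly
$ git clone https://github.com/GoogleContainerTools/kaniko.git && cd kaniko
$ CGO_ENABLED=1 GOOS=linux go build -race -o out/executor ./cmd/executor
# the resulting binary would then replace /kaniko/executor in a custom image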