build hanging at 'Unpacking rootfs as cmd RUN mkdir ... requires it'

Open petkovacs19 opened this issue 4 years ago • 30 comments

Actual behavior

When using gcr.io/kaniko-project/executor:debug and running /kaniko/executor --context $CI_PROJECT_DIR --dockerfile Dockerfile --destination ${CONTAINER_IMAGE} in a GitLab runner, the build hangs.

Expected behavior

The build finishes successfully and the image is published to GCR.

To Reproduce

Steps to reproduce the behavior:

Dockerfile

FROM gcr.io/deeplearning-platform-release/tf2-cpu.2-1
RUN mkdir /tpu
COPY test.py /tpu/

.gitlab-ci.yml

stages:
  - publish

variables:
  CONTAINER_IMAGE: gcr.io/${GOOGLE_PROJECT_ID}/${CI_PROJECT_NAME}:${CI_COMMIT_SHORT_SHA}

publish:
  stage: publish
  image:
    name: gcr.io/kaniko-project/executor:debug
    entrypoint: [""]
  script:
    - /kaniko/executor --context $CI_PROJECT_DIR --dockerfile Dockerfile --destination ${CONTAINER_IMAGE} --verbosity=debug
    
# Custom Functions -------------------------------------------------------
.custom_functions: &custom_functions |

  function config_kubernetes() {
    kubectl config set-cluster $KUBE_NAME --server="$KUBE_URL" --insecure-skip-tls-verify=true
    kubectl config set-credentials cluster-admin --username="$KUBE_USER" --password="$KUBE_PASSWORD"
    kubectl config set-context default --cluster=$KUBE_NAME --user=cluster-admin
    kubectl config use-context default
    echo $GOOGLE_SERVICE_JSON > ./gcloud-service-key.json
    kubectl create secret generic kaniko-secret --from-file=./gcloud-service-key.json
  }
  
before_script:
  - *custom_functions

Triage Notes for the Maintainers

Description | Yes/No
Please check if this is a new feature you are proposing | [ ]
Please check if the build works in docker but not in kaniko | [ ]
Please check if this error is seen when you use --cache flag | [ ]
Please check if your dockerfile is a multistage dockerfile | [ ]

petkovacs19 avatar Feb 29 '20 22:02 petkovacs19

It turned out it was not hanging; it just keeps iterating over files and whiting them out for a very long time. It never finished in the triggered builds.

DEBU[0002] Not adding /dev because it is whitelisted
DEBU[0002] Not adding /etc/hostname because it is whitelisted
DEBU[0002] Not adding /etc/hosts because it is whitelisted
DEBU[0002] Not adding /etc/resolv.conf because it is whitelisted
DEBU[0002] Not adding /proc because it is whitelisted
DEBU[0003] Not adding /sys because it is whitelisted
DEBU[0005] Not adding /var/run because it is whitelisted
DEBU[0006] Whiting out /etc/ImageMagick-6/.wh..wh..opq
DEBU[0006] not including whiteout files
DEBU[0006] Whiting out /etc/X11/.wh..wh..opq
DEBU[0006] not including whiteout files
DEBU[0006] Whiting out /etc/apache2/.wh..wh..opq
DEBU[0006] not including whiteout files
DEBU[0006] Whiting out /etc/bash_completion.d/.wh..wh..opq
DEBU[0006] not including whiteout files
DEBU[0006] Whiting out /etc/ca-certificates/.wh..wh..opq
DEBU[0006] not including whiteout files
DEBU[0006] Whiting out /etc/cron.d/.wh..wh..opq
DEBU[0006] not including whiteout files
DEBU[0006] Whiting out /etc/cron.hourly/.wh..wh..opq

petkovacs19 avatar Feb 29 '20 22:02 petkovacs19

Ah, got it. Thanks.

Is the base image gcr.io/deeplearning-platform-release/tf2-cpu.2-1 expected to contain so many whiteout paths?
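
One way to put a number on that from a debug log is to count the "Whiting out" entries per top-level directory. This is only a rough sketch: the helper name `whiteouts_by_dir` and the log file name in the usage line are made up for illustration, and it assumes the log format shown in the comment above.

```shell
# Count "Whiting out <path>" debug entries per top-level directory.
# Reads a kaniko debug log on stdin; prints "<count> <dir>", largest first.
whiteouts_by_dir() {
  awk '
    match($0, /Whiting out \/[^ ]*/) {
      # Skip the 12 characters of "Whiting out " to keep only the path.
      path = substr($0, RSTART + 12, RLENGTH - 12)
      split(path, parts, "/")          # parts[1] is empty, parts[2] is e.g. "etc"
      count["/" parts[2]]++
    }
    END { for (dir in count) print count[dir], dir }
  ' | sort -k1,1nr -k2
}
```

Usage, assuming the debug output was saved to a file: `whiteouts_by_dir < kaniko-debug.log`.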

On latest master, I was able to build your Dockerfile in 4 mins. Can you remove the -v=debug flag and see?

/ # /busybox/time kaniko/executor -f dockerfiles/Dockerfile1 --context=dir://workspace --destination=gcr.io/tejal-test/test-ml-latest-master
INFO[0000] Resolved base name gcr.io/deeplearning-platform-release/tf2-cpu.2-1 to gcr.io/deeplearning-platform-release/tf2-cpu.2-1 
INFO[0000] Using dockerignore file: /workspace/.dockerignore 
INFO[0000] Resolved base name gcr.io/deeplearning-platform-release/tf2-cpu.2-1 to gcr.io/deeplearning-platform-release/tf2-cpu.2-1 
INFO[0000] Retrieving image manifest gcr.io/deeplearning-platform-release/tf2-cpu.2-1 
INFO[0001] Retrieving image manifest gcr.io/deeplearning-platform-release/tf2-cpu.2-1 
INFO[0001] Built cross stage deps: map[]                
INFO[0001] Retrieving image manifest gcr.io/deeplearning-platform-release/tf2-cpu.2-1 
INFO[0002] Retrieving image manifest gcr.io/deeplearning-platform-release/tf2-cpu.2-1 
INFO[0002] Unpacking rootfs as cmd RUN mkdir /tpu requires it. 
INFO[0102] Taking snapshot of full filesystem...        
INFO[0105] Resolving paths                              
INFO[0186] RUN mkdir /tpu                               
INFO[0186] cmd: /bin/sh                                 
INFO[0186] args: [-c mkdir /tpu]                        
INFO[0186] Taking snapshot of full filesystem...        
INFO[0188] Resolving paths                              
INFO[0236] COPY test.py /tpu/                           
INFO[0236] Resolving paths                              
INFO[0236] Taking snapshot of files...                  
real	4m 0.38s
user	2m 11.51s
sys	2m 10.03s

I did a small optimization which reduced the time to 3 mins.
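
For anyone comparing timings: the bracketed number in each log line is the number of seconds since the executor started, so the gap between two consecutive lines shows how long the earlier step took. In the run above, unpacking the rootfs took about 100 s (0002 to 0102) and the first "Resolving paths" about 81 s (0105 to 0186). A small sketch for pulling those gaps out of a saved log follows; the helper name `kaniko_step_gaps` is hypothetical, and it assumes the default log format shown in this thread.

```shell
# For each kaniko log line, print how many seconds passed until the next line.
# Reads the log on stdin. The last line has no following line, so it is not printed.
kaniko_step_gaps() {
  awk '
    match($0, /\[[0-9]+\]/) {
      # Extract the digits between the brackets and convert to a number.
      t = substr($0, RSTART + 1, RLENGTH - 2) + 0
      if (seen) printf "%ds  %s\n", t - prev, prev_line
      prev = t; prev_line = $0; seen = 1
    }
  '
}
```

Piping the output through `sort -nr | head` lists the slowest steps first.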

tejal29 avatar Mar 18 '20 07:03 tejal29

Hello, I am getting this same error.

[build-and-push] INFO[0002] Retrieving image manifest openjdk:8-jdk-alpine
[build-and-push] INFO[0002] Retrieving image openjdk:8-jdk-alpine
[build-and-push] INFO[0003] Retrieving image manifest openjdk:8-jdk-alpine
[build-and-push] INFO[0003] Retrieving image openjdk:8-jdk-alpine
[build-and-push] INFO[0005] Built cross stage deps: map[]
[build-and-push] INFO[0005] Retrieving image manifest openjdk:8-jdk-alpine
[build-and-push] INFO[0005] Retrieving image openjdk:8-jdk-alpine
[build-and-push] INFO[0006] Retrieving image manifest openjdk:8-jdk-alpine
[build-and-push] INFO[0006] Retrieving image openjdk:8-jdk-alpine
[build-and-push] INFO[0008] Executing 0 build triggers
[build-and-push] INFO[0008] Unpacking rootfs as cmd ADD target/spring-webflux*.jar spring-webflux-demo.jar requires it.
[build-and-push] INFO[0053] ENV LANG C.UTF-8
[build-and-push] INFO[0053] Resolving srcs [target/spring-webflux*.jar]...
[build-and-push] error building image: error building stage: failed to get files used from context: copy failed: no source files specified

container step-build-and-push has failed : [{"key":"StartedAt","value":"2020-08-23T16:44:35.577Z","resourceRef":{}}]

I am not sure why this error is coming up. Any input?

vyom-soft avatar Aug 23 '20 16:08 vyom-soft

Same error here:

INFO[0000] Retrieving image manifest openjdk:11-jdk-slim 
INFO[0000] Retrieving image openjdk:11-jdk-slim from registry index.docker.io
INFO[0006] Built cross stage deps: map[]                
INFO[0006] Retrieving image manifest openjdk:11-jdk-slim 
INFO[0006] Returning cached image manifest              
INFO[0006] Executing 0 build triggers                   
INFO[0006] Unpacking rootfs as cmd ADD target/dynamic-service-*.jar /app/app.jar requires it.

using gcr.io/kaniko-project/executor@sha256:19b934353e409c72b7e71ad9018ed7ba4505682b81da87fb99c7b9dffdb4372a

Any ideas?

Update: after a long time (about 20 minutes) it continues, but I don't know why.

drriguz avatar Apr 20 '22 03:04 drriguz

Same here:

gcr.io/kaniko-project/executor                                                v1.8.1               a2a981eb8745   2 weeks ago      63.4MB

It is stuck on Unpacking. My code repo is tiny, only some Go code.

INFO[0008] Unpacking rootfs as cmd COPY go.mod go.mod requires it.
DEBU[0008] Ignore list: [{/kaniko false} {/etc/mtab false} {/tmp/apt-key-gpghome true} {/var/run false} {/proc false} {/dev false} {/dev/pts false} {/sys false} {/sys/fs/cgroup false} {/sys/fs/cgroup/systemd false} {/sys/fs/cgroup/perf_event false} {/sys/fs/cgroup/pids false} {/sys/fs/cgroup/cpuset false} {/sys/fs/cgroup/devices false} {/sys/fs/cgroup/hugetlb false} {/sys/fs/cgroup/memory false} {/sys/fs/cgroup/net_cls,net_prio false} {/sys/fs/cgroup/blkio false} {/sys/fs/cgroup/cpu,cpuacct false} {/sys/fs/cgroup/freezer false} {/dev/mqueue false} {/workspace false} {/busybox false} {/kaniko/.docker false} {/dev/termination-log false} {/etc/resolv.conf false} {/etc/hostname false} {/etc/hosts false} {/dev/shm false} {/var/run/secrets/kubernetes.io/serviceaccount false} {/proc/bus false} {/proc/fs false} {/proc/irq false} {/proc/sys false} {/proc/sysrq-trigger false} {/proc/acpi false} {/proc/kcore false} {/proc/keys false} {/proc/timer_list false} {/proc/timer_stats false} {/proc/sched_debug false} {/proc/scsi false} {/sys/firmware false}]
DEBU[0015] Not adding /dev because it is ignored
DEBU[0016] Not adding /etc/hostname because it is ignored
DEBU[0016] Not adding /etc/resolv.conf because it is ignored
DEBU[0030] Not adding /proc because it is ignored
DEBU[0036] Not adding /sys because it is ignored
DEBU[0193] Not adding /var/run because it is ignored
DEBU[0194] Whiting out /etc/ca-certificates/.wh..wh..opq
DEBU[0194] not including whiteout files
DEBU[0194] Whiting out /etc/ssl/.wh..wh..opq
DEBU[0194] not including whiteout files
DEBU[0198] Whiting out /usr/lib/sasl2/.wh..wh..opq
DEBU[0198] not including whiteout files
DEBU[0198] Whiting out /usr/lib/ssl/.wh..wh..opq
DEBU[0198] not including whiteout files
............ # lots of similar logs...........................
DEBU[0228] not including whiteout files
DEBU[0232] Whiting out /usr/share/doc/dirmngr/.wh..wh..opq
DEBU[0232] not including whiteout files
DEBU[0233] Whiting out /usr/share/doc/gnupg/.wh..wh..opq
DEBU[0233] not including whiteout files

My pod YAML is below:

apiVersion: v1
kind: Pod
metadata:
  name: kaniko
spec:
  containers:
  - name: kaniko
    #image: gcr.io/kaniko-project/executor:debug
    image: gcr.io/kaniko-project/executor:v1.8.1
    args: ["--dockerfile=/workspace/docker/Dockerfile",
            "--context=dir:///workspace",
            "--cache=true",
            "--verbosity=debug",
            "--destination=***/****/****"] # replace with your dockerhub account
    volumeMounts:
      - name: kaniko-secret
        mountPath: /kaniko/.docker
      - name: dockerfile-storage
        mountPath: /workspace
  restartPolicy: Never
  volumes:
    - name: kaniko-secret
      secret:
        secretName: regcred
        items:
          - key: .dockerconfigjson
            path: config.json
    - name: dockerfile-storage
      persistentVolumeClaim:
        claimName: dockerfile-claim

I rolled the image back to v1.8.0, and the issue remains the same.

Sometimes it does not stay stuck forever, but instead fails sooner with the error "error building image: error building stage: failed to get filesystem from image: unexpected EOF":

......
DEBU[0028] Not adding /sys because it is ignored
error building image: error building stage: failed to get filesystem from image: unexpected EOF

panpan0000 avatar Apr 20 '22 05:04 panpan0000

Now I can reproduce this issue by simply adding two lines to the tutorial. (Step 1): add something to the context (added lines are marked with +):

+ $ echo "Peter" > a.txt
$ echo 'FROM ubuntu' >> dockerfile
+ $ echo 'COPY a.txt a.txt' >> dockerfile
$ echo 'ENTRYPOINT ["/bin/bash", "-c", "echo hello"]' >> dockerfile

(Step 2): add "--cache=true" as a kaniko parameter. (I'm not sure it is the cause: even without cache, there is still some chance that the build gets stuck or fails at the end with "error building image: error building stage: failed to get filesystem from image: unexpected EOF". The issue is not 100% reproducible, but about 75%.)

Adding `--snapshot-mode=redo` or `--snapshot-mode=time` does not help either.

panpan0000 avatar Apr 20 '22 05:04 panpan0000

@tejal29 would you revisit this issue? Maybe with P1 priority? It seems to make kaniko unusable.

panpan0000 avatar Apr 22 '22 07:04 panpan0000

Same here, it takes forever when it says " Unpacking rootfs as cmd COPY as .... needs it". I'm using --cache=true and --cache-copy-layers flags.

yevon avatar Apr 23 '22 21:04 yevon

Similar reports are also in https://github.com/GoogleContainerTools/kaniko/issues/763 if that helps. I'm also seeing the unexpected EOF error but it seems like it mostly happens in GitLab CI, not locally when trying to reproduce it running in a v1.8.1-debug container.

nejch avatar Apr 25 '22 09:04 nejch

In my case it was also with GitLab. It seems that if you specify --compressed-caching=false, things go much faster, in addition to --cache and --cache-copy-layers.
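
For reference, a minimal sketch of an invocation with those three flags. The flags themselves are the ones named in this thread; the `kaniko_build` wrapper, the `KANIKO_BIN` variable, and the `Dockerfile` path are assumptions for illustration only.

```shell
# Hypothetical wrapper around the executor call, taking the destination image as $1.
# KANIKO_BIN defaults to the executor path used in this thread;
# set KANIKO_BIN=echo to print the arguments instead of running a build.
kaniko_build() {
  "${KANIKO_BIN:-/kaniko/executor}" \
    --context "${CI_PROJECT_DIR:-.}" \
    --dockerfile Dockerfile \
    --destination "$1" \
    --cache=true \
    --cache-copy-layers \
    --compressed-caching=false
}
```

In a GitLab job this would be called as `kaniko_build "${CONTAINER_IMAGE}"`. Whether it helps may depend on the runner's memory and disk, so treat it as something to try rather than a confirmed fix.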

yevon avatar Apr 25 '22 09:04 yevon

I'm seeing the same issue building a python:3.8 based image in GitLab CI. The build fails at random intervals with

INFO[0004] Unpacking rootfs as cmd COPY ./ . requires it. 
error building image: error building stage: failed to get filesystem from image: unexpected EOF

running docker image prune -a on the host seems to resolve the issue temporarily.

mhamiltonj avatar May 27 '22 09:05 mhamiltonj

Because this happens only to some of our projects that do COPY . ., I'm starting to wonder whether some files in the repo (maybe in combination with .dockerignore, not sure) are tripping kaniko up. Are other people seeing this on specific projects only as well?

And just a random question, do people who experience this use xfs for the backing filesystem? I ask because I can't reproduce the issue locally, only happens in gitlab ci.

nejch avatar May 27 '22 09:05 nejch

> Because this happens only to some of our projects that do COPY . ., I'm starting to wonder whether some files in the repo (maybe in combination with .dockerignore, not sure) are tripping kaniko up. Are other people seeing this on specific projects only as well?

I only have 2 projects using Kaniko, and both have a COPY command in the Dockerfile. However, one copies a specific file (an SQL dump which is actually created by a previous job), while the other copies the entire repo (with exclusions in .dockerignore).

mhamiltonj avatar May 27 '22 10:05 mhamiltonj

getting same issue with confluentinc/cp-kafka-connect-base:7.2.0 on 3rd run

tooptoop4 avatar Jul 08 '22 00:07 tooptoop4

I got this in debug:

INFO[0075] Unpacking rootfs as cmd COPY --from=galaxy /usr/share/ansible /usr/share/ansible requires it. 
DEBU[0075] Ignore list: [{/kaniko false} {/etc/mtab false} {/tmp/apt-key-gpghome true} {/var/run false} {/proc false} {/dev false} {/dev/pts false} {/sys false} {/sys/fs/cgroup false} {/sys/fs/cgroup/systemd false} {/sys/fs/cgroup/cpu,cpuacct false} {/sys/fs/cgroup/memory false} {/sys/fs/cgroup/net_cls,net_prio false} {/sys/fs/cgroup/cpuset false} {/sys/fs/cgroup/perf_event false} {/sys/fs/cgroup/pids false} {/sys/fs/cgroup/hugetlb false} {/sys/fs/cgroup/freezer false} {/sys/fs/cgroup/devices false} {/sys/fs/cgroup/blkio false} {/dev/mqueue false} {/dev/shm false} {/cache false} {/builds false} {/busybox false} {/certs/client false} {/etc/resolv.conf false} {/etc/hostname false} {/etc/hosts false}] 
DEBU[0075] Not adding /proc because it is ignored       
DEBU[0075] Not adding /etc/hosts because it is ignored  
DEBU[0075] Not adding /etc/mtab because it is ignored   
DEBU[0075] Not adding /etc/resolv.conf because it is ignored 
DEBU[0075] Not adding /etc/hostname because it is ignored 
DEBU[0075] Not adding /dev because it is ignored        
DEBU[0075] Not adding /sys because it is ignored        
DEBU[0077] Not adding /var/run because it is ignored    
error building image: error building stage: failed to get filesystem from image: unexpected EOF

after second 77 it took kaniko another 5 minutes or so to throw that error.

> And just a random question, do people who experience this use xfs for the backing filesystem? I ask because I can't reproduce the issue locally, only happens in gitlab ci.

@nejch Our runner also has XFS

 Storage Driver: overlay2
  Backing Filesystem: xfs
  Supports d_type: true
  Native Overlay Diff: true
  userxattr: false

apollo13 avatar Aug 22 '22 15:08 apollo13

This error still happens in some of our pipelines. There is no COPY command in the Dockerfile. Sometimes it works, sometimes it fails.
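
Since the failure is intermittent, one stopgap (not a fix) is to retry. Below is a generic retry helper as a rough sketch; the name `retry` and its arguments are made up here, and nothing in it is specific to kaniko. One caveat, stated as an assumption: a failed attempt may leave a partly unpacked filesystem behind in the same container, so retrying the whole CI job (for example with GitLab's job-level `retry` keyword) may be the safer variant.

```shell
# Run a command up to $1 times, sleeping $2 seconds between attempts.
# Returns 0 as soon as the command succeeds, 1 if every attempt failed.
retry() {
  attempts=$1; delay=$2; shift 2
  n=1
  while ! "$@"; do
    if [ "$n" -ge "$attempts" ]; then
      echo "giving up after $n attempts: $*" >&2
      return 1
    fi
    echo "attempt $n failed, retrying in ${delay}s" >&2
    n=$((n + 1))
    sleep "$delay"
  done
}
```

Example: `retry 3 10 /kaniko/executor --context "$CI_PROJECT_DIR" --dockerfile Dockerfile --destination "${CONTAINER_IMAGE}"`.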

EvertonSA avatar Oct 20 '22 13:10 EvertonSA

Facing the same problem...

toby1991 avatar Nov 15 '22 02:11 toby1991

The same problem with v1.9.1-debug and v1.9.0-debug in a GitLab pipeline.

  • no cache or other flags
  • using ext4
  • Dockerfile includes an ADD followed by a COPY; it gets stuck at the ADD
  • happens randomly
  • entrypoint overridden
  • when stuck, it does not fail; it just takes a long time to finish
  • GitLab runner with Kubernetes executor

kawhicurry avatar Feb 02 '23 16:02 kawhicurry

Build fails with image v1.9.1-debug

Counting objects: 100% (23/23), done.
Compressing objects: 100% (20/20), done.
Total 23 (delta 5), reused 21 (delta 3), pack-reused 0
INFO[0002] Retrieving image manifest golang:latest
INFO[0002] Retrieving image golang:latest from registry index.docker.io
INFO[0004] Built cross stage deps: map[]
INFO[0004] Retrieving image manifest golang:latest
INFO[0004] Returning cached image manifest
INFO[0004] Executing 0 build triggers
WARN[0004] maintainer is deprecated, skipping
INFO[0004] Building stage 'golang:latest' [idx: '0', base-idx: '-1']
INFO[0004] Unpacking rootfs as cmd RUN mkdir -p /www/webapp requires it.
error building image: error building stage: failed to get filesystem from image: unexpected EOF

mouuii avatar Mar 07 '23 03:03 mouuii

Same here:

  • kaniko image version: gcr.io/kaniko-project/executor:v1.12.0-debug

  • command: /kaniko/executor --context dir://$PWD --dockerfile $PWD/Dockerfile --cache=true --cache-dir=/cache --destination=${image_name} --skip-unused-stages --use-new-run --snapshot-mode=redo --verbosity=info

  • error log:

INFO[0007] Unpacking rootfs as cmd COPY nginx.conf  /etc/nginx/http.d requires it. 
error building image: error building stage: failed to get filesystem from image: http2: server sent GOAWAY and closed the connection; LastStreamID=7, ErrCode=NO_ERROR, debug=""

go-xmyang avatar Jul 05 '23 07:07 go-xmyang

Maybe somebody can verify - does it also happen if you change the user?

# ./Dockerfile
# change from root to another user
USER someotheruser

michaelfeil avatar Jul 12 '23 18:07 michaelfeil

Same here:

  • kaniko image version: gcr.io/kaniko-project/executor:v1.12.0-debug
  • env: GitLab CI
  • Base image: amazonlinux:2022
  • command:
/kaniko/executor --cleanup --build-arg FROM_TAG=2.10.5 --build-arg TAG=2.10.5 --custom-platform=linux/amd64 --context=/builds/common/[MASKED]/dockerfiles --dockerfile=external/devonlab/jjobs/Dockerfile --destination=${DESTINATION}
  • Dockerfile:
COPY assets/certs/ /etc/pki/ca-trust/source/anchors/
  • Time: Approximately 30 minutes

Sinhyeok avatar Aug 04 '23 04:08 Sinhyeok

I faced the same error

GitLab CE 13.2.1

kaniko version: kaniko/executor:v1.13.0-debug

command: /kaniko/executor --context $CI_PROJECT_DIR --insecure --insecure-pull --skip-tls-verify --compressed-caching=false --build-arg IMAGE=base_py/3.11.1/poppler --build-arg IMAGE_TAG=1.0.519 --build-arg DOCKER_REGISTRY=${DOCKER_REGISTRY}/ --build-arg PIP_REGISTRY=https://${PYPI_USERNAME}:${PYPI_PASSWORD}@${NEXUS_PYPI_REPO} --build-arg GIT_HOST=${GIT_HOST} --build-arg GIT_USER=${GIT_USER} --build-arg GIT_PASS=${GIT_PASS} --build-arg GIT_BRANCH=${FIRST_BRANCH} --build-arg GIT_FEATURE_BRANCH= --build-arg USER_UID=${USER_UID} --build-arg SYSTEM_PREFIX=${SYSTEM_PREFIX} --build-arg SYSTEM_NAME=${SYSTEM_NAME} --dockerfile $CI_PROJECT_DIR/Dockerfile --destination ${DOCKER_INTERNAL_REGISTRY}/${DOCKER_IMAGE}

pipeline log:

INFO[0000] Unpacking rootfs as cmd COPY ./Pipfile* /usr/app/ requires it. 
error building image: error building stage: failed to get filesystem from image: unexpected EOF
Cleaning up project directory and file based variables
00:04
ERROR: Job failed: exit code 1

In my opinion, the error occurs when two runners installed on the same host start building images simultaneously, but it still occurs randomly.

Enderric avatar Aug 18 '23 10:08 Enderric

same error 2024

hieast avatar Jan 30 '24 02:01 hieast

It seems a race condition is occurring: when I try to observe the build with strace, it finishes successfully, but when the build is stuck, attaching strace to the PID shows:

user@node-with-kubernetes-executor:~$ sudo strace -p 2871077
strace: Process 2871077 attached
futex(0x27a1148, FUTEX_WAIT_PRIVATE, 0, NULL

paraddise avatar Feb 06 '24 15:02 paraddise

Building with the -race flag and cgo enabled does the trick, but the Go race detector adds overhead, as you know.

paraddise avatar Feb 07 '24 19:02 paraddise