
Intermittent wrong architecture selected when building cross-platform

Open cvaliente opened this issue 1 year ago • 3 comments

Actual behavior: kaniko occasionally fails when building arm64 images, but succeeds after several retries. Sometimes the build fails at this stage:

INFO[0016] RUN mkdir ./plugins/s3-fs-hadoop && cp ./opt/flink-s3-fs-hadoop-*.jar ./plugins/s3-fs-hadoop/ 
INFO[0016] Initializing snapshotter ...                 
INFO[0016] Taking snapshot of full filesystem...        
INFO[0025] Cmd: /bin/sh                                 
INFO[0025] Args: [-c mkdir ./plugins/s3-fs-hadoop && cp ./opt/flink-s3-fs-hadoop-*.jar ./plugins/s3-fs-hadoop/] 
INFO[0025] Running: [/bin/sh -c mkdir ./plugins/s3-fs-hadoop && cp ./opt/flink-s3-fs-hadoop-*.jar ./plugins/s3-fs-hadoop/] 
error building image: error building stage: failed to execute command: starting command: fork/exec /bin/sh: exec format error

If we retry the job, it usually succeeds. Only about one out of five runs fails, in an otherwise identical environment.

Expected behavior: the build should succeed consistently, producing an image for the requested platform on every run.

To Reproduce: we build two versions, one for arm64 and one for amd64. The base image is published for both architectures, and the Dockerfile runs a simple RUN mkdir && cp.

build command:

      /kaniko/executor \
          --dockerfile ${DOCKERFILE} \
          --context ${CI_PROJECT_DIR} \
          --custom-platform ${PLATFORM} \
          --build-arg opts='GOARCH=${GOARCH}' \
          --destination ${DOCKER_REGISTRY}:${VERSION}${SUFFIX}

with

    SUFFIX: "-aarch64"
    PLATFORM: "linux/arm64/v8"
    GOARCH: arm64
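One workaround sketch (an assumption, not a confirmed fix): take the manifest-list resolution out of the intermittent path by resolving the platform-specific digest up front, for example with `crane digest --platform` from go-containerregistry, and handing kaniko a pinned, single-platform reference. The digest below is an illustrative placeholder:

```shell
# Resolve the platform-specific digest before the build, e.g. (network call,
# requires crane): DIGEST="$(crane digest --platform "${PLATFORM}" "${BASE}")"
# Placeholder digest so the sketch runs offline:
BASE=flink:1.17.2-scala_2.12-java8
DIGEST=sha256:0123456789abcdef
PINNED="${BASE%%:*}@${DIGEST}"   # drop the tag, append the digest
echo "${PINNED}"                 # flink@sha256:0123456789abcdef
```

The Dockerfile would then take the base image via a build-arg (`ARG BASE_IMAGE` / `FROM ${BASE_IMAGE}`), so each per-platform job builds from an unambiguous, single-platform reference.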

Additional Information

  • Dockerfile
FROM flink:1.17.2-scala_2.12-java8
RUN mkdir ./plugins/s3-fs-hadoop && cp ./opt/flink-s3-fs-hadoop-*.jar ./plugins/s3-fs-hadoop/
  • Build Context running on a gitlab cicd runner
  • Kaniko Image (fully qualified with digest) gcr.io/kaniko-project/executor:v1.21.1-debug
  • full logs:
$ /kaniko/executor \ # collapsed multi-line command
INFO[0000] Retrieving image manifest flink:1.17.2-scala_2.12-java8 
INFO[0000] Retrieving image flink:1.17.2-scala_2.12-java8 from registry index.docker.io 
INFO[0002] Built cross stage deps: map[]                
INFO[0002] Retrieving image manifest flink:1.17.2-scala_2.12-java8 
INFO[0002] Returning cached image manifest              
INFO[0002] Executing 0 build triggers                   
INFO[0002] Building stage 'flink:1.17.2-scala_2.12-java8' [idx: '0', base-idx: '-1'] 
INFO[0002] Unpacking rootfs as cmd RUN mkdir ./plugins/s3-fs-hadoop && cp ./opt/flink-s3-fs-hadoop-*.jar ./plugins/s3-fs-hadoop/ requires it. 
INFO[0016] RUN mkdir ./plugins/s3-fs-hadoop && cp ./opt/flink-s3-fs-hadoop-*.jar ./plugins/s3-fs-hadoop/ 
INFO[0016] Initializing snapshotter ...                 
INFO[0016] Taking snapshot of full filesystem...        
INFO[0025] Cmd: /bin/sh                                 
INFO[0025] Args: [-c mkdir ./plugins/s3-fs-hadoop && cp ./opt/flink-s3-fs-hadoop-*.jar ./plugins/s3-fs-hadoop/] 
INFO[0025] Running: [/bin/sh -c mkdir ./plugins/s3-fs-hadoop && cp ./opt/flink-s3-fs-hadoop-*.jar ./plugins/s3-fs-hadoop/] 
error building image: error building stage: failed to execute command: starting command: fork/exec /bin/sh: exec format error
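`exec format error` means the kernel was handed a binary for the wrong architecture, i.e. the rootfs kaniko unpacked does not match the machine it is running on. A quick way to check what was actually unpacked is to read the ELF `e_machine` field of `/bin/sh` (two bytes at offset 18, little-endian: `3e00` is x86-64, `b700` is aarch64). The sketch below uses a synthetic header so it runs anywhere; in a real debug session, point it at the image's `/bin/sh`:

```shell
# Print the ELF e_machine field (2 bytes at offset 18) of a file as hex.
elf_machine() {
  od -An -j18 -N2 -tx1 "$1" | tr -d ' \n'
}

# Synthetic 20-byte ELF header with e_machine = 0xb7 (aarch64), so the
# sketch is self-contained; use /bin/sh instead of /tmp/hdr for real checks.
printf '\177ELF\002\001\001\000\000\000\000\000\000\000\000\000\002\000\267\000' > /tmp/hdr
elf_machine /tmp/hdr   # b700 -> aarch64
```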

Triage Notes for the Maintainers

- [ ] Please check if this is a new feature you are proposing
- [ ] Please check if the build works in docker but not in kaniko
- [ ] Please check if this error is seen when you use the --cache flag
- [ ] Please check if your Dockerfile is a multistage Dockerfile

cvaliente avatar Mar 19 '24 06:03 cvaliente

Any update on this problem? I'm having the same issues.

ricardorqr avatar May 14 '24 22:05 ricardorqr

@ricardorqr what base image do you use, and is it also built for arm64?

And @cvaliente, have you already tried the --custom-platform flag?

https://github.com/GoogleContainerTools/kaniko?tab=readme-ov-file#flag---custom-platform

prima101112 avatar May 27 '24 19:05 prima101112

Hi @prima101112

I have a Jenkins pipeline which creates a multi-architecture image using Kaniko. I'm following this tutorial: https://docs.aws.amazon.com/AmazonECR/latest/userguide/docker-push-multi-architecture-image.html

This is my pipeline:

pipeline {
  agent {
    kubernetes {
      inheritFrom 'jenkins-kaniko-tomcat'
      yaml """
        apiVersion: v1
        kind: Pod
        metadata:
          name: kaniko-tomcat
          namespace: jenkins
        spec:
          containers:
            - name: awscli
              image: amazon/aws-cli:latest
              command:
              - sleep
              args:
              - 99d
              volumeMounts:
                - name: docker-config
                  mountPath: /kaniko/.docker
                - name: jenkins-root
                  mountPath: /tmp/workspace
            
            - name: kaniko
              image: gcr.io/kaniko-project/executor:debug
              #image: 495633144232.dkr.ecr.us-west-2.amazonaws.com/jenkins-agent-rico:kaniko
              imagePullPolicy: Always
              command:
              - sleep
              args:
              - 99d
              volumeMounts:
                - name: docker-config
                  mountPath: /kaniko/.docker
                - name: jenkins-root
                  mountPath: /tmp/workspace       
          restartPolicy: Never
          volumes:
            - name: docker-config
              configMap:
                name: docker-config
            - name: jenkins-root
              emptyDir: {}
      """
    }
  }
  
  stages {
    // Other stages...

    stage('Build image - arm64') {
      steps {
        container(name: 'kaniko') {
          sh '''
            /kaniko/executor --context `pwd` --verbosity trace \
            --destination 123456789.dkr.ecr.us-east-2.amazonaws.com/my-repo:$my-image-arm64 \
            --custom-platform linux/arm64
          '''
        }
      }
    }
    
    stage('Build image - amd64') {
      steps {
        container(name: 'kaniko') {
          sh '''
            cd uber-cloud-tools
            modified_string=$(echo "$VERSION" | tr '/' '-')
          
            /kaniko/executor --context `pwd` --verbosity debug \
            --destination 123456789.dkr.ecr.us-east-2.amazonaws.com/my-repo:$my-image-amd64 \
            --custom-platform linux/amd64
          '''
        }
      }
    }

    stage('Build image - multi-architecture') {
      steps {
        container(name: 'awscli') {
          sh '''
            aws ecr get-login-password --region us-east-2 | docker login --username AWS \
            --password-stdin 123456789.dkr.ecr.us-east-2.amazonaws.com

            docker manifest create 123456789.dkr.ecr.us-east-2.amazonaws.com/my-repo:$my-image \
            123456789.dkr.ecr.us-east-2.amazonaws.com/my-repo:$my-image-arm64 \
            123456789.dkr.ecr.us-east-2.amazonaws.com/my-repo:$my-image-amd64

            docker manifest push 123456789.dkr.ecr.us-east-2.amazonaws.com/my-repo:$my-image
          '''
        }
      }
    }
    
  }
}
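The multi-architecture stage above can also annotate each manifest entry explicitly, so the architecture recorded in the manifest list does not depend on the per-arch images carrying the right platform metadata. A dry-run sketch (commands are echoed rather than executed; repo and tags are the illustrative values from the pipeline):

```shell
# Dry-run: `run` echoes each command; change its body to "$@" to execute.
run() { echo "$@"; }

REPO=123456789.dkr.ecr.us-east-2.amazonaws.com/my-repo
TAG=my-image

run docker manifest create "$REPO:$TAG" "$REPO:$TAG-arm64" "$REPO:$TAG-amd64"
run docker manifest annotate "$REPO:$TAG" "$REPO:$TAG-arm64" --os linux --arch arm64
run docker manifest annotate "$REPO:$TAG" "$REPO:$TAG-amd64" --os linux --arch amd64
run docker manifest push "$REPO:$TAG"
```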

This is my Dockerfile. As you can see, it is a very simple one.

FROM amazoncorretto:8u342

RUN yum install -y procps && yum clean all

VOLUME ["/data"]

And this is the error.

...
INFO[0032] Cmd: /bin/sh
INFO[0032] Args: [-c yum install -y procps && yum clean all]
INFO[0032] Running: [/bin/sh -c yum install -y procps && yum clean all]
error building image: error building stage: failed to execute command: starting command: fork/exec /bin/sh: exec format error

The failure happens when the line RUN yum install -y procps && yum clean all executes. The other pipelines have no commands to run, so they work. If I remove the Build image - arm64 stage, the Build image - amd64 stage works. I'm also using --custom-platform. It seems this error is related to arm64, but I'm not sure.
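One thing worth checking (an assumption about the cause, not a confirmed diagnosis): kaniko unpacks the base image's rootfs over the executor container's own filesystem, so running two `/kaniko/executor` invocations for different platforms in the same container can leave an arm64 `/bin/sh` behind for the subsequent amd64 build. Possible mitigations are running each build in its own kaniko container, or the executor's `--cleanup` flag. Dry-run sketch (commands echoed, not executed; repo and tags are illustrative):

```shell
REPO=123456789.dkr.ecr.us-east-2.amazonaws.com/my-repo

# $1 = platform, $2 = tag suffix; the command is echoed here, not executed.
build() {
  echo /kaniko/executor --context . --custom-platform "$1" --cleanup \
    --destination "$REPO:my-image$2"
}

build linux/arm64 -arm64
build linux/amd64 -amd64
```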

Would you happen to have any idea what this can be?

ricardorqr avatar May 29 '24 20:05 ricardorqr