docker-agent

Running jenkins/agent:jdk11-windowsservercore-ltsc2019 on K8s containerd node generates errors

Open • gadgetwhiz opened this issue 2 years ago • 3 comments

Jenkins and plugins versions report

A Jenkins pipeline job runs successfully when its agent pod is scheduled on a K8s Windows node that uses the docker runtime, but fails with errors when it is scheduled on a Windows node that uses containerd.

  • Successful when:
    ...
    nodeSelector:
      cloud.google.com/gke-container-runtime: docker
      kubernetes.io/os: windows
  • Fails when:
    ...
    nodeSelector:
      cloud.google.com/gke-container-runtime: containerd
      kubernetes.io/os: windows
  • Sample snippet from log:
Type     Reason     Age               From               Message
----     ------     ----              ----               -------
Normal   Scheduled  92s               default-scheduler  Successfully assigned jenkins-<redact>/jenkins-agent-d4b696fcf-g9gb9 to gke-16cb49-yd8r
Warning  Failed     86s               kubelet            Error: failed to generate container "a16091b52dcb5e1317014aaa176758c88cc5313135917f5f57cb188a91e3bba4" spec: failed to generate spec: failed to stat "C:\\ProgramData\\containerd\\root\\io.containerd.grpc.v1.cri\\containers\\a16091b52dcb5e1317014aaa176758c88cc5313135917f5f57cb188a91e3bba4\\volumes\\4c60be2b1fc945d94cb8635b16452c4b07c846812a93cd15677471ad1147f93e": CreateFile C:\ProgramData\containerd\root\io.containerd.grpc.v1.cri\containers\a16091b52dcb5e1317014aaa176758c88cc5313135917f5f57cb188a91e3bba4\volumes\4c60be2b1fc945d94cb8635b16452c4b07c846812a93cd15677471ad1147f93e: The system cannot find the path specified.
...

What Operating System are you using (both controller, and any agents involved in the problem)?

  • Jenkins
RHEL8
Version 2.346.2
Plugin: Kubernetes 3670.v6ca_059233222
  • K8s
# kubectl get nodes -A -o wide
NAME                                       STATUS   ROLES    AGE     VERSION            INTERNAL-IP     EXTERNAL-IP   OS-IMAGE                             KERNEL-VERSION    CONTAINER-RUNTIME
gke-16cb49-7uy4                            Ready    <none>   13d     v1.22.10-gke.600   *************   <none>        Windows Server 2019 Datacenter       10.0.17763.2803   containerd://1.5.10-gke.2
gke-16cb49-yd8r                            Ready    <none>   13d     v1.22.10-gke.600   *************   <none>        Windows Server 2019 Datacenter       10.0.17763.2803   containerd://1.5.10-gke.2
gke-cluster-3-default-pool-f4e8d14d-em12   Ready    <none>   13d     v1.22.10-gke.600   *************   <none>        Container-Optimized OS from Google   5.10.109+         containerd://1.5.11
gke-cluster-3-default-pool-f4e8d14d-qrrr   Ready    <none>   13d     v1.22.10-gke.600   *************   <none>        Container-Optimized OS from Google   5.10.109+         containerd://1.5.11
gke-f6821c-2svd                            Ready    <none>   2d11h   v1.22.10-gke.600   *************   <none>        Windows Server 2019 Datacenter       10.0.17763.2803   docker://20.10.9
gke-f6821c-63xj                            Ready    <none>   2d11h   v1.22.10-gke.600   *************   <none>        Windows Server 2019 Datacenter       10.0.17763.2803   docker://20.10.9
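The node listing above mixes Linux nodes, Windows containerd nodes and Windows docker nodes, and the nodeSelector decides which kind the agent lands on. A minimal Python sketch, assuming `kubectl` is on the PATH and pointed at this cluster, that groups the Windows nodes by runtime using the `status.nodeInfo.operatingSystem` and `status.nodeInfo.containerRuntimeVersion` fields of `kubectl get nodes -o json`; the helper names are hypothetical:

```python
import json
import subprocess
from collections import defaultdict


def fetch_nodes_json() -> str:
    """Return the raw output of `kubectl get nodes -o json` (needs cluster access)."""
    return subprocess.run(
        ["kubectl", "get", "nodes", "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout


def windows_nodes_by_runtime(nodes_json: str) -> dict:
    """Group Windows node names by runtime name, e.g. "docker" or "containerd"."""
    groups = defaultdict(list)
    for node in json.loads(nodes_json)["items"]:
        info = node["status"]["nodeInfo"]
        if info.get("operatingSystem") != "windows":
            continue  # skip the Linux (Container-Optimized OS) nodes
        # containerRuntimeVersion looks like "containerd://1.5.10-gke.2" or "docker://20.10.9"
        runtime = info["containerRuntimeVersion"].split("://", 1)[0]
        groups[runtime].append(node["metadata"]["name"])
    return {runtime: sorted(names) for runtime, names in groups.items()}
```

For the cluster shown, `windows_nodes_by_runtime(fetch_nodes_json())` should put the two `gke-16cb49-*` nodes under `containerd` and the two `gke-f6821c-*` nodes under `docker`.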

Reproduction steps

  • Jenkins pipeline script
pipeline {
  agent {
    kubernetes {
      yaml '''
        apiVersion: v1
        kind: Pod
        spec:
          automountServiceAccountToken: false
          containers:
          - name: jnlp
            image: jenkins/inbound-agent:windowsservercore-ltsc2019
            imagePullPolicy: IfNotPresent
            #force pull, but takes time
            #imagePullPolicy: Always
          - name: clean-windows-server
            image: mcr.microsoft.com/windows/servercore:ltsc2019
            imagePullPolicy: IfNotPresent
            #force pull, but takes time
            #imagePullPolicy: Always
            command:
            - powershell
            args:
            - Start-Sleep
            - 999999
          nodeSelector:
            cloud.google.com/gke-container-runtime: docker
            #cloud.google.com/gke-container-runtime: containerd
            kubernetes.io/os: windows
        '''
    }
  }
  stages {
    stage('Echo environments for jnlp and clean-windows-server') {
      steps {
        echo '----- jnlp container -----'
        container('jnlp') {
          bat 'set | sort'
          bat 'ping 8.8.8.8'
        }
        echo '----- clean-windows-server container -----'
        container('clean-windows-server') {
          bat 'set | sort'
          bat 'ping 8.8.8.8'
          //sleep to allow time for inspecting nodes via kubectl
          bat 'powershell sleep 600'
        }
      }
    }
  }
}
  • Standalone deployment
jenkins-agent.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: jenkins-agent
  labels:
    app: test
spec:
  replicas: 1
  selector:
    matchLabels:
      app: test
  template:
    metadata:
      labels:
        app: test
    spec:
      containers:
        - name: jenkins-agent
          #image: mcr.microsoft.com/windows/servercore:ltsc2019
          #image: eclipse-temurin:11.0.16_8-jdk-windowsservercore-1809
          #image: jenkins/agent:jdk11-windowsservercore-ltsc2019
          image: jenkins/inbound-agent:windowsservercore-ltsc2019
          #image: <redact>/devops-test:agenthack005
          command: ["powershell", "sleep", "999999"]
      nodeSelector:
        kubernetes.io/os: windows
        #cloud.google.com/gke-container-runtime: docker
        cloud.google.com/gke-container-runtime: containerd
  • Debugging info
The ancestry of jenkins/inbound-agent, from base image to final image, is:
    mcr.microsoft.com/windows/servercore:ltsc2019
    eclipse-temurin:11.0.16_8-jdk-windowsservercore-1809
    jenkins/agent:jdk11-windowsservercore-ltsc2019
    jenkins/inbound-agent:windowsservercore-ltsc2019

For debugging, each image in the jenkins/inbound-agent ancestry was deployed directly onto K8s with a sleep command. All of them deployed successfully on the K8s docker node. On the K8s containerd node, the eclipse-temurin image deployed successfully, but the jenkins/agent image failed.
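The per-image test can be scripted instead of editing jenkins-agent.yaml by hand for each image. A hedged Python sketch that emits one sleeping Deployment per ancestry image, shaped like jenkins-agent.yaml, wrapped in a `v1` `List` so the JSON can be piped to `kubectl apply -f -` (kubectl accepts JSON as well as YAML). The Deployment names (`ancestry-test-N`), the container name `probe` and the script name `ancestry_manifests.py` are hypothetical:

```python
import json
import sys

# Ancestry of jenkins/inbound-agent as listed in the issue, base image first.
ANCESTRY = [
    "mcr.microsoft.com/windows/servercore:ltsc2019",
    "eclipse-temurin:11.0.16_8-jdk-windowsservercore-1809",
    "jenkins/agent:jdk11-windowsservercore-ltsc2019",
    "jenkins/inbound-agent:windowsservercore-ltsc2019",
]


def sleep_deployment(image: str, index: int, runtime: str) -> dict:
    """A Deployment shaped like jenkins-agent.yaml that only sleeps inside `image`."""
    name = f"ancestry-test-{index}"  # hypothetical name; must be a valid DNS-1123 label
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": 1,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{
                        "name": "probe",  # hypothetical container name
                        "image": image,
                        "command": ["powershell", "sleep", "999999"],
                    }],
                    # Pin the pod to a Windows node with the requested runtime
                    # ("docker" or "containerd" on these GKE node pools).
                    "nodeSelector": {
                        "kubernetes.io/os": "windows",
                        "cloud.google.com/gke-container-runtime": runtime,
                    },
                },
            },
        },
    }


def ancestry_manifest(runtime: str) -> str:
    """Wrap one Deployment per ancestry image in a v1 List, serialized as JSON."""
    items = [sleep_deployment(img, i, runtime) for i, img in enumerate(ANCESTRY)]
    return json.dumps({"apiVersion": "v1", "kind": "List", "items": items}, indent=2)


if __name__ == "__main__":
    # Default to the runtime that showed the failure.
    print(ancestry_manifest(sys.argv[1] if len(sys.argv) > 1 else "containerd"))
```

Usage might look like `python ancestry_manifests.py containerd | kubectl apply -f -`, followed by `kubectl get pods -o wide` to see which pods reach Running and `kubectl describe pod` on any that do not.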

To isolate the cause, custom builds of the jenkins/agent image were generated iteratively, each with different Dockerfile commands commented out. A build with only the VOLUME lines commented out (~line 82 as of commit 1dd17e7) deployed successfully on a containerd node.
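Besides rebuilding with lines commented out, the VOLUME declarations an image carries can be read from its config: Docker records them under `Config.Volumes` in `docker image inspect` output, and derived images inherit their parents' entries. A minimal Python sketch, assuming it runs on a host where the Windows images can be pulled (for example a Windows machine with Docker, since `docker image inspect` only sees local images); the helper names are hypothetical:

```python
import json
import subprocess


def inspect_image(image: str) -> str:
    """Return the JSON printed by `docker image inspect IMAGE` (image must be pulled locally)."""
    return subprocess.run(
        ["docker", "image", "inspect", image],
        check=True, capture_output=True, text=True,
    ).stdout


def declared_volumes(inspect_output: str) -> list:
    """Return the sorted VOLUME paths recorded in the first inspected image's config.

    Config.Volumes maps each path to {} and is null when no VOLUME is declared.
    """
    image = json.loads(inspect_output)[0]
    volumes = (image.get("Config") or {}).get("Volumes") or {}
    return sorted(volumes)
```

Calling `declared_volumes(inspect_image(img))` for each image in the ancestry list should show which image first declares VOLUME entries.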

Expected Results

Successful deployment on both K8s Windows docker and containerd nodes

Actual Results

Successful deployment on the K8s Windows docker node; failed deployment on the K8s Windows containerd node

Anything else?

No response

gadgetwhiz, Aug 11 '22

What operating system version is your node pool running?

timja, Aug 12 '22

pool-1 Image type: Windows Long Term Servicing Channel with containerd (windows_ltsc_containerd)

gadgetwhiz, Aug 15 '22

which version?

timja, Aug 15 '22

Kernel version: 10.0.17763.2803, container runtime version: containerd://1.5.10-gke.2

gadgetwhiz, Aug 17 '22

Not sure; I have no real experience with Windows containers, other than the host version needing to match the container version.
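The host/container version requirement mentioned here concerns the Windows build number: with process isolation, the image's build (the third component, e.g. 17763 for Windows Server 2019 / ltsc2019) generally has to match the host's. A small Python sketch of that comparison, assuming the host value comes from the node's KERNEL-VERSION column and the image value from the `OsVersion` field that `docker image inspect` reports for Windows images; it deliberately ignores the revision (fourth) component, and the function names are hypothetical:

```python
def build_number(version: str) -> int:
    """Return the build (third) component of a Windows version such as 10.0.17763.2803."""
    parts = version.strip().split(".")
    if len(parts) < 3 or not parts[2].isdigit():
        raise ValueError(f"unexpected Windows version string: {version!r}")
    return int(parts[2])


def builds_match(host_kernel: str, image_os_version: str) -> bool:
    """True when host and image share a Windows build; revisions may differ in this sketch."""
    return build_number(host_kernel) == build_number(image_os_version)


# Host value taken from the KERNEL-VERSION column of the node listing.
HOST_KERNEL = "10.0.17763.2803"
```

With that host value, `builds_match(HOST_KERNEL, image_os_version)` is true for any image whose OS version is on build 17763.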

cc @slide in case you know more here

timja, Aug 17 '22

I don't know anything about k8s, so I don't know what the error means.

slide, Aug 17 '22

Based on feedback from Google support, the GKE cluster was updated to 1.23.9-gke.2100. The image now deploys successfully.

gadgetwhiz, Sep 09 '22