docker-agent
Running jenkins/agent:jdk11-windowsservercore-ltsc2019 on K8s containerd node generates errors
Jenkins and plugins versions report
A pipeline job in Jenkins executes successfully on K8s windows docker node, but fails with errors on containerd node.
- Successful when:
...
nodeSelector:
  cloud.google.com/gke-container-runtime: docker
  kubernetes.io/os: windows
- Fails when:
...
nodeSelector:
  cloud.google.com/gke-container-runtime: containerd
  kubernetes.io/os: windows
- Sample snippet from log:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 92s default-scheduler Successfully assigned jenkins-<redact>/jenkins-agent-d4b696fcf-g9gb9 to gke-16cb49-yd8r
Warning Failed 86s kubelet Error: failed to generate container "a16091b52dcb5e1317014aaa176758c88cc5313135917f5f57cb188a91e3bba4" spec: failed to generate spec: failed to stat "C:\\ProgramData\\containerd\\root\\io.containerd.grpc.v1.cri\\containers\\a16091b52dcb5e1317014aaa176758c88cc5313135917f5f57cb188a91e3bba4\\volumes\\4c60be2b1fc945d94cb8635b16452c4b07c846812a93cd15677471ad1147f93e": CreateFile C:\ProgramData\containerd\root\io.containerd.grpc.v1.cri\containers\a16091b52dcb5e1317014aaa176758c88cc5313135917f5f57cb188a91e3bba4\volumes\4c60be2b1fc945d94cb8635b16452c4b07c846812a93cd15677471ad1147f93e: The system cannot find the path specified.
...
What Operating System are you using (both controller, and any agents involved in the problem)?
- Jenkins
RHEL8
Version 2.346.2
Plugin: Kubernetes 3670.v6ca_059233222
- K8s
# kubectl get nodes -A -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
gke-16cb49-7uy4 Ready <none> 13d v1.22.10-gke.600 ************* <none> Windows Server 2019 Datacenter 10.0.17763.2803 containerd://1.5.10-gke.2
gke-16cb49-yd8r Ready <none> 13d v1.22.10-gke.600 ************* <none> Windows Server 2019 Datacenter 10.0.17763.2803 containerd://1.5.10-gke.2
gke-cluster-3-default-pool-f4e8d14d-em12 Ready <none> 13d v1.22.10-gke.600 ************* <none> Container-Optimized OS from Google 5.10.109+ containerd://1.5.11
gke-cluster-3-default-pool-f4e8d14d-qrrr Ready <none> 13d v1.22.10-gke.600 ************* <none> Container-Optimized OS from Google 5.10.109+ containerd://1.5.11
gke-f6821c-2svd Ready <none> 2d11h v1.22.10-gke.600 ************* <none> Windows Server 2019 Datacenter 10.0.17763.2803 docker://20.10.9
gke-f6821c-63xj Ready <none> 2d11h v1.22.10-gke.600 ************* <none> Windows Server 2019 Datacenter 10.0.17763.2803 docker://20.10.9
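To make the split between the docker and containerd Windows nodes easier to see, here is a minimal sketch that groups Windows nodes by container runtime. It assumes the input is the parsed output of `kubectl get nodes -o json` and reads the `status.nodeInfo.operatingSystem` and `status.nodeInfo.containerRuntimeVersion` fields. The function name and the `SAMPLE` data are illustrative only; `SAMPLE` is hand-transcribed from the table above, not captured from the cluster.

```python
def windows_nodes_by_runtime(node_list):
    """Group Windows node names by container runtime engine.

    node_list is assumed to be the dict obtained by parsing the output of
    `kubectl get nodes -o json`.
    """
    groups = {}
    for item in node_list["items"]:
        info = item["status"]["nodeInfo"]
        if info["operatingSystem"] != "windows":
            continue  # skip the Linux (Container-Optimized OS) nodes
        # "containerd://1.5.10-gke.2" -> "containerd"
        engine = info["containerRuntimeVersion"].split("://", 1)[0]
        groups.setdefault(engine, []).append(item["metadata"]["name"])
    return groups


def _node(name, operating_system, runtime):
    # Helper (hypothetical) that builds only the fields this sketch reads.
    return {
        "metadata": {"name": name},
        "status": {"nodeInfo": {
            "operatingSystem": operating_system,
            "containerRuntimeVersion": runtime,
        }},
    }


# Hand-transcribed from the `kubectl get nodes -o wide` table above.
SAMPLE = {"items": [
    _node("gke-16cb49-7uy4", "windows", "containerd://1.5.10-gke.2"),
    _node("gke-16cb49-yd8r", "windows", "containerd://1.5.10-gke.2"),
    _node("gke-cluster-3-default-pool-f4e8d14d-em12", "linux", "containerd://1.5.11"),
    _node("gke-cluster-3-default-pool-f4e8d14d-qrrr", "linux", "containerd://1.5.11"),
    _node("gke-f6821c-2svd", "windows", "docker://20.10.9"),
    _node("gke-f6821c-63xj", "windows", "docker://20.10.9"),
]}

if __name__ == "__main__":
    for engine, names in windows_nodes_by_runtime(SAMPLE).items():
        print(engine, names)
```

Against a live cluster, the same function could be fed `json.loads(...)` of the real `kubectl get nodes -o json` output instead of `SAMPLE`.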
Reproduction steps
- Jenkins pipeline script
pipeline {
    agent {
        kubernetes {
            yaml '''
apiVersion: v1
kind: Pod
spec:
  automountServiceAccountToken: false
  containers:
  - name: jnlp
    image: jenkins/inbound-agent:windowsservercore-ltsc2019
    imagePullPolicy: IfNotPresent
    #force pull, but takes time
    #imagePullPolicy: Always
  - name: clean-windows-server
    image: mcr.microsoft.com/windows/servercore:ltsc2019
    imagePullPolicy: IfNotPresent
    #force pull, but takes time
    #imagePullPolicy: Always
    command:
    - powershell
    args:
    - Start-Sleep
    - 999999
  nodeSelector:
    cloud.google.com/gke-container-runtime: docker
    #cloud.google.com/gke-container-runtime: containerd
    kubernetes.io/os: windows
'''
        }
    }
    stages {
        stage('Echo environments for jnlp and clean-windows-server') {
            steps {
                echo '----- jnlp container -----'
                container('jnlp') {
                    bat 'set | sort'
                    bat 'ping 8.8.8.8'
                }
                echo '----- clean-windows-server container -----'
                container('clean-windows-server') {
                    bat 'set | sort'
                    bat 'ping 8.8.8.8'
                    //sleep to allow time for inspecting nodes via kubectl
                    bat 'powershell sleep 600'
                }
            }
        }
    }
}
- Standalone deployment
jenkins-agent.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: jenkins-agent
  labels:
    app: test
spec:
  replicas: 1
  selector:
    matchLabels:
      app: test
  template:
    metadata:
      labels:
        app: test
    spec:
      containers:
      - name: jenkins-agent
        #image: mcr.microsoft.com/windows/servercore:ltsc2019
        #image: eclipse-temurin:11.0.16_8-jdk-windowsservercore-1809
        #image: jenkins/agent:jdk11-windowsservercore-ltsc2019
        image: jenkins/inbound-agent:windowsservercore-ltsc2019
        #image: <redact>/devops-test:agenthack005
        command: ["powershell", "sleep", "999999"]
      nodeSelector:
        kubernetes.io/os: windows
        #cloud.google.com/gke-container-runtime: docker
        cloud.google.com/gke-container-runtime: containerd
- Debugging info
The ancestry of the jenkins/inbound-agent is observed to be:
mcr.microsoft.com/windows/servercore:ltsc2019
eclipse-temurin:11.0.16_8-jdk-windowsservercore-1809
jenkins/agent:jdk11-windowsservercore-ltsc2019
jenkins/inbound-agent:windowsservercore-ltsc2019
For debugging, each ancestor image of jenkins/inbound-agent was deployed directly onto K8s with a sleep command. All images deployed successfully on a K8s docker node. On a K8s containerd node, the eclipse-temurin image deployed successfully, but the jenkins/agent image failed.
A custom build of the jenkins/agent image was iteratively generated with various commands commented out. An iteration with only the VOLUME lines (~line 82 as of commit 1dd17e7) commented allowed a successful deployment on a containerd node.
Expected Results
Successful deploy on K8s windows docker and containerd nodes
Actual Results
Successful deploy on K8s windows docker node, failed deploy on K8s windows containerd node
Anything else?
No response
What operating system version is your nodepool?
pool-1 Image type: Windows Long Term Servicing Channel with containerd (windows_ltsc_containerd)
which version?
Kernel version: 10.0.17763.2803 Container runtime version: containerd://1.5.10-gke.2
not sure, no real experience with Windows containers other than the host version needing to patch the container version.
cc @slide in case you know more here
I don't know anything about k8s, so I don't know what the error means.
Based on feedback from Google support, the GKE cluster was upgraded to 1.23.9-gke.2100. The image now deploys successfully on the containerd node.
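For anyone checking their own node pools, here is a rough sketch that compares a node's reported version against 1.23.9-gke.2100. This thread only establishes that 1.23.9-gke.2100 worked and that 1.22.10-gke.600 did not. Treating 1.23.9-gke.2100 as a threshold is an assumption; the fix may have landed in an earlier or different release. The function names are illustrative.

```python
import re

# Matches version strings such as "v1.22.10-gke.600" or "1.23.9-gke.2100".
_GKE_VERSION = re.compile(r"^v?(\d+)\.(\d+)\.(\d+)-gke\.(\d+)$")


def parse_gke_version(version):
    """Return (major, minor, patch, gke_patch) as a tuple of ints."""
    match = _GKE_VERSION.match(version.strip())
    if match is None:
        raise ValueError(f"unrecognized GKE version string: {version!r}")
    return tuple(int(part) for part in match.groups())


# Assumption: the version reported as working in this thread, used as a
# reference point, not a confirmed minimum fixed version.
KNOWN_GOOD = parse_gke_version("1.23.9-gke.2100")


def at_least_known_good(version):
    """True if `version` is the same as or newer than the known-good version."""
    return parse_gke_version(version) >= KNOWN_GOOD


if __name__ == "__main__":
    for v in ("v1.22.10-gke.600", "1.23.9-gke.2100"):
        print(v, at_least_known_good(v))
```

The version string for each node is the VERSION column of `kubectl get nodes -o wide`, as shown in the original report.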