kaniko
Symlink issue on k8s GPU node with /lib/firmware/nvidia/525.xx
Actual behavior
After updating and upgrading to the latest nvidia drivers, kaniko runs into issues:
error building image: error building stage: failed to get filesystem from image: error removing lib to make way for new symlink: unlinkat //lib/firmware/nvidia/525.147.05/gsp_ad10x.bin: device or resource busy
- Passing --ignore-path=/lib/firmware/nvidia/525.147.05 has no effect, and using /lib as the ignore path breaks other things during the build process.
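For anyone debugging this: "device or resource busy" on unlinkat usually means the path is a mount point. My assumption (not confirmed in this thread) is that the NVIDIA container runtime bind-mounts the firmware file into the build container. A minimal sketch to check that, by listing mount points under a prefix from text in /proc/self/mountinfo format. The function name and the sample lines are hypothetical and only for illustration.

```python
# Hypothetical sample in /proc/self/mountinfo format; IDs and device names are made up.
SAMPLE_MOUNTINFO = """\
2301 2290 0:52 / /proc rw,nosuid,nodev,noexec,relatime - proc proc rw
2345 2290 259:2 /lib/firmware/nvidia/525.147.05/gsp_ad10x.bin /lib/firmware/nvidia/525.147.05/gsp_ad10x.bin ro,nosuid,nodev,relatime - ext4 /dev/nvme0n1p2 rw
2346 2290 0:24 /containerd/run /var/run rw,nosuid,nodev - tmpfs tmpfs rw
"""


def mount_points_under(mountinfo_text, prefix):
    """Return mount points equal to `prefix` or nested below it."""
    base = prefix.rstrip("/")
    found = []
    for line in mountinfo_text.splitlines():
        fields = line.split()
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        mount_point = fields[4]  # fifth field of mountinfo is the mount point
        if mount_point == (base or "/") or mount_point.startswith(base + "/"):
            found.append(mount_point)
    return found


if __name__ == "__main__":
    # Anything listed here cannot be unlinked while it is mounted.
    for path in mount_points_under(SAMPLE_MOUNTINFO, "/lib"):
        print(path)
```

To run it against a real build container, pass the contents of /proc/self/mountinfo read inside that container instead of the sample string. If the firmware file shows up, that would explain why removing /lib fails even though the file itself is on the ignore list.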
Expected behavior
A week ago, before the nvidia driver update, the container build ran without issues.
To Reproduce
Steps to reproduce the behavior:
- using k8s node with nvidia gpu driver installed
- using woodpecker-ci with kaniko plugin
Additional Information
- Dockerfile
ARG LAB_IMAGE=quay.io/jupyter/scipy-notebook:lab-4.0.12
FROM ${LAB_IMAGE}
RUN pip install ipywebrtc==0.6.0
- Build Context
No additional ADD/COPY commands are used. With --ignore-path pointing to the exact firmware folder (/lib/firmware/nvidia/525.147.05), kaniko still shows:
DEBU[0001] Ignore list: .... {/lib/firmware/nvidia/525.147.05/gsp_ad10x.bin false}
- Kaniko Image (fully qualified with digest)
gcr.io/kaniko-project/executor:v1.19.2-debug
https://github.com/woodpecker-ci/plugin-kaniko/blob/main/Dockerfile
Triage Notes for the Maintainers
| Description | Yes/No |
|---|---|
| Please check if this is a new feature you are proposing | |
| Please check if the build works in docker but not in kaniko | |
| Please check if this error is seen when you use --cache flag | |
| Please check if your dockerfile is a multistage dockerfile | |
Same problem, newer version: //lib/firmware/nvidia/535.54.03/gsp_ga10x.bin
Coder envbuilder has a workaround for ignore paths: https://github.com/coder/envbuilder/blob/8d3cfdffc3ab221a5d224418259128c46dd51a86/envbuilder.go#L544
But, after ignoring nvidia, I have to ignore /var/run and then this happens:
Failed to build: error building stage: failed to execute command: starting command: fork/exec /bin/sh: no such file or directory
Same problem.
It works with docker build, but it does not work with kaniko.
@maltegrosse or @marrotte, did you find a workaround?
I was thinking of going back to DIND for nvidia/cuda image builds, but I am not keen on doing it if there is a better way. Something changed in the nvidia/cuda image retroactively, and since then it has been failing.
I couldn't get it running anymore, that's why I switched to buildah.
Yeah, I added additional CI templates based on DIND to build images with this problem (in my case nvidia/cuda) hoping it will get fixed soon.