Use of CDI does not consume labeled devices during build
Issue Description
When using NVIDIA GPUs with Podman via the Container Device Interface (CDI), podman build fails to use labeled devices, while podman run works as intended.
However, when the direct device path is used, podman build works as expected.
Steps to reproduce the issue
- Install NVIDIA Drivers
- Install Podman
- Install NVIDIA Container Toolkit:
curl -s -L https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo | sudo tee /etc/yum.repos.d/nvidia-container-toolkit.repo
sudo dnf install -y nvidia-container-toolkit
- Configure NVIDIA CTK for use with CDI:
nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
- Test CDI integration for podman run, which works:
podman run --rm --device nvidia.com/gpu=all --security-opt=label=disable ubuntu nvidia-smi -L
- Start a podman build with the same device label, which fails:
# Get a test containerfile
curl -O https://raw.githubusercontent.com/kenmoini/smart-drone-patterns/main/apps/darknet/Containerfile.ubnt22
# Build a container with the device label which fails
podman build --device nvidia.com/gpu=all --security-opt=label=disable -t darknet -f Containerfile.ubnt22 .
# - Output
Error: creating build executor: getting info of source device nvidia.com/gpu=all: stat nvidia.com/gpu=all: no such file or directory
# Build a container with the direct device path which works
podman build --device /dev/nvidia0 -t darknet -f Containerfile.ubnt22 --security-opt=label=disable .
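Before comparing podman run and podman build, it can help to confirm that the CDI spec from the earlier step exists and that the toolkit reports devices from it. The following is a minimal sketch, assuming the spec lives at /etc/cdi/nvidia.yaml as in the steps above; the function name check_cdi_setup is hypothetical.

```shell
# Hypothetical helper: verify the generated CDI spec is readable and,
# if nvidia-ctk is installed, list the CDI device names it exposes.
check_cdi_setup() {
  cdi_spec="${1:-/etc/cdi/nvidia.yaml}"
  if [ ! -r "$cdi_spec" ]; then
    echo "missing CDI spec: $cdi_spec"
    return 1
  fi
  if command -v nvidia-ctk >/dev/null 2>&1; then
    # Lists the device names found in the CDI spec directories.
    nvidia-ctk cdi list
  else
    echo "nvidia-ctk not found in PATH"
    return 1
  fi
}

# Run the check, but do not abort the shell if it fails.
check_cdi_setup || true
```

If the spec is present and the device names are listed, the failure in podman build is unlikely to be caused by a missing or empty CDI configuration.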
Describe the results you received
Using the CDI device label fails:
podman build --device nvidia.com/gpu=all --security-opt=label=disable -t darknet -f Containerfile.ubnt22 .
Error: creating build executor: getting info of source device nvidia.com/gpu=all: stat nvidia.com/gpu=all: no such file or directory
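The error shows stat being called on the literal string nvidia.com/gpu=all, which suggests that the build code path treats every --device value as a host path. The following is a rough sketch of how the two forms could be told apart, assuming a CDI device name has the shape vendor/class=name and a host path starts with a slash. The helper looks_like_cdi_name is hypothetical and is not part of Podman or Buildah.

```shell
# Hypothetical helper: returns 0 if the argument looks like a CDI
# device name (vendor/class=name), 1 otherwise. This is only a
# heuristic, not the parsing rule of any real tool.
looks_like_cdi_name() {
  case "$1" in
    /*)    return 1 ;;  # absolute host path such as /dev/nvidia0
    */*=*) return 0 ;;  # for example nvidia.com/gpu=all
    *)     return 1 ;;
  esac
}

# Classify the two --device values used in this report.
for dev in nvidia.com/gpu=all /dev/nvidia0; do
  if looks_like_cdi_name "$dev"; then
    echo "$dev: CDI device name"
  else
    echo "$dev: host device path"
  fi
done
```

A value classified as a CDI device name would need to be resolved through the CDI spec rather than passed to stat.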
Describe the results you expected
The container build should start when given the device label. Currently it only works with the device path, but that does not seem to load all the associated paths that are defined in the generated CDI configuration.
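As a partial stopgap, the device nodes named in the generated spec could be passed to podman build one by one. The sketch below assumes the spec lists device nodes as unquoted "path: /dev/..." entries, which is an assumption about the generated file and not something shown in this report. The function names are hypothetical. It covers device nodes only, so any mounts or hooks defined in the CDI configuration are still not applied.

```shell
# Hypothetical helper: print the unique /dev paths listed in a CDI spec.
# Lines such as "hostPath:" or hook paths outside /dev are not matched.
cdi_device_paths() {
  grep -Eo 'path:[[:space:]]*/dev/[^[:space:]]+' "$1" \
    | sed -E 's/^path:[[:space:]]*//' \
    | LC_ALL=C sort -u
}

# Hypothetical helper: turn those paths into --device flags, one per line.
cdi_device_flags() {
  cdi_device_paths "$1" | sed 's/^/--device /'
}

# Print the flags if the spec from the reproduction steps is present.
nvidia_spec=/etc/cdi/nvidia.yaml
if [ -r "$nvidia_spec" ]; then
  cdi_device_flags "$nvidia_spec"
fi

# Possible use (untested; relies on word splitting of the flag list):
# podman build $(cdi_device_flags /etc/cdi/nvidia.yaml) \
#   --security-opt=label=disable -t darknet -f Containerfile.ubnt22 .
```

This is not equivalent to CDI support in the build, since only the device nodes are carried over.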
podman info output
host:
  arch: arm64
  buildahVersion: 1.31.3
  cgroupControllers:
  - cpuset
  - cpu
  - io
  - memory
  - hugetlb
  - pids
  - rdma
  - misc
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: conmon-2.1.8-1.el9.aarch64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.8, commit: f0f506932ce1dc9fc7f1adb457a73d0a00207272'
  cpuUtilization:
    idlePercent: 99.98
    systemPercent: 0.01
    userPercent: 0.01
  cpus: 32
  databaseBackend: boltdb
  distribution:
    distribution: '"rhel"'
    version: "9.3"
  eventLogger: journald
  freeLocks: 2048
  hostname: avalon.kemo.labs
  idMappings:
    gidmap: null
    uidmap: null
  kernel: 5.14.0-362.18.1.el9_3.aarch64
  linkmode: dynamic
  logDriver: journald
  memFree: 121339949056
  memTotal: 133915746304
  networkBackend: netavark
  networkBackendInfo:
    backend: netavark
    dns:
      package: aardvark-dns-1.7.0-1.el9.aarch64
      path: /usr/libexec/podman/aardvark-dns
      version: aardvark-dns 1.7.0
    package: netavark-1.7.0-2.el9_3.aarch64
    path: /usr/libexec/podman/netavark
    version: netavark 1.7.0
  ociRuntime:
    name: crun
    package: crun-1.8.7-1.el9.aarch64
    path: /usr/bin/crun
    version: |-
      crun version 1.8.7
      commit: 53a9996ce82d1ee818349bdcc64797a1fa0433c4
      rundir: /run/crun
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +YAJL
  os: linux
  pasta:
    executable: /bin/pasta
    package: passt-0^20230818.g0af928e-4.el9.aarch64
    version: |
      pasta 0^20230818.g0af928e-4.el9.aarch64
      Copyright Red Hat
      GNU Affero GPL version 3 or later <https://www.gnu.org/licenses/agpl-3.0.html>
      This is free software: you are free to change and redistribute it.
      There is NO WARRANTY, to the extent permitted by law.
  remoteSocket:
    exists: true
    path: /run/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: false
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: true
  serviceIsRemote: false
  slirp4netns:
    executable: /bin/slirp4netns
    package: slirp4netns-1.2.1-1.el9.aarch64
    version: |-
      slirp4netns version 1.2.1
      commit: 09e31e92fa3d2a1d3ca261adaeb012c8d75a8194
      libslirp: 4.4.0
      SLIRP_CONFIG_VERSION_MAX: 3
      libseccomp: 2.5.2
  swapFree: 4294963200
  swapTotal: 4294963200
  uptime: 105h 12m 27.00s (Approximately 4.38 days)
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  - ipvlan
  volume:
  - local
registries:
  search:
  - registry.access.redhat.com
  - registry.redhat.io
  - docker.io
store:
  configFile: /etc/containers/storage.conf
  containerStore:
    number: 0
    paused: 0
    running: 0
    stopped: 0
  graphDriverName: overlay
  graphOptions:
    overlay.mountopt: nodev,metacopy=on
  graphRoot: /var/lib/containers/storage
  graphRootAllocated: 1993421922304
  graphRootUsed: 28735803392
  graphStatus:
    Backing Filesystem: xfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Using metacopy: "true"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 4
  runRoot: /run/containers/storage
  transientStore: false
  volumePath: /var/lib/containers/storage/volumes
version:
  APIVersion: 4.6.1
  Built: 1705652546
  BuiltTime: Fri Jan 19 03:22:26 2024
  GitCommit: ""
  GoVersion: go1.20.12
  Os: linux
  OsArch: linux/arm64
  Version: 4.6.1
Podman in a container
No
Privileged Or Rootless
Privileged
Upstream Latest Release
No
Additional environment details
Running RHEL 9.3 on an Ampere Altra system. The same error occurs on an x86 system.
Additional information
Looks like this also affects buildah:
https://github.com/containers/buildah/issues/5432
https://github.com/containers/buildah/pull/5443