Can't access GPU during build with docker compose v2
Description
Accessing the GPU during build with Docker Compose v2 doesn't work.
It does work once the container is running, but some of my build steps need the GPU to compile CUDA code.
Neither the runtime flag nor the resources declaration described here makes the GPU available during the build.
This does work with docker-compose v1.
Steps to reproduce the issue:
- docker compose v2 doesn't build
The attached yml + Dockerfile fail with an AssertionError:
docker compose build nvidia-test
docker compose build nvidia-test-2
- docker-compose v1 works
With docker-compose v1 installed via pip, the attached yml and Dockerfile build successfully:
docker-compose build nvidia-test
docker-compose build nvidia-test-2
Output of docker compose version:
v2
docker compose version
Docker Compose version v2.6.0
v1
docker-compose version
docker-compose version 1.29.2, build unknown
docker-py version: 5.0.3
CPython version: 3.9.4
OpenSSL version: OpenSSL 1.1.1k 25 Mar 2021
Output of docker info:
Client:
Context: default
Debug Mode: false
Plugins:
app: Docker App (Docker Inc., v0.9.1-beta3)
buildx: Docker Buildx (Docker Inc., v0.8.2-docker)
compose: Docker Compose (Docker Inc., v2.6.0)
scan: Docker Scan (Docker Inc., v0.17.0)
Server:
Containers: 34
Running: 2
Paused: 0
Stopped: 32
Images: 31
Server Version: 20.10.17
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Native Overlay Diff: true
userxattr: false
Logging Driver: json-file
Cgroup Driver: cgroupfs
Cgroup Version: 1
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: runc io.containerd.runc.v2 io.containerd.runtime.v1.linux nvidia
Default Runtime: nvidia
Init Binary: docker-init
containerd version: 10c12954828e7c7c9b6e0ea9b0c02b01407d3ae1
runc version: v1.1.2-0-ga916309
init version: de40ad0
Security Options:
apparmor
seccomp
Profile: default
Kernel Version: 5.15.0-1015-aws
Operating System: Ubuntu 20.04.4 LTS
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 15.34GiB
Name: ip-172-31-33-172
ID: 7QW3:4AFO:BJBD:IH6R:IXVA:WWW2:Z5EL:HRH4:E4Y4:MFZD:KUWE:VH75
Docker Root Dir: /var/lib/docker
Debug Mode: false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
Dockerfile
FROM pytorch/pytorch:1.12.0-cuda11.3-cudnn8-runtime
RUN python -c "import torch;assert torch.cuda.is_available()"
docker-compose.yml
version: "3.9"
services:
  nvidia-test:
    build: ./
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [ gpu ]
  nvidia-test-2:
    build: ./
    runtime: nvidia
We are experiencing the same issue. It is currently holding us back from transitioning to Compose v2 and the CLI plugin.
Can you try running without buildkit and see if the result is any different?
DOCKER_BUILDKIT=0 docker compose build nvidia-test
No, disabling BuildKit gives the same error. Specifically, it gives:
ERROR: CUDA initialization failure with error 35.
With "default-runtime" set to nvidia in /etc/docker/daemon.json and compose v1, the same machine can initialize CUDA without problems during build steps.
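For reference, the daemon setting referred to here is roughly the following /etc/docker/daemon.json (the runtime path may differ depending on how nvidia-container-runtime is installed):
{
  "default-runtime": "nvidia",
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}
The daemon has to be restarted for the change to take effect, e.g. sudo systemctl restart docker.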
P.S.: The initial author of the issue has "nvidia" as the default runtime as well. I don't understand why this applies to compose v1 but not to compose v2.
By the way, we would be very happy to get rid of the default-runtime setting. The only issue is that it has been the only reliable way over the past years to get GPU support into the containers, as this issue proves again today.
To clarify what we tried:
- compose v2.6 + runc default runtime + deploy>resources>devices>gpu in the YML + DOCKER_BUILDKIT=0 docker compose build -> CUDA init error
- compose v1 + nvidia default runtime + docker-compose build -> success
I am experiencing the same problem
Related issues:
https://github.com/moby/buildkit/issues/1436 (adding GPUs to run commands), and https://github.com/moby/buildkit/issues/2485 (adding alternative runtimes to buildkit)
Tbh I feel like putting this in the Dockerfile is the right way to fix this.
deploy>resources>devices>gpu
(as the naming implies) defines the resources allocated to the running container, not to the build; see the run-time equivalent sketched below.
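For illustration, the deploy reservation above is roughly the Compose counterpart of passing --gpus to docker run, which only takes effect when a container is started, never during an image build (the image tag is simply the one from the repro):
docker run --rm --gpus 1 pytorch/pytorch:1.12.0-cuda11.3-cudnn8-runtime \
  python -c "import torch; assert torch.cuda.is_available()"
Nothing comparable is applied to RUN steps by the builder, which is why the same assertion fails there.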
Can you please try running the build with DOCKER_BUILDKIT=0 docker compose build? This will use the "classic" builder, which doesn't involve BuildKit.
I'm also having this problem, and disabling BuildKit with DOCKER_BUILDKIT=0 solves this strange problem for me. Isn't there any other way to fix this?
DOCKER_BUILDKIT=0 solves this issue for me as well, though it would be nice to have a reference to it in the documentation.
I'm probably being a noob here, but is there a way to set DOCKER_BUILDKIT=0 in the docker-compose.yml file for that specific service, instead of adding it to the docker compose up command?
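As far as I know, DOCKER_BUILDKIT is an environment variable read by the Docker CLI, not a per-service Compose option, so it cannot go into docker-compose.yml; the usual workaround is to set it in the shell that invokes Compose:
export DOCKER_BUILDKIT=0
docker compose build
or set it inline for a single invocation, as in the commands earlier in this thread.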
This isn't really a solution. I want to use BuildKit; it provides cache volumes which speed up builds a lot (see the cache-mount sketch below).
Right now I'm building with docker-compose and running the containers with docker compose, which works for now.
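For context, the cache volumes mentioned above are BuildKit cache mounts; a minimal Dockerfile sketch, with the pip package chosen purely for illustration, looks like:
# syntax=docker/dockerfile:1
FROM pytorch/pytorch:1.12.0-cuda11.3-cudnn8-runtime
# Persist pip's download cache across builds so repeated builds don't re-download packages
RUN --mount=type=cache,target=/root/.cache/pip pip install numpy
This syntax only works with BuildKit enabled, which is exactly why falling back to DOCKER_BUILDKIT=0 is painful here.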
@danielgafni Are you saying that by using docker-compose this problem can be averted and we can ALSO use BuildKit?
Exactly (this is literally the original issue lol).
@danielgafni BuildKit doesn't support GPU devices (yet); see https://github.com/moby/buildkit/issues/1436 and https://github.com/moby/buildkit/issues/2485
I'm closing this issue, as the same issue applies to plain docker build once BuildKit has been set as the default builder (which is the case in Docker Desktop). Docker Compose will obviously add support for GPUs during image builds once this feature is available in BuildKit.
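To illustrate the plain docker build comparison (assuming the repro Dockerfile above and nvidia configured as the default runtime):
DOCKER_BUILDKIT=1 docker build .   # BuildKit builder: the torch.cuda assertion fails
DOCKER_BUILDKIT=0 docker build .   # classic builder: RUN steps use the default nvidia runtime and succeed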