
Can't access GPU during build with docker compose v2

Open dgmp88 opened this issue 2 years ago • 5 comments

Description

Accessing the GPU during build using Docker Compose v2 doesn't work.

It does work when the container is running, but some of my build steps need the GPU for compilation with cuda.

It doesn't seem to work with either the runtime or the deploy resources settings, as described here

This does work using docker compose v1.

Steps to reproduce the issue:

  • docker-compose v2 doesn't build

The attached yml + Dockerfile fail with an AssertionError.

docker compose build nvidia-test
docker compose build nvidia-test-2

  • docker-compose v1 works

Running with docker-compose v1 installed via pip, the attached yml and Dockerfile build successfully.

docker-compose build nvidia-test
docker-compose build nvidia-test-2

Output of docker compose version:

v2

docker compose version
Docker Compose version v2.6.0

v1

docker-compose version
docker-compose version 1.29.2, build unknown
docker-py version: 5.0.3
CPython version: 3.9.4
OpenSSL version: OpenSSL 1.1.1k  25 Mar 2021

Output of docker info:

Client:                                                  
 Context:    default                                                                                              
 Debug Mode: false                                                                                                
 Plugins:                                                                                                         
  app: Docker App (Docker Inc., v0.9.1-beta3)                                                                     
  buildx: Docker Buildx (Docker Inc., v0.8.2-docker)                                                              
  compose: Docker Compose (Docker Inc., v2.6.0)                                                                   
  scan: Docker Scan (Docker Inc., v0.17.0)                                                                        
                                                                                                                  
Server:                                                                                                           
 Containers: 34                                          
  Running: 2                                             
  Paused: 0            
  Stopped: 32
 Images: 31                                                                                                       
 Server Version: 20.10.17                                
 Storage Driver: overlay2                                                                                         
  Backing Filesystem: extfs                              
  Supports d_type: true 
  Native Overlay Diff: true
  userxattr: false                                       
 Logging Driver: json-file                                                                                        
 Cgroup Driver: cgroupfs                                                                                          
 Cgroup Version: 1                                                                                                
 Plugins:         
  Volume: local                                          
  Network: bridge host ipvlan macvlan null overlay       
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog                             
 Swarm: inactive                                         
 Runtimes: runc io.containerd.runc.v2 io.containerd.runtime.v1.linux nvidia                                       
 Default Runtime: nvidia                                 
 Init Binary: docker-init                                
 containerd version: 10c12954828e7c7c9b6e0ea9b0c02b01407d3ae1
 runc version: v1.1.2-0-ga916309
 init version: de40ad0
 Security Options:
  apparmor
  seccomp
   Profile: default
 Kernel Version: 5.15.0-1015-aws
 Operating System: Ubuntu 20.04.4 LTS
 OSType: linux
 Architecture: x86_64
 CPUs: 4
 Total Memory: 15.34GiB
 Name: ip-172-31-33-172
 ID: 7QW3:4AFO:BJBD:IH6R:IXVA:WWW2:Z5EL:HRH4:E4Y4:MFZD:KUWE:VH75
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

Dockerfile

FROM pytorch/pytorch:1.12.0-cuda11.3-cudnn8-runtime

RUN python -c "import torch;assert torch.cuda.is_available()"

docker-compose.yml

version: "3.9"

services:
  nvidia-test:
    build: ./
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [ gpu ]

  nvidia-test-2:
    build: ./
    runtime: nvidia

dgmp88, Jul 27 '22 09:07

We experience the same issue. This is currently holding us back from making the transition to compose v2 and the cli plugin.

philipp-schmidt, Aug 24 '22 13:08

Can you try running without buildkit and see if the result is any different?

DOCKER_BUILDKIT=0 docker compose build nvidia-test

nicksieger, Aug 24 '22 15:08

No, disabling buildkit gives the same error. Specifically it gives:

ERROR: CUDA initialization failure with error 35.

With "default-runtime" set in /etc/docker/daemon.json and compose v1, the same machine can init cuda without problems during build steps.
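For reference, a minimal sketch of what such a daemon.json typically looks like. The function name is hypothetical, and the runtime path nvidia-container-runtime is an assumption based on a standard NVIDIA Container Toolkit install; the thread does not show the actual file. The function only prints the JSON and does not modify /etc/docker/daemon.json.

```shell
# Prints an example daemon.json that makes "nvidia" the default runtime.
# Assumption: the NVIDIA runtime binary is on PATH as nvidia-container-runtime.
# Nothing is written to /etc/docker; copy the output there manually if it fits.
print_nvidia_daemon_json() {
  cat <<'EOF'
{
  "default-runtime": "nvidia",
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}
EOF
}
```

The Docker daemon has to be restarted after the file is changed for the new default runtime to take effect.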

philipp-schmidt, Aug 25 '22 09:08

P.S.: The initial author of the issue has "nvidia" as the default runtime as well. I don't understand how this doesn't apply to compose v2 if it applies to compose v1.

Btw, we would be very happy to get rid of the default runtime setting. The only issue is that this has been the only reliable solution in the past years to get GPU support into the containers, as this issue proves again today.

philipp-schmidt, Aug 25 '22 09:08

To clarify what we tried:

compose v2.6 + runc default runtime + deploy>resources>devices>gpu in YML + DOCKER_BUILDKIT=0 docker compose build -> cuda init error

compose v1 + nvidia default runtime + docker-compose build -> success
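Since the two setups differ in their default runtime, it can help to confirm which one the daemon is actually using before comparing results. A small sketch, assuming the docker CLI is on PATH (the function name is hypothetical):

```shell
# Prints the daemon's default runtime, e.g. "runc" or "nvidia".
# Uses the Go-template output of `docker info`.
default_runtime() {
  docker info --format '{{.DefaultRuntime}}'
}
```

This reports the same value as the "Default Runtime" line in the docker info output.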

philipp-schmidt, Aug 25 '22 09:08

I am experiencing the same problem

danielgafni, Sep 24 '22 10:09

Related issues:

https://github.com/moby/buildkit/issues/1436 (adding GPUs to run commands), and https://github.com/moby/buildkit/issues/2485 (adding alternative runtimes to buildkit)

Tbh I feel like putting this in the dockerfile is the right way to fix this.

nicks, Oct 28 '22 17:10

deploy>resources>devices>gpu (as the naming implies) defines the resources allocated to the running container, not to the build.

Can you please try running the build with DOCKER_BUILDKIT=0 docker compose build? This will use the "classic" builder, which doesn't involve buildkit.
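To illustrate the run-versus-build distinction, here is a rough sketch that performs the same torch check at run time instead of in a RUN step. It assumes the assert line has been removed from the Dockerfile so the image builds, and that the service name is passed as the first argument; the function name is hypothetical.

```shell
# Runs the CUDA availability check inside a one-off container of the given
# service, where the deploy/runtime GPU settings from the yml do apply.
# Assumption: the image already builds without the RUN assert step.
check_gpu_at_runtime() {
  docker compose run --rm "$1" \
    python -c "import torch; assert torch.cuda.is_available()"
}
```

For example, check_gpu_at_runtime nvidia-test would exit non-zero if the container cannot see a GPU.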

ndeloof, Dec 13 '22 17:12

I'm also having this problem and disabling BuildKit by DOCKER_BUILDKIT=0 solves this strange problem for me. Isn't there any other way to fix this?

Neltherion, Mar 05 '23 14:03

DOCKER_BUILDKIT=0 solves this issue for me as well though it would be nice to have a reference in the documentation for it.

charlescoult, Apr 01 '23 21:04

I'm probably being a noob here but is there a way to set DOCKER_BUILDKIT=0 in the docker-compose.yml file for that specific service, instead of adding it to the docker compose up command?
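One option that stays outside the yml file is a small shell wrapper, so the variable does not have to be typed each time. This is only a sketch with a hypothetical function name; it applies to every service it is called with, not to a single service in the compose file.

```shell
# Builds the given services with the classic builder by setting
# DOCKER_BUILDKIT=0 for this one docker invocation only.
classic_build() {
  DOCKER_BUILDKIT=0 docker compose build "$@"
}
```

Calling classic_build nvidia-test is then equivalent to DOCKER_BUILDKIT=0 docker compose build nvidia-test.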

charlescoult, Apr 01 '23 21:04

This isn't really a solution. I want to use buildkit, it provides cache volumes which speed up builds a lot.

Right now I'm building with docker-compose and running the containers with docker compose, works for now.
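A sketch of that split, assuming both the v1 docker-compose binary and the v2 docker compose plugin are installed side by side (the function name is hypothetical):

```shell
# Builds with docker-compose v1, then starts the services with compose v2.
# The v2 step only runs if the v1 build succeeded.
build_v1_run_v2() {
  docker-compose build "$@" && docker compose up "$@"
}
```

For example, build_v1_run_v2 nvidia-test builds and then starts only that service.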

danielgafni, Apr 02 '23 08:04

@danielgafni Are you saying that by using docker-compose this problem can be averted and we can ALSO use buildkit ?

Neltherion, Apr 02 '23 08:04

Exactly (this is literally the original issue lol).

danielgafni, Apr 02 '23 08:04

@danielgafni buildkit doesn't support GPU devices (yet); see https://github.com/moby/buildkit/issues/1436 and https://github.com/moby/buildkit/issues/2485

ndeloof, Apr 03 '23 08:04

I'm closing this issue, as the same issue applies to plain docker build once buildkit has been set as the default builder (which is the case in Docker Desktop). Docker Compose will obviously add support for GPUs when building images once this feature is available in buildkit.

ndeloof, Apr 03 '23 08:04