
create_container does not support the --gpus param

Open ffteen opened this issue 6 years ago • 32 comments

Docker version: 19.03. I want to set --gpus all when creating a container, but found that docker-py does not support this param.

ffteen avatar Aug 01 '19 12:08 ffteen

Hello @ffteen thank you for the report

jcsirot avatar Aug 05 '19 12:08 jcsirot

any progress on this issue?

msadri70 avatar Aug 17 '19 21:08 msadri70

I think one hacky way, though not very reliable, is to use the low-level API and overwrite the host configuration. Since I only tried to follow the Docker CLI code in Go, I'm not sure how reliable/portable this solution is. It works on my machine, and I thought it might help someone until official support is implemented.

The following code is a modification of the original DockerClient.containers.create() function; it adds a DeviceRequest to the host configuration and otherwise works exactly like the original:

import docker
from docker.models.images import Image
# _create_container_args is a private docker-py helper that translates the
# high-level kwargs into arguments for the low-level API
from docker.models.containers import _create_container_args

def create_with_device_request(client, image, command, device_request=None, **kwargs):
    if isinstance(image, Image):
        image = image.id
    kwargs['image'] = image
    kwargs['command'] = command
    kwargs['version'] = client.api._version
    create_kwargs = _create_container_args(kwargs)

    # modification to the original create function: inject the DeviceRequest
    # into the host configuration before calling the low-level API
    if device_request is not None:
        create_kwargs['host_config']['DeviceRequests'] = [device_request]
    # end modification

    resp = client.api.create_container(**create_kwargs)
    return client.containers.get(resp['Id'])

# Example usage
device_request = {
    'Driver': 'nvidia',
    # not sure which capabilities are really needed
    'Capabilities': [['gpu'], ['nvidia'], ['compute'], ['compat32'], ['graphics'], ['utility'], ['video'], ['display']],
    'Count': -1,  # enable all gpus
}

# any further keyword arguments are forwarded to containers.create()
container = create_with_device_request(docker.from_env(), 'nvidia/cuda:9.0-base', 'nvidia-smi', device_request)

I think the CLI client sets the NVIDIA_VISIBLE_DEVICES environment variable, so it's probably a good idea to do the same by passing environment={'NVIDIA_VISIBLE_DEVICES': 'all'} to the create_with_device_request() call. This enables all available GPUs. You could modify this with different device requests:

# enable two gpus
device_request = {
    'Driver': 'nvidia',
    'Capabilities': ...,
    'Count': 2,  # enable two gpus
}

# enable gpus with id or uuid
device_request = {
    'Driver': 'nvidia',
    'Capabilities': ...,
    'DeviceIDs': ['0', 'GPU-abcedfgh-1234-a1b2-3c4d-a7f3ovs13da1']  # enable the GPU with index 0 and the one with this UUID
}

The environment parameter should then be {'NVIDIA_VISIBLE_DEVICES': '0,1'} or {'NVIDIA_VISIBLE_DEVICES': '0,GPU-xxx'}, respectively.
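
For example, here is a minimal sketch building on create_with_device_request() above (the image tag, device IDs, and the [['gpu']] capability are placeholder assumptions, not tested values):

import docker

# request two specific GPUs and mirror the CLI by also setting
# NVIDIA_VISIBLE_DEVICES; the extra kwarg is forwarded to containers.create()
client = docker.from_env()
device_request = {
    'Driver': 'nvidia',
    'Capabilities': [['gpu']],
    'DeviceIDs': ['0', '1'],
}
container = create_with_device_request(
    client, 'nvidia/cuda:9.0-base', 'nvidia-smi', device_request,
    environment={'NVIDIA_VISIBLE_DEVICES': '0,1'},
)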

Bluemi avatar Aug 20 '19 14:08 Bluemi

I'm not sure which capabilities are really needed either!

Does create_service support the device request param?

I use the nvidia runtime instead.

ffteen avatar Aug 22 '19 04:08 ffteen

As far as I can tell, services.create() does not support device requests.

Setting runtime='nvidia' is definitely the better approach, if possible. The problem I had was that I use the nvidia-container-toolkit, which does not require installing the nvidia-runtime, so setting the nvidia runtime leads to Error: unknown runtime specified nvidia, while using --gpus=all works as expected.

Is there a better way to use NVIDIA GPUs with the nvidia-container-toolkit?

Bluemi avatar Aug 22 '19 10:08 Bluemi

I have a change (that appears to work) that allows the "gpus" option in my fork. I'd like to create a PR for it, but when running the tests, this error (which is unrelated to the change) occurs:

tests/integration/api_service_test.py:379:53: F821 undefined name 'BUSYBOX'
Makefile:92: recipe for target 'flake8' failed

Is there a package that needs to be installed to fix this?

hnine999 avatar Aug 28 '19 20:08 hnine999

@hnine999 No, that's an error on our end - we'll fix it shortly. Feel free to submit your PR in the meantime!

shin- avatar Aug 28 '19 21:08 shin-

The PR from @hnine999 is #2419

jamesdbrock avatar Oct 15 '19 01:10 jamesdbrock

Hi - Any update with this feature?

rAm1n avatar Nov 13 '19 21:11 rAm1n

Any update on this? It is badly needed. docker-py is functionally broken for running GPU enabled containers.

AustinDeric avatar Jan 21 '20 04:01 AustinDeric

+1

Dmitry1987 avatar Feb 10 '20 21:02 Dmitry1987

This is actually a major feature for the whole data science community that runs TensorFlow in Docker on NVIDIA GPUs in the cloud. Why has this been ignored for so long? 😞

Dmitry1987 avatar Feb 10 '20 21:02 Dmitry1987

Any update on this?

bluebox42 avatar Mar 12 '20 09:03 bluebox42

Still waiting for this to be supported... The only workaround for now is "docker run" with bash :(


Dmitry1987 avatar Mar 13 '20 03:03 Dmitry1987

> Still waiting for this to be supported... The only workaround for now is "docker run" with bash :(

At the moment, nvidia-container-toolkit still includes nvidia-container-runtime, so you can still add nvidia-container-runtime as a runtime in /etc/docker/daemon.json:

{
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}

Then restart the docker service (sudo systemctl restart docker) and use runtime="nvidia" in docker-py as before.
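
For example, a minimal sketch (the CUDA image tag is just an example):

import docker

client = docker.from_env()
# 'nvidia' must match the runtime name registered in daemon.json above
output = client.containers.run('nvidia/cuda:9.0-base', 'nvidia-smi', runtime='nvidia')
print(output.decode())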

jmsmkn avatar Mar 13 '20 07:03 jmsmkn

Thanks a bunch - that works BUT the daemon.json is missing a double quote in runtimes: { "runtimes": { "nvidia": { "path": "nvidia-container-runtime", "runtimeArgs": [] } } }

Is there a solid fix for this issue?

chorus12 avatar Apr 03 '20 09:04 chorus12

Thanks - updated my comment with that suggestion

jmsmkn avatar Apr 03 '20 09:04 jmsmkn

Hi @jmsmkn, I installed nvidia-container-toolkit on Arch, but it does not come with nvidia-container-runtime. Any update on this? Thanks.

cd /usr/bin
ls | grep nvidia
nvidia-bug-report.sh
nvidia-container-cli
nvidia-container-runtime-hook
nvidia-container-toolkit
nvidia-cuda-mps-control
nvidia-cuda-mps-server
nvidia-debugdump
nvidia-modprobe
nvidia-persistenced
nvidia-settings
nvidia-sleep.sh
nvidia-smi
nvidia-xconfig

vwxyzjn avatar Jul 28 '20 13:07 vwxyzjn

@vwxyzjn arch

I think this will help

DrizzlingCattus avatar Aug 06 '20 10:08 DrizzlingCattus

A simple "gpus=" keyword parameter, please!

MikeWhittakerRyff avatar Aug 07 '20 18:08 MikeWhittakerRyff

This feature is badly needed by the many people working with GPUs for AI and HPC. Please add it as soon as you can; we'll be very grateful.

milk4candy avatar Aug 22 '20 04:08 milk4candy

Is this issue on some agenda? (This is your second most upvoted open issue at the moment.)

christian-steinmeyer avatar Nov 03 '20 09:11 christian-steinmeyer

Hi all, I made a Python client for Docker that sits on top of the Docker client binary (the one written in Go). It took me several months of work. It notably has support for GPUs in docker.run(...) and docker.container.create(...), with all the options that the CLI has.

It's currently only available to my sponsors, but it'll be open source under an MIT license on May 1st, 2021 🙂

https://gabrieldemarmiesse.github.io/python-on-whales/

gabrieldemarmiesse avatar Nov 08 '20 15:11 gabrieldemarmiesse

Hi all, in the end, making Python-on-whales pay-to-use wasn't a success. So I've open-sourced it.

It's free and on PyPI now. Have fun 😃

$ pip install python-on-whales
$ python
>>> from python_on_whales import docker
>>> print(docker.run("nvidia/cuda:11.0-base", ["nvidia-smi"], gpus="all"))
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.51.06    Driver Version: 450.51.06    CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            On   | 00000000:00:1E.0 Off |                    0 |
| N/A   34C    P8     9W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

https://github.com/gabrieldemarmiesse/python-on-whales

gabrieldemarmiesse avatar Dec 02 '20 19:12 gabrieldemarmiesse

looks good!


Dmitry1987 avatar Dec 02 '20 19:12 Dmitry1987

In the end, I just wrote a very simple wrapper around subprocess.run, with a built-up argument list that can include the required GPU parameter; it captures stdout, stderr, the return code, and the execution duration.
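
Something like this minimal sketch (the function name and the returned fields are my own guesses, not the actual implementation):

import subprocess
import time

def run_in_docker(image, command, gpus=None):
    args = ['docker', 'run', '--rm']
    if gpus is not None:
        args += ['--gpus', gpus]  # e.g. 'all' or 'device=0'
    args += [image] + list(command)
    start = time.monotonic()
    result = subprocess.run(args, capture_output=True, text=True)
    return {
        'stdout': result.stdout,
        'stderr': result.stderr,
        'returncode': result.returncode,
        'duration_seconds': time.monotonic() - start,
    }

# run_in_docker('nvidia/cuda:11.0-base', ['nvidia-smi'], gpus='all')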

Incidentally, I have found that the AWS ML AMI works well with Docker/NVIDIA, with no further tricky configuration required. All I would say is: fire up an instance using the AMI, do the required apt updates/upgrades, then freeze that as your AMI to use; it avoids a 5-minute delay! For my purposes, a root volume of 200GB works fine, as opposed to the vast default root volumes you get with the g3/g4 instances (maybe required if you are going to hibernate). But I am going a bit off-topic!

MikeWhittakerRyff avatar Dec 03 '20 10:12 MikeWhittakerRyff

Hello team, is this a feature that you are thinking of adding? It would be of great value

JoanFM avatar Jul 08 '21 13:07 JoanFM

@JoanFM I guess this functionality has already been implemented:

client.containers.run(
    'nvidia/cuda:9.0-base',
    'nvidia-smi',
    device_requests=[
        # count=-1 requests all available GPUs
        docker.types.DeviceRequest(count=-1, capabilities=[['gpu']])
    ]
)

Not very elegant, but it works
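
If you need specific GPUs rather than all of them, the same DeviceRequest type also accepts device IDs; a minimal sketch (assuming a docker-py version that ships docker.types.DeviceRequest, I believe 4.3+):

import docker

client = docker.from_env()
client.containers.run(
    'nvidia/cuda:9.0-base',
    'nvidia-smi',
    device_requests=[
        # select GPUs by index or UUID instead of by count
        docker.types.DeviceRequest(device_ids=['0'], capabilities=[['gpu']])
    ]
)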

matyushinleonid avatar Aug 27 '21 14:08 matyushinleonid

@matyushinleonid Thanks heaps! it worked

Data-drone avatar Aug 31 '21 13:08 Data-drone

> @JoanFM I guess this functionality has already been implemented:
>
> client.containers.run(
>     'nvidia/cuda:9.0-base',
>     'nvidia-smi',
>     device_requests=[
>         docker.types.DeviceRequest(count=-1, capabilities=[['gpu']])
>     ]
> )
>
> Not very elegant, but it works

This works! This is the only solution that actually works, thanks so much! :)

drvpn avatar Sep 01 '23 20:09 drvpn