
Improve num_shard support with CUDA_VISIBLE_DEVICES=all

Open johnj opened this pull request 1 year ago • 11 comments

What does this PR do?

Make num_shard mirror the available GPUs when CUDA_VISIBLE_DEVICES is set to "all".

Setting CUDA_VISIBLE_DEVICES=all in a podman-based (CDI) setup currently fails to use the GPU at all and limits num_shard to 1.

This PR recognizes when CUDA_VISIBLE_DEVICES is set to "all", counts the GPUs exposed into the container, and sets num_shard accordingly.

The PR only iterates over standalone GPUs and not MIG instances, because multiple MIG instances cannot be used in the same process.
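
Roughly, the intended resolution logic looks like the sketch below (illustrative only; the function and parameter names are hypothetical, and the GPU count is passed in rather than detected here):

```rust
use std::env;

/// Hedged sketch of the intended behavior, not the actual launcher code.
/// `gpu_count` stands in for whatever detection mechanism ends up being used.
fn resolve_num_shard(cli_num_shard: Option<usize>, gpu_count: usize) -> usize {
    match (cli_num_shard, env::var("CUDA_VISIBLE_DEVICES").ok().as_deref()) {
        // An explicit --num-shard always wins.
        (Some(n), _) => n,
        // CUDA_VISIBLE_DEVICES=all: mirror the GPUs exposed into the container.
        (None, Some("all")) if gpu_count > 0 => gpu_count,
        // Otherwise keep the existing default of a single shard.
        _ => 1,
    }
}

fn main() {
    // With CUDA_VISIBLE_DEVICES=all and 4 GPUs exposed via CDI, this resolves to 4.
    println!("{}", resolve_num_shard(None, 4));
}
```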

/cc @OlivierDehaene @Narsil

johnj avatar May 28 '23 17:05 johnj

Testing environment: Ubuntu 22.04, A100, no MIG setup.

Command:

podman run --network host --shm-size 1g --rm --security-opt=label=disable --device=nvidia.com/gpu=all -e CUDA_VISIBLE_DEVICES="all" ghcr.io/huggingface/text-generation-inference:latest --model-id bigscience/bloom-560m

johnj avatar May 28 '23 17:05 johnj

Doesn't

podman run --network host --shm-size 1g --rm --security-opt=label=disable --device=nvidia.com/gpu=all  ghcr.io/huggingface/text-generation-inference:latest --model-id bigscience/bloom-560m

work out of the box? Notice the removal of the env var.

OlivierDehaene avatar May 29 '23 09:05 OlivierDehaene

hi @OlivierDehaene, the lack of the env var in my comment is a copy/paste error. Good catch.

Without CUDA_VISIBLE_DEVICES=all this works fine, although only with CPU support and 1 shard in our setup.

Setting num_shard > 1 always results in an assertion error about num_gpu being greater than the device count.

We also observed that with CUDA 12, CUDA_VISIBLE_DEVICES=all throws a hard exception when using multiple shards; we thought that was an error within torch.

I honestly didn’t expect a response so quickly. I still need to modify this patch to translate all into individual device UUIDs when CUDA 12 is detected; that is forthcoming.

johnj avatar May 29 '23 11:05 johnj

Is all an expected value at all?

https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#env-vars

Otherwise, there's no need to parse anything; returning None here should take care of using all available GPUs! However, unless I'm mistaken, CUDA_VISIBLE_DEVICES=all is the bug, not the current code.
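
For illustration, per the CUDA docs linked above, the variable is a comma-separated list of device indices or UUIDs, and leaving it unset means every device is visible. A tiny sketch (hypothetical helper, not the launcher code):

```rust
/// Illustrative only: CUDA treats CUDA_VISIBLE_DEVICES as a comma-separated
/// list of device indices or UUIDs; unset means "all devices visible".
fn parse_cuda_visible_devices(raw: Option<&str>) -> Option<Vec<String>> {
    raw.map(|value| value.split(',').map(|s| s.trim().to_string()).collect())
}

fn main() {
    // Unset: None, i.e. use every visible GPU.
    assert_eq!(parse_cuda_visible_devices(None), None);
    // A normal value: explicit device identifiers.
    assert_eq!(
        parse_cuda_visible_devices(Some("0,1")),
        Some(vec!["0".to_string(), "1".to_string()])
    );
    // "all" is just an opaque token the CUDA runtime does not recognize,
    // which (per the docs above) hides the devices rather than exposing them.
    assert_eq!(
        parse_cuda_visible_devices(Some("all")),
        Some(vec!["all".to_string()])
    );
}
```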

Narsil avatar Jun 05 '23 13:06 Narsil

CUDA_VISIBLE_DEVICES=all could be the problem; however, it is currently (mis)used, especially in container setups [1].

Here is how I got to supporting CUDA_VISIBLE_DEVICES=all.

https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/user-guide.html#environment-variables-oci-spec

all is a supported value for CDI.

In my setup, no matter what value I set for --gpus, NVIDIA_VISIBLE_DEVICES=all appears to be a static value inside the container (podman 4.6.0-dev).

# podman run --shm-size 1g -it --rm --device nvidia.com/gpu=1 --gpus 1 ...
root@...:/usr/src# env | grep all
NVIDIA_VISIBLE_DEVICES=all

You can see that CDI generates "all" as a valid device ID.

# grep "  name: all" /etc/cdi/nvidia.yaml
  name: all

The primary change with different values of --gpus or --device is the set of available GPU device nodes (/dev/nvidia0*).

returning None here should take care of using all available GPUs

Without CUDA_VISIBLE_DEVICES being set, we observed that num_shard was limited to 1 and there was no workload on the GPU.

I saw two possible routes for this PR:

  1. If CUDA_VISIBLE_DEVICES is set to all, iterate through the available GPUs, set the CUDA_VISIBLE_DEVICES env var to the resulting list, and set num_shard to the number of devices. If NVIDIA_VISIBLE_DEVICES=all were checked as well, this would change behavior for many existing setups.

  2. If CUDA_VISIBLE_DEVICES is set to all, update num_shard according to the GPUs available to the container. This seems like the much safer route: it avoids impacting existing setups while still decoupling the CDI spec from CUDA_VISIBLE_DEVICES for every container. The CDI spec can be whatever your orchestration decides, and CUDA_VISIBLE_DEVICES=all would always work and do the correct thing.

I opted for route 2. The hiccup is that not all torch + CUDA version combinations seem to tolerate CUDA_VISIBLE_DEVICES=all downstream, so I am still working through this PR.

[1] https://github.com/nextflow-io/nextflow/issues/997#issuecomment-483286761, https://github.com/NVIDIA/gpu-operator/issues/365

johnj avatar Jun 05 '23 14:06 johnj

In the doc you linked, the env variable is NVIDIA_VISIBLE_DEVICES, not CUDA_VISIBLE_DEVICES. Maybe that explains it?

Maybe we should not look at any nvidia env variable at all and use another mechanism to "read" the number of GPUs, so we're agnostic to it.

Narsil avatar Jun 05 '23 15:06 Narsil

In the doc you linked, the env variable is NVIDIA_VISIBLE_DEVICES, not CUDA_VISIBLE_DEVICES. Maybe that explains it?

Yeah, it feels like there is a lot of ambiguity between what nvidia exposes via CDI and what singularity may use... and the user is stuck in the middle of it at the moment.

Maybe we should not look at any nvidia env variable at all and use another mechanism to "read" the number of GPUs, so we're agnostic to it.

Wow, I did not think about it this way.

Your idea is actually very nice the more I think about it... maybe just do something as naive as iterating over /dev/nvidia*, since these chardevs are appropriately allocated based on --gpus or --device.
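
A naive sketch of that idea (illustrative only; it assumes plain /dev/nvidiaN nodes and ignores nvidiactl, nvidia-uvm, and MIG caps devices):

```rust
use std::fs;

/// Count the /dev/nvidiaN character devices that the container runtime
/// has actually allocated into the container.
fn count_gpu_device_nodes() -> std::io::Result<usize> {
    let count = fs::read_dir("/dev")?
        .filter_map(Result::ok)
        .filter(|entry| {
            let name = entry.file_name();
            let name = name.to_string_lossy();
            // Keep only names of the form "nvidia<digits>", e.g. nvidia0, nvidia1.
            name.strip_prefix("nvidia")
                .map(|rest| !rest.is_empty() && rest.chars().all(|c| c.is_ascii_digit()))
                .unwrap_or(false)
        })
        .count();
    Ok(count)
}

fn main() -> std::io::Result<()> {
    println!("GPU device nodes visible: {}", count_gpu_device_nodes()?);
    Ok(())
}
```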

The concern I would raise here is that it may cause implicit behavior changes for existing setups. How do you feel about this?

johnj avatar Jun 05 '23 15:06 johnj

The concern I would raise here is that it may cause implicit behavior changes for existing setups. How do you feel about this?

It's already the case with CUDA_VISIBLE_DEVICES. Here, the current code just tries to determine how many shards to spawn when the user didn't specify, so a shard should spawn for every visible GPU. But this env variable implicitly modifies the number of visible devices. Maybe there's a better way to access all available GPUs that is maintained by nvidia and works with all the possible env variables at once.

Narsil avatar Jun 05 '23 15:06 Narsil

Maybe there's a better way to access all available GPUs that is maintained by nvidia and works with all the possible env variables at once.

I’ll go investigate this further and report back on this PR.

johnj avatar Jun 05 '23 15:06 johnj

Still working through this.

I don’t want to exec out to nvidia-smi for this. So far, the alternatives I’ve explored are:

  1. Emulating nvidia-smi: traverse the /dev/nvidia* devices and issue ioctls to get device info (whether the card is in a fault state, etc.). This feels fragile because I could not find any guarantees or docs for it.

  2. Use rust-cuda to get device info… though this project seems somewhat inactive.

johnj avatar Jun 12 '23 16:06 johnj

And one more …

  3. Use https://github.com/Cldfire/nvml-wrapper; see the sketch below.
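
A minimal sketch of what that could look like, assuming a recent nvml-wrapper release (where the entry point is Nvml::init()) and that the NVML library is available inside the container:

```rust
// Requires nvml-wrapper in Cargo.toml; error types implement std::error::Error.
use nvml_wrapper::Nvml;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let nvml = Nvml::init()?;

    // Number of GPUs NVML can see -- a candidate value for num_shard.
    let device_count = nvml.device_count()?;
    println!("GPUs visible to NVML: {device_count}");

    // The UUIDs could also be used to rewrite CUDA_VISIBLE_DEVICES explicitly,
    // as discussed earlier in the thread for the CUDA 12 case.
    for index in 0..device_count {
        let device = nvml.device_by_index(index)?;
        println!("GPU {index}: {}", device.uuid()?);
    }
    Ok(())
}
```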

johnj avatar Jun 12 '23 16:06 johnj