
DevContainer cannot start with "HostRequirements.gpu = optional" when GPU driver installed but GPU doesn't exist.

Open jonnyyu opened this issue 1 year ago • 1 comments

Thanks for working on optional GPU support. The configuration is exactly what I'm looking for, but the current GPU check logic still does not work for my use case.

According to https://github.com/devcontainers/cli/pull/173, the Dev Containers CLI detects GPU driver support by checking whether docker info lists an nvidia runtime.

I use a Linux EC2 instance with the CUDA driver installed, and I switch between non-GPU and GPU instance types depending on the current work (coding or training). On a non-GPU instance, the nvidia container runtime is still reported as available because it remains installed, so the check passes even though no GPU exists.
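To illustrate why a runtime-based check cannot tell the two instance types apart, here is a minimal sketch in Python. It is not the code from the PR above, which may differ; the function names are hypothetical, and it assumes the runtimes are read with docker info --format '{{json .Runtimes}}'.

```python
import json
import subprocess


def read_docker_runtimes() -> str:
    """Return the JSON map of runtimes known to Docker, or "" on failure."""
    try:
        result = subprocess.run(
            ["docker", "info", "--format", "{{json .Runtimes}}"],
            capture_output=True,
            text=True,
            check=False,
        )
    except OSError:
        # docker is not installed or cannot be executed.
        return ""
    return result.stdout if result.returncode == 0 else ""


def nvidia_runtime_registered(runtimes_json: str) -> bool:
    """True if an "nvidia" runtime is registered with Docker.

    This only reflects installed software. It says nothing about
    whether a GPU is actually attached to the machine.
    """
    try:
        runtimes = json.loads(runtimes_json)
    except json.JSONDecodeError:
        return False
    return isinstance(runtimes, dict) and "nvidia" in runtimes


if __name__ == "__main__":
    print(nvidia_runtime_registered(read_docker_runtimes()))
```

Because the runtime stays registered after switching to a non-GPU instance type, this kind of check returns the same answer on both machines.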

A suggestion: the Nvidia container runtime also provides a CLI tool, nvidia-container-cli, which can be used to get real GPU info.

On a GPU instance:

➜  ~ nvidia-container-cli info
NVRM version:   510.47.03
CUDA version:   11.6

Device Index:   0
Device Minor:   0
Model:          Tesla T4
Brand:          Nvidia
GPU UUID:       GPU-630ef986-c301-d90b-3581-8afa4becebc3
Bus Location:   00000000:00:1e.0
Architecture:   7.5

On a non-GPU instance:

➜  ~ nvidia-container-cli info
nvidia-container-cli: initialization error: nvml error: driver not loaded

(exit code 1)
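The difference in exit status could drive the check. The following is only a rough sketch in Python, not a proposal for the actual implementation; the helper names and the injectable runner are hypothetical, and it assumes nvidia-container-cli info exits with 0 whenever a GPU is usable.

```python
import subprocess
from typing import Callable, Sequence


def run_command(argv: Sequence[str]) -> int:
    """Run argv quietly and return its exit code (127 if it cannot be run)."""
    try:
        completed = subprocess.run(
            list(argv),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=False,
        )
    except OSError:
        # The binary is missing or not executable: treat as "no GPU".
        return 127
    return completed.returncode


def gpu_present(run: Callable[[Sequence[str]], int] = run_command) -> bool:
    """True only if `nvidia-container-cli info` succeeds.

    On the non-GPU instance above the command exits with 1
    ("driver not loaded"), so this returns False there.
    """
    return run(["nvidia-container-cli", "info"]) == 0


if __name__ == "__main__":
    print("GPU available" if gpu_present() else "No usable GPU")
```

The runner is passed in as a parameter so the decision logic can be exercised on machines without the Nvidia tooling installed.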

https://github.com/NVIDIA/nvidia-container-runtime

Thanks!

jonnyyu, Dec 08 '22 08:12