Evan Lezar
@anaconda2196 I noted from the `nvidia-smi` output that you have persistence mode disabled. Would it be possible to see what effect enabling persistence mode has on this?
@majorinche could you provide the complete pod spec?
@Kwonho could you describe your setup a little bit more clearly? The code path for the error you are seeing should only be triggered if one (or more) of the...
Are you using the GPU-operator? Or is this a standard device plugin install? Did you update the NVIDIA Container Runtime components as part of updating to 0.9.0? Which versions of...
If I recall correctly, there was a change in `libnvidia-container` [1.4.0](https://github.com/NVIDIA/libnvidia-container/releases/tag/v1.4.0) that was required due to how the `/proc/driver/nvidia` folder was being managed by the driver. This may be what...
Could you update `nvidia-docker2` to [2.6.0](https://github.com/NVIDIA/nvidia-docker/releases/tag/v2.6.0)? This should pull in the other dependencies. I will create a ticket to track adding this requirement to the documentation.
Hi @anaconda2196. Is there only a single device in the host? Which version of the CUDA driver and NVIDIA Container Toolkit (nvidia-docker) do you have installed? See https://docs.nvidia.com/datacenter/cloud-native/kubernetes/mig-k8s.html#mig-support-in-kubernetes
Closing this PR. If required please open an MR against the repo mentioned above.
@riddlecp there seems to be an issue with the v1.11.0 package that means that upgrading from 1.10.0 to 1.11.0 may not work as expected. Could you try to remove `nvidia-container-toolkit`...
@junwang-wish if you are using the driver container, you need to set the root in your `/etc/nvidia-container-runtime/config.toml`. Since you are launching the driver container with: ``` -v /run/nvidia:/run/nvidia:shared \ ```...