Evan Lezar

Results 419 comments of Evan Lezar

> Is there a way to run nvidia-container-runtime on io.containerd.runc.v2, not v1? I am getting the same error as OP, tried different versions of k8s-nvidia-plugin GPU on host node works...

Note that the following command doesn't use the same code path for injecting GPUs as what K8s does.

```
ctr run --rm --gpus 0 -t docker.io/nvidia/cuda:11.0.3-base-ubuntu20.04 cuda-11.0.3-base-ubuntu20.04 nvidia-smi
```

Would...

@zvier those are very old versions for all the packages and the device plugin. Would you be able to try with the latest versions: * `nvidia-container-toolkit`, `libnvidia-container-tools`, and `libnvidia-container1`: [`v1.10.0`](https://github.com/NVIDIA/nvidia-container-toolkit/releases/tag/v1.10.0)...

So to summarise: if you update the versions to the latest *AND* run the test pod in `privileged` mode, then you're able to run `nvidia-smi` in the container. This is expected...

@orkenstein the config file mentioned is installed on every host along with the NVIDIA Container Toolkit / NVIDIA Docker.

@orkenstein does that mean that you're not using the NVIDIA Device Plugin to allow GPU usage on GCloud but using instead? (could you provide a link to the `nvidia-installer` you...

@orkenstein GKE does not (currently) use the NVIDIA device plugin or the NVIDIA container toolkit, which means that the suggestion by @ktarplee is not applicable to you.

This is unfortunately not something that I can help with. You could try posting your request here: https://github.com/GoogleCloudPlatform/container-engine-accelerators/issues (which contains the device plugin used on GKE systems).

Hi @anaconda2196 could you attach the node labels detected by `gpu-feature-discovery`? To keep it simple, let's consider the mig-strategy=single case.
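
One possible way to collect those labels is sketched below. It assumes the labels of interest live under the `nvidia.com/` prefix, and the helper name `filter_nvidia_labels` and the `NODE` variable are hypothetical, introduced only for illustration. The helper reads a comma-separated label list (the format printed by `kubectl get node --show-labels`) and keeps only the `nvidia.com/` entries.

```shell
# Hypothetical helper (not part of any NVIDIA tooling): reads a
# comma-separated list of key=value labels on stdin and prints only the
# ones in the nvidia.com/ namespace, one per line, sorted.
filter_nvidia_labels() {
  tr ',' '\n' | grep '^nvidia\.com/' | LC_ALL=C sort
}

# Possible usage against a live cluster; NODE is a placeholder for the
# name of the GPU node. The last column of --show-labels output holds
# the labels.
#   kubectl get node "$NODE" --no-headers --show-labels \
#     | awk '{print $NF}' | filter_nvidia_labels
```

The filtered output can then be pasted into the issue as-is.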