Evan Lezar comments

Results: 419 comments by Evan Lezar

Resource type labelling is incomplete/incorrect

@anaconda2196 I noted from the `nvidia-smi` output that you have persistence mode disabled. Would it be possible to see what effect enabling persistence mode has on this?

can use all gpu

@majorinche could you provide the complete pod spec?

Installation failed k8s-device-plugin(v0.9.0)

@Kwonho could you describe your setup a little bit more clearly? The code path for the error you are seeing should only be triggered if one (or more) of the...

Installation failed k8s-device-plugin(v0.9.0)

Are you using the GPU-operator? Or is this a standard device plugin install? Did you update the NVIDIA Container Runtime components as part of updating to 0.9.0? Which versions of...

Installation failed k8s-device-plugin(v0.9.0)

If I recall correctly, there was a change in `libnvidia-container` [1.4.0](https://github.com/NVIDIA/libnvidia-container/releases/tag/v1.4.0) that was required due to how the `/proc/driver/nvidia` folder was being managed by the driver. This may be what...

Installation failed k8s-device-plugin(v0.9.0)

Could you update `nvidia-docker2` to [2.6.0](https://github.com/NVIDIA/nvidia-docker/releases/tag/v2.6.0)? This should pull in the other dependencies. I will create a ticket to track adding this requirement to the documentation.

Installation failed k8s-device-plugin(v0.9.0)

Hi @anaconda2196. Is there only a single device in the host? Which versions of the NVIDIA driver and the NVIDIA Container Toolkit (nvidia-docker) do you have installed? See https://docs.nvidia.com/datacenter/cloud-native/kubernetes/mig-k8s.html#mig-support-in-kubernetes

[add]: support for hostNetwork parameter in daemonset deployment

Closing this PR. If required, please open an MR against the repo mentioned above.

"Error response from daemon: could not select device driver "nvidia" with capabilities: [[gpu]]" or "Error response from daemon: failed to create shim task:"

@riddlecp there seems to be an issue with the v1.11.0 package that means that upgrading from 1.10.0 to 1.11.0 may not work as expected. Could you try to remove `nvidia-container-toolkit`...

Follow official wiki but cannot run nvidia/cuda:11.0-base docker after running nvidia/driver:460.32.03-ubuntu16.04

@junwang-wish if you are using the driver container, you need to set the root in your `/etc/nvidia-container-runtime/config.toml`. Since you are launching the driver container with:

```
-v /run/nvidia:/run/nvidia:shared \
```

...