Evan Lezar

419 comments by Evan Lezar

@RakeshRaj97 since this issue is quite old, could you provide the version of the device plugin you are running, as well as example podspecs?

@ElenaHenderson the `v0.11.0` version of the device plugin has been released. Note that as @klueska mentions, the base image version that is used for the device plugin is due to...

Hi @tingweiwu. Thanks for reporting this. The device plugin marks GPUs unhealthy based on error events and it could be that we are missing this particular one. I will have...

@tingweiwu I have confirmed that the Xid=48 error is generated as a `nvmlEventTypeDoubleBitEccError` and not a `nvmlEventTypeXidCriticalError` (which is what the device plugin listens for). I have created an internal...
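To illustrate why such an event can be missed, here is a toy shell model of NVML event-mask filtering. It assumes the bit values from `nvml.h` (`nvmlEventTypeDoubleBitEccError` = 0x2, `nvmlEventTypeXidCriticalError` = 0x8). The `registered_mask` variable and the `is_handled` function are hypothetical and are not the device plugin's actual code. The model only shows that an event delivered under a type the listener did not register for is never acted on.

```shell
#!/usr/bin/env bash
# NVML event type bit values (assumed from nvml.h)
NVML_EVENT_TYPE_DOUBLE_BIT_ECC_ERROR=0x2
NVML_EVENT_TYPE_XID_CRITICAL_ERROR=0x8

# Hypothetical listener mask: only Xid critical errors are registered
registered_mask=$NVML_EVENT_TYPE_XID_CRITICAL_ERROR

# Hypothetical helper: report whether an event type passes the mask
is_handled() {
  local event_type=$1
  if (( registered_mask & event_type )); then
    echo "handled"
  else
    echo "ignored"
  fi
}

is_handled "$NVML_EVENT_TYPE_XID_CRITICAL_ERROR"   # prints "handled"
is_handled "$NVML_EVENT_TYPE_DOUBLE_BIT_ECC_ERROR" # prints "ignored"
```

Under this model, an Xid=48 reported as a double-bit ECC event falls outside the mask. Handling it would require adding that bit to the mask.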

@jeffreydahan the GPU Device Plugin does not install or manage the driver. The expectation is that the user install this directly on the host or that this is installed by...

@davidho27941 I see from your description that you are installing version `1.0.0-beta4` of the device plugin:

```
kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/1.0.0-beta4/nvidia-device-plugin.yml
```

The versioning of the NVIDIA Device plugin is...

You also mentioned:

> I was trying to create a cluster using crio container runtime interface and flannel CNI.

Does this mean that K8s is using crio to launch containers?...

Since the image works with docker, it would appear that your NVIDIA Container Toolkit installation is at least sane. In order to debug this further, could you uncomment the...

@davidho27941 thanks for the additional information. You mentioned in your description that k8s is configured to launch containers using crio:

> I was trying to create a cluster using crio...

@Mr-Linus note that `1.0.0-beta4` is not supported and `v0.10.0` is *the latest release*. If you are experiencing problems with this release we should try to determine why this is.