Shengbo Song

54 comments by Shengbo Song

Is there a dangling symbolic link named `libnvidia-ml.so` or `libnvidia-ml.so.1` in your image? If so, remove it.
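One way to check an unpacked image for such links is a short script like the one below. It is only a sketch: the function name `find_dangling_links` and the idea of walking an extracted root filesystem are assumptions for illustration, not part of gpu-manager.

```python
import os

# Library names taken from the comment above.
NVML_NAMES = ("libnvidia-ml.so", "libnvidia-ml.so.1")


def find_dangling_links(root, names=NVML_NAMES):
    """Return paths under `root` that are symlinks whose target is missing."""
    dangling = []
    for dirpath, dirnames, filenames in os.walk(root):
        # os.walk reports symlinks to directories in dirnames, so check both lists.
        for name in dirnames + filenames:
            if name not in names:
                continue
            path = os.path.join(dirpath, name)
            # islink() inspects the link itself; exists() follows it to the target.
            if os.path.islink(path) and not os.path.exists(path):
                dangling.append(path)
    return sorted(dangling)
```

Calling `find_dangling_links("/")` inside the container, or on a directory where the image was extracted, lists the links to remove.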

nvidia-smi tries to dlopen `libnvidia-ml.so.1`. What version of gpu-manager are you using?
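To see whether the dynamic loader can resolve the library from inside the container, a quick probe with Python's `ctypes` may help. This is a rough sketch; the helper name `can_dlopen` is hypothetical, and it only tells you whether loading succeeds, not why it fails.

```python
import ctypes


def can_dlopen(name):
    """Return True if the dynamic loader can load the library `name`."""
    try:
        # On Linux, ctypes.CDLL loads the library through dlopen().
        ctypes.CDLL(name)
        return True
    except OSError:
        # Raised when the library is missing or cannot be loaded.
        return False
```

For example, `can_dlopen("libnvidia-ml.so.1")` returning `False` would be consistent with a broken or missing link on the library search path.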

Please provide the logs that contain `Mirror %s to %s` and `Vcuda %s to %s`.
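If the log is large, the relevant lines can be pulled out with a small filter. The sketch below assumes each `%s` expands to a value without whitespace (such as a path); the pattern and the function name `matching_lines` are illustrative, not taken from gpu-manager.

```python
import re

# Assumption: each %s in the format string expands to a whitespace-free value.
PATTERN = re.compile(r"\b(Mirror|Vcuda) \S+ to \S+")


def matching_lines(lines):
    """Return the lines that look like `Mirror X to Y` or `Vcuda X to Y`."""
    return [line.rstrip("\n") for line in lines if PATTERN.search(line)]
```

It accepts any iterable of lines, so an open log file can be passed directly.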

> The official NVIDIA k8s-device-plugin supports GPU health monitoring, so that a GPU with an Xid error becomes unusable and won't get assigned to a pod.
>
> So is gpu-manager going to...

What version of gpu-manager are you using? I've fixed a problem on the master branch but haven't released an image yet.

> @mYmNeo my version is v1.0.4. What is the commit?

https://github.com/tkestack/gpu-manager/pull/130

It's a defensive mechanism in gpu-manager. gpu-admission tries to assign a pod to a single card to avoid fragmentation, but gpu-admission's scheduling information is not as up to date as the...

Besides, your situation may be a different scenario. We're working on a fix.

> Today I tried to reproduce the problem. First I created 7 NVIDIA GPU Pods, each occupying 1 GPU. > > * The NVIDIA GPU Pod description > > ```yaml...