Shengbo Song
Is there a dead symbolic link named `libnvidia-ml.so` or `libnvidia-ml.so.1` in your image? If so, remove it.
nvidia-smi tries to dlopen `libnvidia-ml.so.1`. What's the version of your gpu-manager?
Please provide logs that contain `Mirror %s to %s` and `Vcuda %s to %s`.
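A minimal sketch of how one might look for such dead links inside the image. The function name `find_dangling_symlinks` and the search root passed on the command line are assumptions for illustration, not part of gpu-manager:

```python
import os
import sys

def find_dangling_symlinks(root, names=("libnvidia-ml.so", "libnvidia-ml.so.1")):
    """Return symlinks under root that carry one of the names and whose target is missing."""
    dangling = []
    # os.walk does not follow symlinks by default; dangling links show up in filenames,
    # links to directories show up in dirnames, so both lists are checked.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            if name not in names:
                continue
            path = os.path.join(dirpath, name)
            # islink() inspects the link itself, exists() follows it to the target.
            if os.path.islink(path) and not os.path.exists(path):
                dangling.append(path)
    return sorted(dangling)

if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python find_links.py /usr
    for path in find_dangling_symlinks(sys.argv[1]):
        print(path)
```

Any path it prints is a link whose target no longer exists and is a candidate for removal.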
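As a rough sketch, the relevant lines could be pulled out of a saved log file like this. The regular expression is an assumption derived from the two format strings above (each `%s` is treated as arbitrary text), and the log file path is whatever you saved the gpu-manager output to:

```python
import re
import sys

# Assumed shape of the messages: "Mirror <src> to <dst>" or "Vcuda <src> to <dst>".
PATTERN = re.compile(r"\b(?:Mirror|Vcuda) .+ to .+")

def matching_lines(lines):
    """Return the log lines that look like Mirror/Vcuda messages, without trailing newlines."""
    return [line.rstrip("\n") for line in lines if PATTERN.search(line)]

if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python filter_logs.py gpu-manager.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        for line in matching_lines(fh):
            print(line)
```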
> The official NVIDIA k8s-device-plugin supports GPU health monitoring, so that GPU having xid error will become usable and won't get assigned to pod.
> So is gpu-manager going to...
There is no way to do this until NVIDIA provides an API for querying this section of memory consumption.
What's the version of gpu-manager? I've fixed a problem in the master branch but haven't released an image yet.
> @mYmNeo my version is v1.0.4. What is the commit?

https://github.com/tkestack/gpu-manager/pull/130
It's a defensive mechanism in gpu-manager. gpu-admission tries to assign a pod to a single card to avoid fragmentation, but the gpu-admission scheduling information is not as new as the...
Besides, your situation may be a different scenario. We're working on a fix for it.
> Today I try to reproduce the problem. First I create 7 NVIDIA GPU Pods, each occupying 1 GPU. > > * The NVIDIA GPU Pod description > > ```yaml...