gpu-manager

fail to get response from manager, error rpc error: code = Unknown desc = can't find kubepods-besteffort-pod6f1c7606_fb47_4c34_82aa_9b9966435a65.slice from docker

Open hyc-yuchen opened this issue 2 years ago • 2 comments

When I run nvidia-smi in a pod, it fails with this error: fail to get response from manager, error rpc error: code = Unknown desc = can't find kubepods-besteffort-pod6f1c7606_fb47_4c34_82aa_9b9966435a65.slice from docker

hyc-yuchen avatar May 26 '23 05:05 hyc-yuchen
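
When debugging an error like the one above, it can help to know which pod the slice name refers to. The following is a minimal sketch, assuming the slice follows the kubelet systemd cgroup driver naming (QoS class in the name, pod UID with dashes replaced by underscores). The function name `parse_pod_slice` is hypothetical and is not part of gpu-manager.

```python
import re
from typing import Optional, Tuple

# Assumed naming (systemd cgroup driver):
#   kubepods-<qos>-pod<uid_with_underscores>.slice  for besteffort/burstable pods
#   kubepods-pod<uid_with_underscores>.slice        for guaranteed pods
_SLICE_RE = re.compile(
    r"^kubepods(?:-(besteffort|burstable))?-pod([0-9a-fA-F_]+)\.slice$"
)


def parse_pod_slice(name: str) -> Optional[Tuple[str, str]]:
    """Return (qos_class, pod_uid) for a pod slice name, or None if it does not match."""
    match = _SLICE_RE.match(name)
    if match is None:
        return None
    qos = match.group(1) or "guaranteed"
    # The slice name uses underscores where the pod UID has dashes.
    pod_uid = match.group(2).replace("_", "-")
    return qos, pod_uid


if __name__ == "__main__":
    slice_name = "kubepods-besteffort-pod6f1c7606_fb47_4c34_82aa_9b9966435a65.slice"
    print(parse_pod_slice(slice_name))
```

The recovered UID can then be compared against the `metadata.uid` field of the pods on the node to confirm which pod gpu-manager failed to locate.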

Same problem here. I have already set '--container-runtime-endpoint=/var/run/containerd/containerd.sock'.

image: tkestack/gpu-manager:v1.1.5, runtime: containerd, K8s: v1.24.17

Maybe it is a ctr namespace problem, but I don't know how to debug it.

pandaoknight avatar Dec 12 '23 03:12 pandaoknight

Is the host machine using cgroup v1 or v2? The gpu-manager code uses cgroup v1 paths to read the PID of the container process as seen from the host. If the host is running cgroup v2, gpu-manager will not be able to read it.

xxsoul avatar Jul 26 '24 01:07 xxsoul
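
To answer the cgroup v1 versus v2 question on a given host, one common heuristic is to look for the `cgroup.controllers` file at the root of the cgroup mount, which exists only on a unified (v2) hierarchy. The sketch below assumes the conventional mount point `/sys/fs/cgroup`. The function name `detect_cgroup_version` and the list of v1 controller directories it checks are illustrative choices, not anything taken from gpu-manager.

```python
import os


def detect_cgroup_version(root: str = "/sys/fs/cgroup") -> str:
    """Guess the cgroup layout mounted at `root`: "v2", "v1", or "unknown"."""
    # A unified (v2) hierarchy exposes cgroup.controllers at its root.
    if os.path.isfile(os.path.join(root, "cgroup.controllers")):
        return "v2"
    # A v1 (or hybrid) layout has one directory per controller instead.
    # Checking a few common controllers is a heuristic, not an exhaustive test.
    for controller in ("memory", "devices", "cpu"):
        if os.path.isdir(os.path.join(root, controller)):
            return "v1"
    return "unknown"


if __name__ == "__main__":
    print(detect_cgroup_version())
```

Run this on the node itself (not inside the pod), since the question is about the host's cgroup layout.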