HAMi
HAMi copied to clipboard
hami scheduler runtime panic
What happened:
What you expected to happen: working How to reproduce it (as minimally and precisely as possible):
Anything else we need to know?:
- The output of
nvidia-smi -aon your host - Your docker or containerd configuration file (e.g:
/etc/docker/daemon.json) - The hami-device-plugin container logs
- The hami-scheduler container logs
- The kubelet logs on the node (e.g:
sudo journalctl -r -u kubelet) - Any relevant kernel output lines from
dmesg
Environment:
- HAMi version:2.6.0
- nvidia driver or other AI device driver version:
- Docker version from
docker version - Docker command, image and tag used
- Kernel version from
uname -a - Others:
After running online for more than 20 days, I suddenly saw that the scheduler was restarted
thanks for your report, we plan to release v2.7 this month, and this should be solved
v2.7还是会出现
Bot detected the issue body's language is not English, translate it automatically. 👯👭🏻🧑🤝🧑👫🧑🏿🤝🧑🏻👩🏾🤝👨🏿👬🏿
v2.7 will still appear