The allocated GPU memory does not match what the pod can actually use.
1. Issue or feature description
I requested only 1024 MiB of GPU memory for the pod, but in practice it can use up to 30480 MiB.
I eventually traced this to another pod on the same node that had requested 30480 MiB of GPU memory and was restarting at the time.
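For context, a minimal sketch of how the mismatch can be observed from inside the pod, assuming HAMi's per-pod memory limit is reflected in the totals reported by NVML; the `pynvml` package, device index 0, and the 1024 MiB constant are assumptions taken from this report, not from HAMi itself:

```python
# Sketch: compare the GPU memory visible inside the pod with the amount
# requested from HAMi. Requires NVIDIA's NVML Python bindings (pynvml).
import pynvml

REQUESTED_MIB = 1024  # value requested in the pod spec, per this report

pynvml.nvmlInit()
try:
    handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # assumes one visible GPU
    info = pynvml.nvmlDeviceGetMemoryInfo(handle)
    visible_mib = info.total // (1024 * 1024)
    print(f"requested: {REQUESTED_MIB} MiB, visible in pod: {visible_mib} MiB")
    if visible_mib > REQUESTED_MIB:
        # In this report the pod saw 30480 MiB instead of the requested 1024 MiB.
        print("mismatch: the pod sees more GPU memory than it requested")
finally:
    pynvml.nvmlShutdown()
```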
2. Steps to reproduce the issue
3. Information to attach (optional if deemed irrelevant)
Common error checking:
- [ ] The output of `nvidia-smi -a` on your host
- [ ] Your docker or containerd configuration file (e.g: `/etc/docker/daemon.json`)
- [ ] The vgpu-device-plugin container logs
- [ ] The vgpu-scheduler container logs
- [ ] The kubelet logs on the node (e.g: `sudo journalctl -r -u kubelet`)
Additional information that might help better understand your environment and reproduce the bug (a small collection sketch follows this list):
- [ ] Docker version from `docker version`
- [ ] Docker command, image and tag used
- [ ] Kernel version from `uname -a`
- [ ] Any relevant kernel output lines from `dmesg`
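
If it helps, a hedged sketch for gathering the host-level items above into one directory before attaching them to the issue; the directory name `hami-debug` is an arbitrary choice, and the script needs enough privileges for `journalctl` and `dmesg`:

```python
# Sketch: collect the host-level diagnostics listed in the checklists above.
import pathlib
import subprocess

OUT_DIR = pathlib.Path("hami-debug")  # arbitrary output directory name
OUT_DIR.mkdir(exist_ok=True)

# Commands taken directly from the checklists above.
COMMANDS = {
    "nvidia-smi.txt": ["nvidia-smi", "-a"],
    "docker-version.txt": ["docker", "version"],
    "uname.txt": ["uname", "-a"],
    "dmesg.txt": ["dmesg"],
    "kubelet.log": ["journalctl", "-r", "-u", "kubelet"],
}

for filename, cmd in COMMANDS.items():
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=120)
        (OUT_DIR / filename).write_text(result.stdout + result.stderr)
    except (OSError, subprocess.TimeoutExpired) as exc:
        # Record the failure instead of aborting so partial output can still be attached.
        (OUT_DIR / filename).write_text(f"failed to run {cmd}: {exc}\n")

print(f"collected diagnostics in {OUT_DIR.resolve()}")
```

The vgpu-device-plugin and vgpu-scheduler container logs are not covered by this sketch; they need to be pulled with `kubectl logs` against the actual pod names in your cluster.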