HAMi
Improve performance under large-scale pod allocation
1. Issue or feature description
Users reported that under large-scale allocation the HAMi scheduler suffers performance degradation. They further found that the functions LockNode and ReleaseNodeLock cause too many retries. The proposed solution is to change the lock granularity from the node to the GPU UUID.
e.g. map[GpuUUID]Lock instead of LockNode
https://github.com/Project-HAMi/HAMi/blob/8b5e5b88e75a68019c46a2caaa05e1995744a13d/pkg/device/nvidia/device.go#L88
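To make the proposal concrete, here is a minimal in-memory sketch of per-GPU locking in Go. It is not HAMi's code: `GPULocks`, `WithGPU`, and `AllocateAll` are hypothetical names, the UUIDs are placeholders, and the sketch assumes a single-process `sync.Mutex` per GPU UUID. It does not cover how the existing LockNode / ReleaseNodeLock persist or share the lock. It only illustrates that allocations targeting different GPUs would no longer wait on one node-wide lock.

```go
package main

import (
	"fmt"
	"sync"
)

// GPULocks is a hypothetical lock table keyed by GPU UUID
// (the map[GpuUUID]Lock idea from this issue).
type GPULocks struct {
	mu    sync.Mutex             // guards the map itself, held only briefly
	locks map[string]*sync.Mutex // one lock per GPU UUID
}

func NewGPULocks() *GPULocks {
	return &GPULocks{locks: make(map[string]*sync.Mutex)}
}

// lockFor returns the mutex for uuid, creating it on first use.
func (g *GPULocks) lockFor(uuid string) *sync.Mutex {
	g.mu.Lock()
	defer g.mu.Unlock()
	l, ok := g.locks[uuid]
	if !ok {
		l = &sync.Mutex{}
		g.locks[uuid] = l
	}
	return l
}

// WithGPU runs fn while holding only the lock of the given GPU,
// so work on other GPUs of the same node is not blocked.
func (g *GPULocks) WithGPU(uuid string, fn func()) {
	l := g.lockFor(uuid)
	l.Lock()
	defer l.Unlock()
	fn()
}

// AllocateAll simulates `pods` concurrent allocations spread round-robin
// over the given GPUs and returns the number of allocations per GPU.
func AllocateAll(g *GPULocks, uuids []string, pods int) []int {
	if len(uuids) == 0 {
		return nil
	}
	counts := make([]int, len(uuids))
	var wg sync.WaitGroup
	for i := 0; i < pods; i++ {
		wg.Add(1)
		go func(idx int) {
			defer wg.Done()
			// counts[idx] is only touched under the lock of uuids[idx].
			g.WithGPU(uuids[idx], func() { counts[idx]++ })
		}(i % len(uuids))
	}
	wg.Wait()
	return counts
}

func main() {
	g := NewGPULocks()
	// Placeholder UUIDs, not real device identifiers.
	fmt.Println(AllocateAll(g, []string{"GPU-aaaa", "GPU-bbbb"}, 200))
}
```

With this layout, contention is limited to pods that target the same GPU. A request that spans several GPUs would need to take the locks in a fixed order (for example, sorted by UUID) to avoid deadlock.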
2. Steps to reproduce the issue
Allocate hundreds of pods at once.