CPU-Pooler
Align memory allocations to exclusive CPU socket
CPU-Pooler already works on the cpuset cgroup controller of the containers created by Kubernetes, by setting cpuset.cpus to the appropriate value. Unfortunately, Kubernetes itself cannot ensure that memory allocations, whether regular RAM or huge pages, are tied to the same CPU socket for high-performance workloads. While there is a PR in the works to implement a Memory Manager in kubelet that properly sets cpuset.mems(!) to the correct value, it is not expected to arrive before 1.20 or 1.21. Additionally, the alpha version of the Memory Manager will only solve memory alignment for Pods belonging to the Guaranteed QoS class, i.e. those managed by the native CPU Manager. While this issue has been raised on the PR, it will not be addressed before the beta release, which is currently unscheduled.
However, on nodes managed by CPU-Pooler, setting another attribute inside the cpuset structure would be very low effort indeed! This would effectively mimic the functionality of the upcoming Memory Manager, and would guarantee NUMA alignment even without it. The base idea is simple. When a container:
- requests exclusive CPU resources from a pool managed by CPU-Pooler
- and the assigned CPU cores all belong to the same socket
CPU-Pooler can simply write the socket ID into cpuset.mems, in addition to properly filling out cpuset.cpus.
Additional things to consider:
- this is a fairly invasive feature, so it would require a new feature flag to control it
- control could be at pool level, i.e. it should be possible to create exclusive pools with or without guaranteed memory NUMA alignment
- the feature would also pin regular RAM allocations to the same NUMA node, which can cause all kinds of edge cases detailed in the Memory Manager design document
- memory alignment is only needed for exclusive pool requests, never for shared or default requests
- shared + exclusive requests can also be aligned to a socket, but only the exclusive portion should be taken into account
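The considerations above could translate into decision logic along these lines. This is only a sketch: the `pool` and `allocation` types, the `AlignMemory` per-pool flag, and the `memoryNode` function are hypothetical names invented for illustration, and `cpuToNode` is an assumed CPU ID to NUMA node ID table. Shared and default allocations are ignored, and only exclusive pools that opt in are considered.

```go
package main

import "fmt"

// poolType mirrors the three pool kinds named in the text.
type poolType int

const (
	defaultPool poolType = iota
	sharedPool
	exclusivePool
)

// pool is a hypothetical pool configuration. AlignMemory stands in for
// the proposed per-pool flag that enables memory NUMA alignment.
type pool struct {
	Kind        poolType
	AlignMemory bool
}

// allocation is the set of CPUs handed to one container from one pool.
type allocation struct {
	Pool pool
	CPUs []int
}

// memoryNode returns the NUMA node to pin memory to, and whether pinning
// should happen at all. Only CPUs from exclusive pools with AlignMemory
// set are inspected; they must all map to the same node.
func memoryNode(allocs []allocation, cpuToNode map[int]int) (int, bool) {
	node, found := 0, false
	for _, a := range allocs {
		if a.Pool.Kind != exclusivePool || !a.Pool.AlignMemory {
			continue // shared, default, and opted-out pools do not count
		}
		for _, cpu := range a.CPUs {
			n, ok := cpuToNode[cpu]
			if !ok || (found && n != node) {
				return 0, false // unknown CPU or CPUs spanning nodes
			}
			node, found = n, true
		}
	}
	return node, found
}

func main() {
	// Assumed topology for the demo: CPUs 0-1 on node 0, CPUs 2-3 on node 1.
	cpuToNode := map[int]int{0: 0, 1: 0, 2: 1, 3: 1}

	// A shared + exclusive request: only the exclusive CPUs decide the node.
	allocs := []allocation{
		{Pool: pool{Kind: sharedPool}, CPUs: []int{0}},
		{Pool: pool{Kind: exclusivePool, AlignMemory: true}, CPUs: []int{2, 3}},
	}
	node, pin := memoryNode(allocs, cpuToNode)
	fmt.Println("node:", node, "pin:", pin)
}
```

Keeping the flag on the pool rather than on the whole node lets aligned and unaligned exclusive pools coexist, which matches the pool-level control suggested in the list.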
@CsatariGergely @tamas-laczko @balintTobik @bvarga87 @kedmison ^^ if, let's say, somebody were really interested in the ability to guarantee memory NUMA pinning on multi-socket HW, and said somebody ever contacted you for short-term solutions, this is the way to do it. Do what you want with the information wink-wink