cri-resource-manager

cpuallocator: improve allocation heuristics

Open · marquiz opened this issue 5 years ago · 3 comments

Bring more NUMA-awareness to the cpuallocator (implemented in pkg/cpuallocator/). While discussing and reviewing the CPU allocation logic with @klihub, we realized that the allocator is too simple and produces clearly suboptimal results. This especially concerns takeIdleCores(), which (we think) should pack workloads more aggressively and intelligently in a topology-aware manner.

The cpuallocator needs additional tightest-fit allocation rules that go beyond the current socket/core/thread topology hierarchy, toward a more realistic socket/die/NUMA node/core/thread hierarchy:

  • try allocating a full die if the number of requested CPUs matches the die's CPU count exactly
  • try allocating a full NUMA node if the number of requested CPUs matches the node's CPU count exactly
  • only then fall back to allocating individual full cores or threads, and even then
    • try taking a sub-NUMA-node number of cores/threads from a single NUMA node,
    • try taking a sub-die number of cores/threads from a single die

marquiz · Mar 06 '20 15:03

It should also do these things, too.

It would also need to be improved with additional tightest-fit allocation rules
beyond the current topology socket/core/thread hierarchy to get to a more
realistic socket/die/NUMA node/core/thread hierarchy:
  - try allocating a full die if the number of requested cpus matches exactly
  - try allocating a full NUMA node if the number of cpus matches exactly
  - only then give up and try allocating mere full cores or threads, and also with these
    - try taking sub-NUMA node number of cores/threads from a single NUMA node,
    - try taking sub-die number of cores/threads from a single die

klihub · Jan 28 '21 09:01

> It should also do these things, too

I added these to the issue description

marquiz · Jan 28 '21 10:01

Had a brainstorming session with @klihub. I think we agree that a full rewrite would be the right thing to do. Some ideas we came up with:

  • use a tree structure to represent the CPU hierarchy, somewhat similar to what the topology-aware policy has
  • make it possible to override functionality (function overrides)
  • make the default implementation(s) configurable, too

marquiz · Feb 05 '21 16:02