katalyst-core

feat(mbm): memory bandwidth management

Open h-w-chen opened this issue 1 year ago • 1 comments

What type of PR is this?

Feature: full mbm (memory bandwidth management) functionality

What this PR does / why we need it:

this PR includes 2 major parts:

  1. memory bandwidth metrics collection: periodically collects NUMA- and package-level memory bandwidth and latency data, and saves it to the metric store for other components to use.
  2. memory bandwidth adjustment: adjusts the memory bandwidth quotas of the NUMA nodes in a physical package, based on a specified threshold value, so that workloads' bandwidth is not impacted by noisy neighbours (those that consume too much bandwidth).
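
As a rough illustration of the adjustment idea in part 2, here is a minimal Go sketch. It is not the code in this PR: `NodeUsage`, its `Protected` flag, and `PlanQuotas` are hypothetical names, and the proportional scale-down is just one possible policy. It assumes the threshold applies to the total bandwidth of one physical package.

```go
package main

// NodeUsage is a hypothetical sample of one NUMA node's memory bandwidth.
type NodeUsage struct {
	NumaID    int
	MBps      uint64 // measured bandwidth in MB per second
	Protected bool   // assumption: node hosts workloads that must not be throttled
}

// PlanQuotas returns a bandwidth quota (MB/s) for each unprotected NUMA node
// of a single package. It returns an empty map when the package total is at
// or below the threshold, meaning no throttling is needed.
func PlanQuotas(nodes []NodeUsage, thresholdMBps uint64) map[int]uint64 {
	quotas := make(map[int]uint64)

	var total, protected, noisy uint64
	for _, n := range nodes {
		total += n.MBps
		if n.Protected {
			protected += n.MBps
		} else {
			noisy += n.MBps
		}
	}
	if total <= thresholdMBps || noisy == 0 {
		return quotas
	}

	// Bandwidth left for the noisy neighbours once protected nodes are served.
	var budget uint64
	if thresholdMBps > protected {
		budget = thresholdMBps - protected
	}

	// Scale each noisy node down in proportion to its current usage.
	for _, n := range nodes {
		if !n.Protected {
			quotas[n.NumaID] = n.MBps * budget / noisy
		}
	}
	return quotas
}
```

With a 14000 MB/s threshold, a protected node at 8000 MB/s and two neighbours at 6000 and 4000 MB/s, the sketch leaves the protected node untouched and shares the remaining 6000 MB/s between the two neighbours.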

Related Startup Args

By default this metric provisioner is disabled. To enable it, pass a startup argument like the following:

--metric-provisioners="...,mbw,..."

The refresh interval for these metrics (e.g. 1 second) is set with the following startup argument; the default of 5 seconds is typically not the desired value for these metrics:

--metric-provisioner-intervals mbw=1s

To enable memory bandwidth adjustment, with an adjustment cycle interval of 1 second and the memory bandwidth threshold value given in MB per second:

--enable-mbm --mbm-latency-threshold=14000 --mbm-control-interval=1s

Which issue(s) this PR fixes:

NA - new feature

Special notes for your reviewer:

this PR is arranged roughly in the following blocks of commits; hopefully reviewers can pick the relevant block to provide feedback on:

  1. CI job additions
  2. mbw lib refactoring (mbw metrics collection part)
  3. metrics provisioner implementation
  4. mbw lib memory bandwidth adjustment part
  5. mbm adjustment control inside the qrm plugin (the actual adjustment is done via an external manager)

h-w-chen, Jun 14 '24 22:06

Codecov Report

Attention: Patch coverage is 52.61845% with 570 lines in your changes missing coverage. Please review.

Project coverage is 56.70%. Comparing base (c9f1aaf) to head (647058d). Report is 141 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| pkg/mbw/monitor/monitor.go | 39.54% | 96 Missing and 11 partials :warning: |
| pkg/mbw/monitor/controller.go | 21.97% | 68 Missing and 3 partials :warning: |
| pkg/mbw/monitor/umc.go | 0.00% | 55 Missing :warning: |
| ...agent/qrm-plugins/cpu/dynamicpolicy/mbm/control.go | 57.84% | 35 Missing and 8 partials :warning: |
| pkg/mbw/monitor/l3pmc.go | 59.74% | 28 Missing and 3 partials :warning: |
| pkg/mbw/utils/pci/pciutils.go | 50.00% | 29 Missing and 2 partials :warning: |
| pkg/mbw/utils/helper.go | 79.72% | 21 Missing and 8 partials :warning: |
| pkg/mbw/monitor/rdt.go | 28.20% | 26 Missing and 2 partials :warning: |
| pkg/mbw/monitor/monitor_util.go | 0.00% | 26 Missing :warning: |
| pkg/util/machine/extension.go | 59.32% | 21 Missing and 3 partials :warning: |

... and 18 more

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #628      +/-   ##
==========================================
+ Coverage   56.62%   56.70%   +0.07%     
==========================================
  Files         544      571      +27     
  Lines       51408    52761    +1353     
==========================================
+ Hits        29108    29916     +808     
- Misses      18603    19095     +492     
- Partials     3697     3750      +53     
| Flag | Coverage Δ |
|---|---|
| unittest | 56.70% <52.61%> (+0.07%) :arrow_up: |

codecov[bot], Jun 14 '24 23:06

Development plan changed: we will use resctrl to monitor memory bandwidth.

h-w-chen, Oct 24 '24 16:10