
cgroup: change conversion from CPU shares to weight

Open giuseppe opened this issue 6 months ago • 13 comments

The conversion formula from OCI CPU shares (range [2, 262144]) to cgroup v2 cpu.weight (range [1, 10000]) has been updated.

Closes: https://github.com/containers/crun/issues/1721

giuseppe avatar May 29 '25 12:05 giuseppe

Reviewer's Guide

This PR replaces the linear CPU shares→cpu.weight mapping with a log₂-based quadratic formula, adds configure-time log2 detection, updates tests and documentation to match the new conversion, and adjusts build dependencies for Alpine.

Class diagram for updated CPU shares to weight conversion

classDiagram
    class CgroupInternal {
        +uint64_t convert_shares_to_weight(uint64_t shares)
    }

    CgroupInternal : +convert_shares_to_weight(uint64_t shares)
    CgroupInternal : <<static>>

    %% The function now uses log2, pow, and ceil for the new formula
    %% No new classes, but the method implementation is significantly changed

File-Level Changes

Change Details Files
Detect log2 availability in configure.ac with and without -lm
  • Add AC_LINK_IFELSE checks for log2
  • Define HAVE_LOG2 or error out if unavailable
  • Append -lm to LIBS when needed
configure.ac
Refactor CPU weight tests to cover new mapping edge cases
  • Loop over share→weight pairs (2→1, 1024→100, 262144→10000)
  • Invoke update and verify cgroup cpu.weight and systemd CPUWeight
  • Improve error messages for mismatches
tests/test_resources.py
Implement new log₂-based quadratic conversion in cgroup internals
  • Include <stdint.h> and <math.h>
  • Handle shares==0, clamp below 2 and above 262144
  • Compute log2, exponent and return ceil(10^exponent)
src/libcrun/cgroup-internal.h
Update documentation tables to reflect new conversion formula
  • Replace linear formula with log₂-based expression
  • Adjust formatting in UTF-8 and man-page sections
crun.1.md
crun.1
Add missing gperf dependency to Alpine build Dockerfile
  • Insert gperf into apk add line alongside other build tools
tests/alpine-build/Dockerfile
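The configure-time probe described above could look roughly like this (an illustrative autoconf fragment using the standard `AC_LINK_IFELSE` macro; crun's actual configure.ac may differ):

```m4
AC_MSG_CHECKING([for log2])
AC_LINK_IFELSE(
  [AC_LANG_PROGRAM([[#include <math.h>]], [[return log2(4.0) > 1.0;]])],
  [AC_DEFINE([HAVE_LOG2], [1], [Define if log2 is available])],
  [dnl Retry with -lm appended to LIBS before giving up.
   LIBS="$LIBS -lm"
   AC_LINK_IFELSE(
     [AC_LANG_PROGRAM([[#include <math.h>]], [[return log2(4.0) > 1.0;]])],
     [AC_DEFINE([HAVE_LOG2], [1], [Define if log2 is available])],
     [AC_MSG_ERROR([log2 not found])])])
```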


sourcery-ai[bot] avatar May 29 '25 12:05 sourcery-ai[bot]

hmm wouldn't this need to be opt-in? I think we'll need some way of coordinating this across the ecosystem before a runtime makes this decision

haircommander avatar May 29 '25 14:05 haircommander

hmm wouldn't this need to be opt-in? I think we'll need some way of coordinating this across the ecosystem before a runtime makes this decision

I am not going to merge this yet, I've marked it as a Draft and queued it for "crun 2" with a bunch of other breaking changes: https://github.com/containers/crun/milestone/1

I don't see the risk of different runtimes/engines using a different formula. CPU shares/weight are relative to their peer cgroups. E.g. if you have two cgroups (/a/b/container_1 and /a/b/container_2), the weight you set for container_1 affects only container_1 and container_2, but it does not in any way affect b, which in our case is the cgroup created by Kubernetes. So unless you mix runtimes handling the same cgroup level, there should be no conflict.

Also, the entire conversion from shares to weight is an undocumented hack. We should start using the native weight value via the unified configuration in the OCI specs, instead of what cgroup v1 was doing. So in a sense, it is already "drop-in", given there is already a (better) option to specify the weight directly.
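For context, the native route described here, setting cpu.weight directly through the OCI spec's `linux.resources.unified` map of cgroup v2 interface files, looks roughly like this (illustrative config fragment):

```json
{
  "linux": {
    "resources": {
      "unified": {
        "cpu.weight": "100"
      }
    }
  }
}
```

With this, the runtime writes the value verbatim to cpu.weight and no shares-to-weight conversion is involved.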

giuseppe avatar May 29 '25 15:05 giuseppe

TMT tests failed. @containers/packit-build please check.

hmm wouldn't this need to be opt-in? I think we'll need some way of coordinating this across the ecosystem before a runtime makes this decision

I am not going to merge this yet, I've marked it as a Draft and queued it for "crun 2" with a bunch of other breaking changes: https://github.com/containers/crun/milestone/1

I don't see the risk of different runtimes/engines using a different formula. CPU shares/weight are relative to their peer cgroups. E.g. if you have two cgroups (/a/b/container_1 and /a/b/container_2), the weight you set for container_1 affects only container_1 and container_2, but it does not in any way affect b, which in our case is the cgroup created by Kubernetes. So unless you mix runtimes handling the same cgroup level, there should be no conflict.

Also, the entire conversion from shares to weight is an undocumented hack. We should start using the native weight value via the unified configuration in the OCI specs, instead of what cgroup v1 was doing. So in a sense, it is already "drop-in", given there is already a (better) option to specify the weight directly.

@haircommander any thoughts? If we change that as a purely implementation-detail, I guess we don't need a KEP to drive it? Would that be acceptable?

iholder101 avatar Jun 12 '25 10:06 iholder101

@haircommander any thoughts? If we change that as a purely implementation-detail, I guess we don't need a KEP to drive it? Would that be acceptable?

the KEP would impact only the conversion performed by Kubernetes or the CRI runtime. Decisions in this project (and I'd assume runc too) are guided only by the OCI runtime-specs, not a KEP.

giuseppe avatar Jun 12 '25 10:06 giuseppe

Ephemeral COPR build failed. @containers/packit-build please check.

TMT tests failed. @containers/packit-build please check.

@haircommander any thoughts? If we change that as a purely implementation-detail, I guess we don't need a KEP to drive it? Would that be acceptable?

the KEP would impact only the conversion performed by Kubernetes or the CRI runtime. Decisions in this project (and I'd assume runc too) are guided only by the OCI runtime-specs, not a KEP.

I was referring to whether the CRI would have a way to opt into (or out of) this, to ensure the new conversion takes place in all OCI runtimes at the same k8s version.

Obviously decisions in this repo aren't driven by Kubernetes KEPs, I'm just raising questions about their use-case from a user perspective. I wonder if for the k8s ecosystem (which is a significant user) treating the conversion as an implementation detail is a problem, and how important it is to have an opt-in/out mechanism.

iholder101 avatar Jun 12 '25 10:06 iholder101

I was referring to whether the CRI would have a way to opt into (or out of) this, to ensure the new conversion takes place in all OCI runtimes at the same k8s version.

Obviously decisions in this repo aren't driven by Kubernetes KEPs, I'm just raising questions about their use-case from a user perspective. I wonder if for the k8s ecosystem (which is a significant user) treating the conversion as an implementation detail is a problem, and how important it is to have an opt-in/out mechanism.

I think we are overthinking it. The current mechanism is quite broken but you are the first one who noticed and complained about it :-) Most people won't notice the difference, which is an improvement anyway.

We need to move completely away from cgroup v1. A real improvement would be to use cpu.weight directly instead of relying on the conversion from CPU shares; that puts Kubernetes completely in charge, and you get exactly the weight value you specify in the specs.

giuseppe avatar Jun 12 '25 11:06 giuseppe

A real improvement would be to use cpu.weight directly instead of relying on the conversion from CPU shares

+100! :+1:

iholder101 avatar Jun 12 '25 11:06 iholder101

Ephemeral COPR build failed. @containers/packit-build please check.

TMT tests failed. @containers/packit-build please check.

Ephemeral COPR build failed. @containers/packit-build please check.

still LGTM )

I missed that the equivalent patch for runc was merged, so no reason to postpone this one :-)

giuseppe avatar Jul 03 '25 07:07 giuseppe