ingress-nginx icon indicating copy to clipboard operation
ingress-nginx copied to clipboard

Generate correct output on NumCPU() when using cgroups2

Open nickorlow opened this issue 1 year ago • 26 comments

Updates the NumCPU() function in pkg/util/runtime/cpu_linux.go to work with cgroups2.

What this PR does / why we need it:

Change is required for proper functionality with cgroups2.

Types of changes

  • [x] Bug fix (non-breaking change which fixes an issue)
  • [x] New feature (non-breaking change which adds functionality)
  • [ ] CVE Report (Scanner found CVE and adding report)
  • [ ] Breaking change (fix or feature that would cause existing functionality to change)
  • [ ] Documentation only

Which issue/s this PR fixes

Fixes #9665

How Has This Been Tested?

Generated example files that were within the [cgroups2 spec](https://docs.kernel.org/admin-guide/cgroup-v2.html and tested our code to ensure it returned the correct outputs for each case.

Checklist:

  • [ ] My change requires a change to the documentation.
  • [ ] I have updated the documentation accordingly.
  • [x] I've read the CONTRIBUTION guide
  • [ ] I have added unit and/or e2e tests to cover my changes.
  • [ ] All new and existing tests passed.
  • [ ] Added Release Notes.

Does my pull request need a release note?

Added support for detecting CPU count properly when using cgroups2

nickorlow avatar Mar 31 '23 05:03 nickorlow

CLA Signed

The committers listed above are authorized under a signed CLA.

  • :white_check_mark: login: nickorlow / name: Nicholas Orlowsky (8f86603dbdc316293425dc22bf82ce4aca4b7d3c, a9f9793a1fbe269660c70ef9ea33c03f541cc689, 3814e1f01fbfebe4c8c9f88e57cd424a56578537, be6f1d54c75af5c7ce6db646fe99deb0f77b6570, 475adf734ad9f1d85ece1d344a2b2636de7b4fa4, c5dad5e46116382ef8ec02a648e9c48d0ef1ea19, 0a01f555d19a9456a9924ce19e5ad51808805e91, 6d96e111c8e15081c36bf0448745cb3eb73b1cb4, e4a11295abc4bfd72990ebf7d7b25b6ecfe17cca, 63fb4c65126347dd7d578bfaf699d0843852708c, c23cc0c33877eab2865fa21240318305cfadaf44, aa9a87621742343c8188cc7f89be234e9f956e8f, f35dae9b1117a6e0da532702bb4978ea8547175f, 3714c2c426885e8b4730cefc6b04d3d997102bec, 221e85f6f2e11455cdeeb5b2d4cfd93b51b575c1, a8028a576f3174e9063f55ccc96c5345bf0357b1, b270b4a8bf108dd210b209b1babb61a6bf5b840d, 0f4d054c0766cacc3e4fde487f8ac0439f1cd0ac, ad1fb03f002d6378a9293e08c2c9163030f9287c, dad8086cb2b992c37736e2645aaabd0a36129735, fddf4e034c0d766a4815247b414cbd71a1869456, 3ae35a045dc0d103c6d6511ffa516ebddb11ac14, 30dc62987192e6a6ca797c403148549b86bd30f7, a080ea1f29bf1b48b5cfe8cdda2f0278c185f487, 3a64d7402c1869c4a3dc60056b797d27f9765beb, 7f6472617bf086712d995418e2e4df98e2ee4682, 405a5aa44c828d151c62d914e68cd39954370152, 8621dfc66dbe9c84b4e0bcae8916f44ef4d2620a, 9e79a360204a29613061e0055315538696287788, 7c4ac85a48c8d55417f33648dab87641a7105b7e, ac0f6fcd398065bd2ce54c655480a7ed32526127)
  • :white_check_mark: login: sridharnandigam / name: Sridhar Nandigam (165d0573618c5bcd42962de100b75666c6fe15b3)

Welcome @nickorlow!

It looks like this is your first PR to kubernetes/ingress-nginx 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes/ingress-nginx has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. :smiley:

k8s-ci-robot avatar Mar 31 '23 05:03 k8s-ci-robot

Hi @nickorlow. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Mar 31 '23 05:03 k8s-ci-robot

Hi @nickorlow ,

Thanks for this.

Please explore contributing tests for this. Particularly e2e tests.

regards

longwuyuan avatar Mar 31 '23 06:03 longwuyuan

/triage accepted /kind bug

longwuyuan avatar Mar 31 '23 06:03 longwuyuan

Hi @nickorlow ,

Thanks for this.

Please explore contributing tests for this. Particularly e2e tests.

regards

Should we include the tests in this PR or make a separate one?

nickorlow avatar Mar 31 '23 06:03 nickorlow

I think that the change and the e2e-test in this PR will help feedback. thanks

longwuyuan avatar Mar 31 '23 06:03 longwuyuan

CLA Signed

The committers listed above are authorized under a signed CLA.

  • :white_check_mark: login: nickorlow / name: Nicholas Orlowsky (e4a11295abc4bfd72990ebf7d7b25b6ecfe17cca, 221e85f6f2e11455cdeeb5b2d4cfd93b51b575c1, 3ae35a045dc0d103c6d6511ffa516ebddb11ac14, 7f6472617bf086712d995418e2e4df98e2ee4682, a080ea1f29bf1b48b5cfe8cdda2f0278c185f487, f35dae9b1117a6e0da532702bb4978ea8547175f, dad8086cb2b992c37736e2645aaabd0a36129735, 7c4ac85a48c8d55417f33648dab87641a7105b7e)
  • :white_check_mark: login: sridharnandigam / name: Sridhar Nandigam (165d0573618c5bcd42962de100b75666c6fe15b3)

Hi @nickorlow I used the code changes you did and patched the same files. Apparently, the code seems to be not correct and usage of a function FindCgroupMountpoint is forbidden for the cgroups v2 and thus leads to runtime.NumCPU() always.

When tested in GKE 1.26 here was the error from the code: not implemented for cgroup v2 unified hierarchy

[nginx@pree-new-ic-gke-pree-ingress-96c6776d8-w66ch nginx-config]$ uname -a
Linux pree-new-ic-gke-pree-ingress-96c6776d8-w66ch 5.15.89+ #1 SMP Sat Mar 18 09:27:02 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

----


[nginx@pree-new-ic-gke-pree-ingress-96c6776d8-w66ch nginx-config]$ ls /sys/fs/cgroup/
cgroup.controllers        cpu.max                   hugetlb.1GB.events        io.pressure               memory.stat
cgroup.events             cpu.max.burst             hugetlb.1GB.events.local  io.stat                   memory.swap.current
cgroup.freeze             cpu.pressure              hugetlb.1GB.max           memory.current            memory.swap.events
cgroup.kill               cpu.stat                  hugetlb.1GB.rsvd.current  memory.events             memory.swap.high
cgroup.max.depth          cpu.weight                hugetlb.1GB.rsvd.max      memory.events.local       memory.swap.max
cgroup.max.descendants    cpu.weight.nice           hugetlb.2MB.current       memory.high               pids.current
cgroup.procs              cpuset.cpus               hugetlb.2MB.events        memory.low                pids.events
cgroup.stat               cpuset.cpus.effective     hugetlb.2MB.events.local  memory.max                pids.max
cgroup.subtree_control    cpuset.cpus.partition     hugetlb.2MB.max           memory.min                rdma.current
cgroup.threads            cpuset.mems               hugetlb.2MB.rsvd.current  memory.numa_stat          rdma.max
cgroup.type               cpuset.mems.effective     hugetlb.2MB.rsvd.max      memory.oom.group
cpu.idle                  hugetlb.1GB.current       io.max                    memory.pressure

References :

  • https://github.com/opencontainers/runc/blob/v1.1.5/libcontainer/cgroups/v1_utils.go#L18
  • https://github.com/opencontainers/runc/commit/142d0f2d5d75a2848beefb373708dab78f6d9329

Preetham777 avatar Apr 24 '23 10:04 Preetham777

Hey @Preetham777 ill look into this today. Thank you for finding it!

nickorlow avatar Apr 24 '23 14:04 nickorlow

@Preetham777 just pushed a fix. Have yet to test it (writing the e2e tests for it right now) but I believe this should resolve your issue.

nickorlow avatar Apr 24 '23 16:04 nickorlow

i dont think our kind e2e tests use cgroup v2 so none of our e2e tests will hit the new code paths.

How are we certain this will work on cgroup v2 clusters.

Also looks like this is a merge conflict in the e2e files. /hold

strongjz avatar May 10 '23 21:05 strongjz

i dont think our kind e2e tests use cgroup v2 so none of our e2e tests will hit the new code paths.

How are we certain this will work on cgroup v2 clusters.

Also looks like this is a merge conflict in the e2e files. /hold

Our plan for the e2e tests was to create the cgroups files that NumCPU reads from and then run the NumCPU function to check that it's value is consistent with what we expect. Does this seem like a good approach?

We have tested the code by creating a separate standalone program with the same code from the cpu_linux.go fille and running it on a cgroups2 system. We have not yet built and deployed this on a cluster.

I've removed some of our e2e progress as it wasn't finished yet.

nickorlow avatar May 12 '23 18:05 nickorlow

Any news about this PR?

LeMyst avatar May 31 '23 09:05 LeMyst

Any news about this PR?

Just fixed the formatting issue. Would appreciate input on what to do with the e2e tests regarding my last comment (or if we should write any at all)

nickorlow avatar Jun 01 '23 04:06 nickorlow

Any news on this PR? bumping into the same issue :-(

MKruger777 avatar Jun 08 '23 07:06 MKruger777

Sorry for the delay. Let me take a look.

/assign

tao12345666333 avatar Jun 13 '23 05:06 tao12345666333

Deploy Preview for kubernetes-ingress-nginx canceled.

Name Link
Latest commit be6f1d54c75af5c7ce6db646fe99deb0f77b6570
Latest deploy log https://app.netlify.com/sites/kubernetes-ingress-nginx/deploys/64b4bde0a3db7d0008bab010

netlify[bot] avatar Jul 16 '23 17:07 netlify[bot]

Added e2e tests. I made it so that the cgroups files are created during the test so they're not somewhere else in the test config and also we can test the correct output on v1/v2 without requiring the kind cluster be re-created with different cgroup settings.

nickorlow avatar Jul 17 '23 03:07 nickorlow

Thank you for your contribution

tao12345666333 avatar Jul 17 '23 05:07 tao12345666333

LGTM for the moment

LeMyst avatar Sep 12 '23 17:09 LeMyst

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: LeMyst, nickorlow Once this PR has been reviewed and has the lgtm label, please ask for approval from tao12345666333. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment Approvers can cancel approval by writing /approve cancel in a comment

k8s-ci-robot avatar Sep 14 '23 16:09 k8s-ci-robot

@nickorlow @tao12345666333 Looks like this is stalled/forgotten?

@nickorlow @tao12345666333 Looks like this is stalled/forgotten?

I think so.. anything more I need to do to have this merged? I believe all requested changes have been made

nickorlow avatar Dec 06 '23 15:12 nickorlow

Any news?

LeMyst avatar Feb 19 '24 13:02 LeMyst

Deploy Preview for kubernetes-ingress-nginx canceled.

Name Link
Latest commit ac0f6fcd398065bd2ce54c655480a7ed32526127
Latest deploy log https://app.netlify.com/sites/kubernetes-ingress-nginx/deploys/6647f9f3d22d550009471264

netlify[bot] avatar Feb 21 '24 16:02 netlify[bot]

Any plans to have this merged? Thanks!

scenusa-bd avatar Mar 29 '24 16:03 scenusa-bd

@nickorlow Sorry for the long long delay, could you please resolve the conflicts and make a rebase? Then we can move forward, thank you!

tao12345666333 avatar Mar 30 '24 15:03 tao12345666333

@nickorlow , can we get this through, please? thank you :)

divigarg avatar May 17 '24 19:05 divigarg