ingress-nginx
ingress-nginx copied to clipboard
Generate correct output on NumCPU() when using cgroups2
Updates the NumCPU() function in pkg/util/runtime/cpu_linux.go to work with cgroups2.
What this PR does / why we need it:
Change is required for proper functionality with cgroups2.
Types of changes
- [x] Bug fix (non-breaking change which fixes an issue)
- [x] New feature (non-breaking change which adds functionality)
- [ ] CVE Report (Scanner found CVE and adding report)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)
- [ ] Documentation only
Which issue/s this PR fixes
Fixes #9665
How Has This Been Tested?
Generated example files that were within the [cgroups2 spec](https://docs.kernel.org/admin-guide/cgroup-v2.html and tested our code to ensure it returned the correct outputs for each case.
Checklist:
- [ ] My change requires a change to the documentation.
- [ ] I have updated the documentation accordingly.
- [x] I've read the CONTRIBUTION guide
- [ ] I have added unit and/or e2e tests to cover my changes.
- [ ] All new and existing tests passed.
- [ ] Added Release Notes.
Does my pull request need a release note?
Added support for detecting CPU count properly when using cgroups2
The committers listed above are authorized under a signed CLA.
- :white_check_mark: login: nickorlow / name: Nicholas Orlowsky (a8028a576f3174e9063f55ccc96c5345bf0357b1, 7c4ac85a48c8d55417f33648dab87641a7105b7e, 7f6472617bf086712d995418e2e4df98e2ee4682, 0f4d054c0766cacc3e4fde487f8ac0439f1cd0ac, be6f1d54c75af5c7ce6db646fe99deb0f77b6570, 8f86603dbdc316293425dc22bf82ce4aca4b7d3c, e4a11295abc4bfd72990ebf7d7b25b6ecfe17cca, 3a64d7402c1869c4a3dc60056b797d27f9765beb, b270b4a8bf108dd210b209b1babb61a6bf5b840d, 3814e1f01fbfebe4c8c9f88e57cd424a56578537, c5dad5e46116382ef8ec02a648e9c48d0ef1ea19, fddf4e034c0d766a4815247b414cbd71a1869456, c23cc0c33877eab2865fa21240318305cfadaf44, a080ea1f29bf1b48b5cfe8cdda2f0278c185f487, 0a01f555d19a9456a9924ce19e5ad51808805e91, 3ae35a045dc0d103c6d6511ffa516ebddb11ac14, ad1fb03f002d6378a9293e08c2c9163030f9287c, 30dc62987192e6a6ca797c403148549b86bd30f7, 3714c2c426885e8b4730cefc6b04d3d997102bec, 405a5aa44c828d151c62d914e68cd39954370152, 475adf734ad9f1d85ece1d344a2b2636de7b4fa4, aa9a87621742343c8188cc7f89be234e9f956e8f, 63fb4c65126347dd7d578bfaf699d0843852708c, 8621dfc66dbe9c84b4e0bcae8916f44ef4d2620a, 6d96e111c8e15081c36bf0448745cb3eb73b1cb4, e9f371787e82e11b6009b77d740e88edd8691bf8, 9e79a360204a29613061e0055315538696287788, f35dae9b1117a6e0da532702bb4978ea8547175f, 221e85f6f2e11455cdeeb5b2d4cfd93b51b575c1, b2d67ff92bad02c041be12b7f02d66fc417a2e6a, ac0f6fcd398065bd2ce54c655480a7ed32526127, dad8086cb2b992c37736e2645aaabd0a36129735, a9f9793a1fbe269660c70ef9ea33c03f541cc689)
- :white_check_mark: login: sridharnandigam / name: Sridhar Nandigam (165d0573618c5bcd42962de100b75666c6fe15b3)
Welcome @nickorlow!
It looks like this is your first PR to kubernetes/ingress-nginx 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.
You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.
You can also check if kubernetes/ingress-nginx has its own contribution guidelines.
You may want to refer to our testing guide if you run into trouble with your tests not passing.
If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!
Thank you, and welcome to Kubernetes. :smiley:
Hi @nickorlow. Thanks for your PR.
I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test
on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.
Once the patch is verified, the new status will be reflected by the ok-to-test
label.
I understand the commands that are listed here.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Hi @nickorlow ,
Thanks for this.
Please explore contributing tests for this. Particularly e2e tests.
regards
/triage accepted /kind bug
Hi @nickorlow ,
Thanks for this.
Please explore contributing tests for this. Particularly e2e tests.
regards
Should we include the tests in this PR or make a separate one?
I think that the change and the e2e-test in this PR will help feedback. thanks
The committers listed above are authorized under a signed CLA.
- :white_check_mark: login: nickorlow / name: Nicholas Orlowsky (e4a11295abc4bfd72990ebf7d7b25b6ecfe17cca, 221e85f6f2e11455cdeeb5b2d4cfd93b51b575c1, 3ae35a045dc0d103c6d6511ffa516ebddb11ac14, 7f6472617bf086712d995418e2e4df98e2ee4682, a080ea1f29bf1b48b5cfe8cdda2f0278c185f487, f35dae9b1117a6e0da532702bb4978ea8547175f, dad8086cb2b992c37736e2645aaabd0a36129735, 7c4ac85a48c8d55417f33648dab87641a7105b7e)
- :white_check_mark: login: sridharnandigam / name: Sridhar Nandigam (165d0573618c5bcd42962de100b75666c6fe15b3)
Hi @nickorlow I used the code changes you did and patched the same files. Apparently, the code seems to be not correct and usage of a function FindCgroupMountpoint is forbidden for the cgroups v2 and thus leads to runtime.NumCPU() always.
When tested in GKE 1.26 here was the error from the code: not implemented for cgroup v2 unified hierarchy
[nginx@pree-new-ic-gke-pree-ingress-96c6776d8-w66ch nginx-config]$ uname -a
Linux pree-new-ic-gke-pree-ingress-96c6776d8-w66ch 5.15.89+ #1 SMP Sat Mar 18 09:27:02 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
----
[nginx@pree-new-ic-gke-pree-ingress-96c6776d8-w66ch nginx-config]$ ls /sys/fs/cgroup/
cgroup.controllers cpu.max hugetlb.1GB.events io.pressure memory.stat
cgroup.events cpu.max.burst hugetlb.1GB.events.local io.stat memory.swap.current
cgroup.freeze cpu.pressure hugetlb.1GB.max memory.current memory.swap.events
cgroup.kill cpu.stat hugetlb.1GB.rsvd.current memory.events memory.swap.high
cgroup.max.depth cpu.weight hugetlb.1GB.rsvd.max memory.events.local memory.swap.max
cgroup.max.descendants cpu.weight.nice hugetlb.2MB.current memory.high pids.current
cgroup.procs cpuset.cpus hugetlb.2MB.events memory.low pids.events
cgroup.stat cpuset.cpus.effective hugetlb.2MB.events.local memory.max pids.max
cgroup.subtree_control cpuset.cpus.partition hugetlb.2MB.max memory.min rdma.current
cgroup.threads cpuset.mems hugetlb.2MB.rsvd.current memory.numa_stat rdma.max
cgroup.type cpuset.mems.effective hugetlb.2MB.rsvd.max memory.oom.group
cpu.idle hugetlb.1GB.current io.max memory.pressure
References :
- https://github.com/opencontainers/runc/blob/v1.1.5/libcontainer/cgroups/v1_utils.go#L18
- https://github.com/opencontainers/runc/commit/142d0f2d5d75a2848beefb373708dab78f6d9329
Hey @Preetham777 ill look into this today. Thank you for finding it!
@Preetham777 just pushed a fix. Have yet to test it (writing the e2e tests for it right now) but I believe this should resolve your issue.
i dont think our kind e2e tests use cgroup v2 so none of our e2e tests will hit the new code paths.
How are we certain this will work on cgroup v2 clusters.
Also looks like this is a merge conflict in the e2e files. /hold
i dont think our kind e2e tests use cgroup v2 so none of our e2e tests will hit the new code paths.
How are we certain this will work on cgroup v2 clusters.
Also looks like this is a merge conflict in the e2e files. /hold
Our plan for the e2e tests was to create the cgroups files that NumCPU reads from and then run the NumCPU function to check that it's value is consistent with what we expect. Does this seem like a good approach?
We have tested the code by creating a separate standalone program with the same code from the cpu_linux.go fille and running it on a cgroups2 system. We have not yet built and deployed this on a cluster.
I've removed some of our e2e progress as it wasn't finished yet.
Any news about this PR?
Any news about this PR?
Just fixed the formatting issue. Would appreciate input on what to do with the e2e tests regarding my last comment (or if we should write any at all)
Any news on this PR? bumping into the same issue :-(
Sorry for the delay. Let me take a look.
/assign
Deploy Preview for kubernetes-ingress-nginx canceled.
Name | Link |
---|---|
Latest commit | be6f1d54c75af5c7ce6db646fe99deb0f77b6570 |
Latest deploy log | https://app.netlify.com/sites/kubernetes-ingress-nginx/deploys/64b4bde0a3db7d0008bab010 |
Added e2e tests. I made it so that the cgroups files are created during the test so they're not somewhere else in the test config and also we can test the correct output on v1/v2 without requiring the kind cluster be re-created with different cgroup settings.
Thank you for your contribution
LGTM for the moment
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: LeMyst, nickorlow Once this PR has been reviewed and has the lgtm label, please ask for approval from tao12345666333. For more information see the Kubernetes Code Review Process.
The full list of commands accepted by this bot can be found here.
Approvers can indicate their approval by writing /approve
in a comment
Approvers can cancel approval by writing /approve cancel
in a comment
@nickorlow @tao12345666333 Looks like this is stalled/forgotten?
@nickorlow @tao12345666333 Looks like this is stalled/forgotten?
I think so.. anything more I need to do to have this merged? I believe all requested changes have been made
Any news?
Deploy Preview for kubernetes-ingress-nginx canceled.
Name | Link |
---|---|
Latest commit | e9f371787e82e11b6009b77d740e88edd8691bf8 |
Latest deploy log | https://app.netlify.com/sites/kubernetes-ingress-nginx/deploys/664b5add62526c000838b40f |
Any plans to have this merged? Thanks!
@nickorlow Sorry for the long long delay, could you please resolve the conflicts and make a rebase? Then we can move forward, thank you!
@nickorlow , can we get this through, please? thank you :)
@tao12345666333 Everything updated. Had to make some changes to the tests since docker mounts the cgroup directory as ro unless you run it in privileged mode.
/lgtm