Experimental support for cgroups v2
These changes do not appear to impact the behavior of bpm when running on an ubuntu-jammy based stemcell (cgroups v1). It should be safe to merge, as the behavior of the code handling cgroups v1 has not changed.
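For anyone reviewing on an unfamiliar VM, a quick way to tell which cgroup hierarchy a machine is running (a sketch, not part of bpm itself) is to check for the v2-only `cgroup.controllers` file at the cgroup mount root:

```shell
#!/bin/sh
# Sketch: detect whether the unified (v2) or legacy (v1) cgroup hierarchy
# is mounted at /sys/fs/cgroup. The cgroup.controllers file exists only
# at the root of a cgroup v2 mount.
if [ -f /sys/fs/cgroup/cgroup.controllers ]; then
  mode=v2
else
  mode=v1
fi
echo "cgroup hierarchy: $mode"
```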
Previous context left for posterity:
Currently one test in the integration specs is failing. It is unclear whether this is the fault of my Docker setup or an actual issue with how runc is being set up.
Tests can be run as follows:

```shell
# from the repo root
cd src/bpm/
./scripts/test-unit --keep-going
```
Example of the failure I'm seeing when running these tests from within the container created using `./scripts/start-docker`:
```
------------------------------
• [FAILED] [0.167 seconds]
resource limits memory [It] gets OOMed when it exceeds its memory limit
/bpm/src/bpm/integration/resource_limits_test.go:116
Timeline >>
If this test fails, then make sure you have enabled swap accounting! Details are in the README.
Error: failed to start job-process: exit status 1
[FAILED] in [It] - /bpm/src/bpm/integration/resource_limits_test.go:122 @ 06/28/24 22:07:30.852
BEGIN '/bpmtmp/resource-limits-test1115196611/sys/log/0cf7c73a-0b65-4f46-90a7-eb0bb809e6c2/0cf7c73a-0b65-4f46-90a7-eb0bb809e6c2.stderr.log'
time="2024-06-28T22:07:30Z" level=warning msg="unable to get oom kill count" error="openat2 /sys/fs/cgroup/bpm-0cf7c73a-0b65-4f46-90a7-eb0bb809e6c2/memory.events: no such file or directory"
time="2024-06-28T22:07:30Z" level=error msg="runc run failed: unable to start container process: unable to apply cgroup configuration: cannot enter cgroupv2 \"/sys/fs/cgroup/bpm-0cf7c73a-0b65-4f46-90a7-eb0bb809e6c2\" with domain controllers -- it is in an invalid state"
END '/bpmtmp/resource-limits-test1115196611/sys/log/0cf7c73a-0b65-4f46-90a7-eb0bb809e6c2/0cf7c73a-0b65-4f46-90a7-eb0bb809e6c2.stderr.log'
BEGIN '/bpmtmp/resource-limits-test1115196611/sys/log/0cf7c73a-0b65-4f46-90a7-eb0bb809e6c2/0cf7c73a-0b65-4f46-90a7-eb0bb809e6c2.stdout.log'
END '/bpmtmp/resource-limits-test1115196611/sys/log/0cf7c73a-0b65-4f46-90a7-eb0bb809e6c2/0cf7c73a-0b65-4f46-90a7-eb0bb809e6c2.stdout.log'
<< Timeline
[FAILED] Expected
    <int>: 1
to match exit code:
    <int>: 0
In [It] at: /bpm/src/bpm/integration/resource_limits_test.go:122 @ 06/28/24 22:07:30.852
------------------------------
••••••••••••••••••••••••••
------------------------------
• [FAILED] [0.364 seconds]
start when a broken runc configuration is left on the system [It] `bpm start` cleans up the broken-ness and starts it
/bpm/src/bpm/integration/start_test.go:329
Timeline >>
Error: failed to start job-process: exit status 1
[FAILED] in [It] - /bpm/src/bpm/integration/start_test.go:337 @ 06/28/24 22:07:31.915
BEGIN '/bpmtmp/start-test2475062763/sys/log/e599a26c-5d89-421d-a740-04dd490c314b/e599a26c-5d89-421d-a740-04dd490c314b.stdout.log'
en_US.UTF-8
Logging to STDOUT
Received a TERM signal
END '/bpmtmp/start-test2475062763/sys/log/e599a26c-5d89-421d-a740-04dd490c314b/e599a26c-5d89-421d-a740-04dd490c314b.stdout.log'
BEGIN '/bpmtmp/start-test2475062763/sys/log/e599a26c-5d89-421d-a740-04dd490c314b/e599a26c-5d89-421d-a740-04dd490c314b.stderr.log'
Logging to STDERR
[WARN tini (1)] Reaped zombie process with pid=8
time="2024-06-28T22:07:31Z" level=error msg="runc run failed: unable to get cgroup PIDs: read /sys/fs/cgroup/bpm-e599a26c-5d89-421d-a740-04dd490c314b/cgroup.procs: operation not supported"
END '/bpmtmp/start-test2475062763/sys/log/e599a26c-5d89-421d-a740-04dd490c314b/e599a26c-5d89-421d-a740-04dd490c314b.stderr.log'
<< Timeline
[FAILED] Expected
    <int>: 1
to match exit code:
    <int>: 0
In [It] at: /bpm/src/bpm/integration/start_test.go:337 @ 06/28/24 22:07:31.915
------------------------------
•••••••••••••••••••••••••••••

Summarizing 2 Failures:
[FAIL] resource limits memory [It] gets OOMed when it exceeds its memory limit
/bpm/src/bpm/integration/resource_limits_test.go:122
[FAIL] start when a broken runc configuration is left on the system [It] `bpm start` cleans up the broken-ness and starts it
/bpm/src/bpm/integration/start_test.go:337

Ran 69 of 69 Specs in 27.622 seconds
FAIL! -- 67 Passed | 2 Failed | 0 Pending | 0 Skipped
```
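For what it's worth, runc's "it is in an invalid state" message usually corresponds to the target cgroup's `cgroup.type` reading `domain invalid`. A quick probe along those lines (a sketch; the `CG` variable here is a placeholder, and when debugging a real failure it should be pointed at the `bpm-<id>` directory from the error message):

```shell
#!/bin/sh
# Sketch: inspect a cgroup v2 directory that runc refuses to enter.
# CG defaults to the cgroup root here; point it at the bpm-<id> directory
# from the error message when debugging a real failure.
CG=${CG:-/sys/fs/cgroup}
echo "controllers available: $(cat "$CG/cgroup.controllers" 2>/dev/null)"
echo "delegated to children:  $(cat "$CG/cgroup.subtree_control" 2>/dev/null)"
# cgroup.type exists only in non-root cgroups; a value of "domain invalid"
# matches the "invalid state" wording in runc's error.
cat "$CG/cgroup.type" 2>/dev/null || echo "cgroup.type not present (root cgroup or v1)"
```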
When testing these changes, feel free to grab a stemcell from here:
- storage.googleapis.com/bosh-core-stemcells-candidate/google/bosh-stemcell-0.59-google-kvm-ubuntu-noble-go_agent.tgz
- storage.googleapis.com/bosh-core-stemcells-candidate/aws/bosh-stemcell-0.59-aws-xen-hvm-ubuntu-noble-go_agent.tgz
- storage.googleapis.com/bosh-core-stemcells-candidate/azure/bosh-stemcell-0.59-azure-hyperv-ubuntu-noble-go_agent.tgz
Source: https://bosh.ci.cloudfoundry.org/teams/stemcell/pipelines/stemcells-ubuntu-noble/
Hey @ystros and @klakin-pivotal. Just bumping this up in case you forgot.
@ramonskie do you have any idea about the above finding?
I have not touched anything related to memory limits, so perhaps the defaults changed.
I do note that in both cases, neither `memory.swap.max` nor `memory.memsw.limit_in_bytes`, the files checked by the new code, is present:
- `memory.memsw.*` are cgroup v1 control files and won't be present when only cgroup v2 is in use (see the table here: https://docs.kernel.org/admin-guide/cgroup-v1/memory.html#benefits-and-purpose-of-the-memory-controller).
- `memory.swap.*` is documented to exist only in non-root cgroups, so I expect that if you descend into almost any subdirectory of /sys/fs/cgroup you will find those files (the relevant section of the docs starts here: https://docs.kernel.org/admin-guide/cgroup-v2.html#memory).
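To check this claim on a given machine, one can search the first couple of levels of the hierarchy for both spellings (a sketch; either `find` may legitimately come back empty depending on the cgroup version and whether swap accounting is enabled):

```shell
#!/bin/sh
# Sketch: look for the v2 and v1 swap-limit control files near the top of
# the hierarchy. On cgroup v2, memory.swap.max appears only in non-root
# cgroups; memory.memsw.limit_in_bytes exists only on v1 with swap accounting.
v2_file=$(find /sys/fs/cgroup -maxdepth 2 -name memory.swap.max 2>/dev/null | head -1)
v1_file=$(find /sys/fs/cgroup -maxdepth 2 -name memory.memsw.limit_in_bytes 2>/dev/null | head -1)
echo "cgroup v2 swap file:  ${v2_file:-none found}"
echo "cgroup v1 memsw file: ${v1_file:-none found}"
```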
The following results in passing tests on the latest GCP ubuntu-jammy and ubuntu-noble VMs:

```shell
sudo su -
apt update && apt install --yes docker.io
git clone https://github.com/cloudfoundry/bpm-release.git
cd bpm-release
git checkout cgroup-v2-support
docker run --privileged --cgroupns host -v ${PWD}:/bpm -it cfbpm/bpm-ci:latest
./scripts/test-unit --keep-going
```
NOTE: the `docker run` command differs from `scripts/start-docker` in that it adds `--cgroupns host`.
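A quick way to confirm which cgroup namespace a container shell ended up in (a sketch): with a private namespace on cgroup v2, `/proc/self/cgroup` reports `0::/`, while `--cgroupns host` shows the process's real path in the host hierarchy.

```shell
#!/bin/sh
# Sketch: print this process's cgroup membership. Under a private cgroup
# namespace on v2 the path is "/"; under --cgroupns host it is the real
# host-side path (e.g. a docker scope under system.slice).
cgline=$(head -1 /proc/self/cgroup 2>/dev/null)
echo "cgroup membership: ${cgline:-/proc/self/cgroup unavailable}"
```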
This suggests that if the specs fail on the noble stemcell, we will need to figure out how cgroups differ between standard ubuntu-noble and the new stemcell, and then decide where to make changes: either accommodate the differences in bpm, or make the stemcell behave more like standard ubuntu-noble.
@jpalermo I've verified that `scripts/test-unit` works fine on a Jammy stemcell.