
Experimental support for cgroups v2

Open · aramprice opened this issue 1 year ago • 7 comments

These changes do not appear to impact the behavior of bpm when running on an ubuntu-jammy based stemcell (cgroups v1). It should be safe to merge this, as the behavior of the code handling cgroups v1 has not changed.


Previous context left for posterity:

Currently two tests in the integration specs are failing. It is unclear whether this is the fault of my Docker setup or whether it represents an actual issue with how runc is being set up.

Tests can be run as follows:

# from the repo root
cd src/bpm/
./scripts/test-unit --keep-going

Examples of the failures I'm seeing when running these tests from within the container created using ./scripts/start-docker:

------------------------------
• [FAILED] [0.167 seconds]
resource limits memory [It] gets OOMed when it exceeds its memory limit
/bpm/src/bpm/integration/resource_limits_test.go:116

  Timeline >>
  If this test fails, then make sure you have enabled swap accounting! Details are in the README.
  Error: failed to start job-process: exit status 1
  [FAILED] in [It] - /bpm/src/bpm/integration/resource_limits_test.go:122 @ 06/28/24 22:07:30.852
  BEGIN '/bpmtmp/resource-limits-test1115196611/sys/log/0cf7c73a-0b65-4f46-90a7-eb0bb809e6c2/0cf7c73a-0b65-4f46-90a7-eb0bb809e6c2.stderr.log'
  time="2024-06-28T22:07:30Z" level=warning msg="unable to get oom kill count" error="openat2 /sys/fs/cgroup/bpm-0cf7c73a-0b65-4f46-90a7-eb0bb809e6c2/memory.events: no such file or directory"
  time="2024-06-28T22:07:30Z" level=error msg="runc run failed: unable to start container process: unable to apply cgroup configuration: cannot enter cgroupv2 \"/sys/fs/cgroup/bpm-0cf7c73a-0b65-4f46-90a7-eb0bb809e6c2\" with domain controllers -- it is in an invalid state"
  END   '/bpmtmp/resource-limits-test1115196611/sys/log/0cf7c73a-0b65-4f46-90a7-eb0bb809e6c2/0cf7c73a-0b65-4f46-90a7-eb0bb809e6c2.stderr.log'
  BEGIN '/bpmtmp/resource-limits-test1115196611/sys/log/0cf7c73a-0b65-4f46-90a7-eb0bb809e6c2/0cf7c73a-0b65-4f46-90a7-eb0bb809e6c2.stdout.log'
  END   '/bpmtmp/resource-limits-test1115196611/sys/log/0cf7c73a-0b65-4f46-90a7-eb0bb809e6c2/0cf7c73a-0b65-4f46-90a7-eb0bb809e6c2.stdout.log'
  << Timeline

  [FAILED] Expected
      <int>: 1
  to match exit code:
      <int>: 0
  In [It] at: /bpm/src/bpm/integration/resource_limits_test.go:122 @ 06/28/24 22:07:30.852
------------------------------
••••••••••••••••••••••••••
------------------------------
• [FAILED] [0.364 seconds]
start when a broken runc configuration is left on the system [It] `bpm start` cleans up the broken-ness and starts it
/bpm/src/bpm/integration/start_test.go:329

  Timeline >>
  Error: failed to start job-process: exit status 1
  [FAILED] in [It] - /bpm/src/bpm/integration/start_test.go:337 @ 06/28/24 22:07:31.915
  BEGIN '/bpmtmp/start-test2475062763/sys/log/e599a26c-5d89-421d-a740-04dd490c314b/e599a26c-5d89-421d-a740-04dd490c314b.stdout.log'
  en_US.UTF-8
  Logging to STDOUT
  Received a TERM signal
  END   '/bpmtmp/start-test2475062763/sys/log/e599a26c-5d89-421d-a740-04dd490c314b/e599a26c-5d89-421d-a740-04dd490c314b.stdout.log'
  BEGIN '/bpmtmp/start-test2475062763/sys/log/e599a26c-5d89-421d-a740-04dd490c314b/e599a26c-5d89-421d-a740-04dd490c314b.stderr.log'
  Logging to STDERR
  [WARN  tini (1)] Reaped zombie process with pid=8
  time="2024-06-28T22:07:31Z" level=error msg="runc run failed: unable to get cgroup PIDs: read /sys/fs/cgroup/bpm-e599a26c-5d89-421d-a740-04dd490c314b/cgroup.procs: operation not supported"
  END   '/bpmtmp/start-test2475062763/sys/log/e599a26c-5d89-421d-a740-04dd490c314b/e599a26c-5d89-421d-a740-04dd490c314b.stderr.log'
  << Timeline

  [FAILED] Expected
      <int>: 1
  to match exit code:
      <int>: 0
  In [It] at: /bpm/src/bpm/integration/start_test.go:337 @ 06/28/24 22:07:31.915
------------------------------
•••••••••••••••••••••••••••••

Summarizing 2 Failures:
  [FAIL] resource limits memory [It] gets OOMed when it exceeds its memory limit
  /bpm/src/bpm/integration/resource_limits_test.go:122
  [FAIL] start when a broken runc configuration is left on the system [It] `bpm start` cleans up the broken-ness and starts it
  /bpm/src/bpm/integration/start_test.go:337

Ran 69 of 69 Specs in 27.622 seconds
FAIL! -- 67 Passed | 2 Failed | 0 Pending | 0 Skipped
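
The "cannot enter cgroupv2 ... with domain controllers -- it is in an invalid state" error looks like a collision with the cgroup v2 "no internal processes" rule. For anyone debugging this, a minimal sketch of how to inspect the hierarchy from inside the container (the bpm-* path is taken from the error above; the interpretation is a guess, not a confirmed diagnosis):

# Which controllers are enabled at the root, and which are delegated to children
cat /sys/fs/cgroup/cgroup.controllers
cat /sys/fs/cgroup/cgroup.subtree_control
# Does the cgroup runc is trying to enter exist, and does it have the v2 files?
ls /sys/fs/cgroup/bpm-*/ 2>/dev/null || echo "no bpm-* cgroups present"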

aramprice · Jun 28 '24

When testing these changes feel free to grab a stemcell from here:

  • storage.googleapis.com/bosh-core-stemcells-candidate/google/bosh-stemcell-0.59-google-kvm-ubuntu-noble-go_agent.tgz
  • storage.googleapis.com/bosh-core-stemcells-candidate/aws/bosh-stemcell-0.59-aws-xen-hvm-ubuntu-noble-go_agent.tgz
  • storage.googleapis.com/bosh-core-stemcells-candidate/azure/bosh-stemcell-0.59-azure-hyperv-ubuntu-noble-go_agent.tgz

Source: https://bosh.ci.cloudfoundry.org/teams/stemcell/pipelines/stemcells-ubuntu-noble/

rkoster · Jul 04 '24

Hey @ystros and @klakin-pivotal. Just bumping this up in case you forgot.

jpalermo · Jul 25 '24

@ramonskie do you have any idea about the above finding?

beyhan · Jul 26 '24

I have not touched anything related to memory limits, so perhaps the defaults changed.

ramonskie · Jul 26 '24

I do note that in both cases, neither memory.swap.max nor memory.memsw.limit_in_bytes (the files being checked in the new code) is present:

memory.memsw.* files are cgroup v1 control files and won't be present when only cgroup v2 is in use. (See the table here: https://docs.kernel.org/admin-guide/cgroup-v1/memory.html#benefits-and-purpose-of-the-memory-controller)

memory.swap.* files are documented to exist only in non-root cgroups, so I expect that if you were to descend into almost any subdirectory of /sys/fs/cgroup you would find those files. (The relevant section of the docs starts here: https://docs.kernel.org/admin-guide/cgroup-v2.html#memory)
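
A quick way to see both points on a live system (illustrative only; these are standard cgroup paths, nothing bpm-specific):

# cgroup v1: the memory controller has its own mount, and the memsw files live there
ls /sys/fs/cgroup/memory/memory.memsw.limit_in_bytes 2>/dev/null
# cgroup v2: memory.swap.* is absent at the root of the hierarchy...
ls /sys/fs/cgroup/memory.swap.max 2>/dev/null || echo "absent at the root, as documented"
# ...but should appear in child cgroups
ls /sys/fs/cgroup/*/memory.swap.max 2>/dev/null | head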

klakin-pivotal · Jul 26 '24

The following results in passing tests on the latest GCP ubuntu-jammy and ubuntu-noble VMs:

sudo su -
apt update && apt install --yes docker.io

git clone https://github.com/cloudfoundry/bpm-release.git
cd bpm-release
git checkout cgroup-v2-support

docker run --privileged --cgroupns host -v ${PWD}:/bpm -it cfbpm/bpm-ci:latest

./scripts/test-unit --keep-going

NOTE: the docker run command above differs from scripts/start-docker in that it adds --cgroupns host.

This seems to indicate that, if the specs fail on the noble stemcell, we will need to figure out what differences in cgroup configuration exist between standard ubuntu-noble and the new stemcell, and then decide what changes to make where: either accommodate the differences, or make the stemcell more like standard ubuntu-noble.
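
When comparing standard ubuntu-noble to the stemcell, a sketch of checks that could be run on both (standard kernel interfaces; the reading of the output is my assumption):

# Which cgroup version is mounted at /sys/fs/cgroup?
stat -fc %T /sys/fs/cgroup   # "cgroup2fs" means the unified v2 hierarchy; "tmpfs" means v1
# Is the container sharing the host's cgroup namespace?
cat /proc/self/cgroup        # a bare "0::/" suggests a private cgroup namespace;
                             # a full host path means --cgroupns host took effect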

aramprice · Jul 26 '24

@jpalermo I've verified that scripts/test-unit works fine on a Jammy stemcell.

aramprice · Aug 01 '24