gdn fails with runc error on Ubuntu 22.04 LTS

Open xtremerui opened this issue 1 year ago • 3 comments

Description

When running the Concourse binary (using gdn for containerization) on a Google Cloud VM with the ubuntu-2204-lts image family as the OS image, we see errors like the following:

Aug 25 21:56:12 smoke-splendid-earwig concourse[4460]: {"timestamp":"2022-08-25T21:56:12.809930620Z","level":"error","source":"guardian","message":"guardian.create.containerizer-create.runtime-create-failed","data":{"error":"runc run: exit status 1: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: rootfs_linux.go:76: mounting \"cgroup\" to rootfs at \"/sys/fs/cgroup\" caused: invalid argument","handle":"a17876d5-647e-492d-6ae2-311b1a56d718","session":"40.3"}}

For comparison, when running Concourse with Docker Compose locally, we don't see the error. The OS image is the same as on the GCP VM:

root@c29ddbf435bd:/src# cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=22.04
DISTRIB_CODENAME=jammy
DISTRIB_DESCRIPTION="Ubuntu 22.04.1 LTS"

but its kernel is 5.10.47-linuxkit.

Also, when running Concourse with the containerd runtime, which uses runc v1.1.4 directly, we don't see the error in either local Docker or the GCP VM.

Maybe it is related to the older runc currently used in guardian, which might not work well with the newer kernel in Ubuntu Jammy Jellyfish?

  • Guardian release version: 1.22
  • Linux kernel version: 5.15.0-1016-gcp
  • Concourse version: latest dev
  • Go version: 1.19

xtremerui avatar Aug 26 '22 14:08 xtremerui

We have created an issue in Pivotal Tracker to manage this. Unfortunately, the Pivotal Tracker project is private so you may be unable to view the contents of the story.

The labels on this github issue will be updated when the story is started.

cf-gitbot avatar Aug 26 '22 14:08 cf-gitbot

This issue is being worked on under the Garden-runc-release/#233 issue

MarcPaquette avatar Sep 06 '22 19:09 MarcPaquette

It looks like this is the same issue that other container runtimes have had with Jammy: https://github.com/containers/podman/issues/12559 .

Jammy uses cgroupv2 in the kernel, and it delegates cgroup authority to sub-processes (like the container runtime) as cgroupv2. runc supports cgroupv2 as of the v1.0.0 release, but gdn also alters cgroups directly using the old v1 schema: https://github.com/cloudfoundry/guardian/blob/8deac7e439aca41e515a74d7c8489081b8961b97/guardiancmd/command_linux.go#L307

This will require some substantial changes in how cgroups are managed in guardian in order to support new distributions that have switched to cgroupv2.

dtimm avatar Oct 19 '22 16:10 dtimm

Some updates:

Concourse with the latest gdn runs successfully on an image based on the gcloud image family ubuntu-2204-lts with cgroups v1 enabled.

xtremerui avatar Nov 03 '22 21:11 xtremerui