
'unable to get systemd version' when using systemd cgroup driver

Open thundergolfer opened this issue 6 months ago • 2 comments

Description

We've been using the systemd cgroup driver for years, like so:

Command::new("runc")
    .arg("--systemd-cgroup")

We're experiencing a consistent but low frequency container startup error that we'd like guidance on how to eliminate or be robust to.

running container: creating container: cannot set up cgroup for root: error parsing systemd version: unable to get systemd version

This is coming from: https://github.com/google/gvisor/blob/973b2f23e56686780f85560c1ec37fe6a0bc4c9e/runsc/cgroup/systemd.go#L268

func systemdVersion(conn *systemdDbus.Conn) (int, error) {
	vStr, err := conn.GetManagerProperty("Version")
	if err != nil {
		return -1, errors.New("unable to get systemd version")
	}

Based on the host metrics we observe when this failure happens, it seems related to load on the host, but our standard metrics (CPU, RAM, PSI) look fairly normal.

To make progress on this issue, we're considering:

  • Logging the actual err instead of the hardcoded message.
  • Adding a fallback that reads the version from an environment variable, or allowing that as an override.
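A minimal sketch of what those two options could look like together, not the actual runsc code. The `managerPropertyGetter` interface, the `getterFunc` adapter, the `RUNSC_SYSTEMD_VERSION` variable name, `parseMajorVersion`, and `errFakeTimeout` are all hypothetical names introduced for illustration. Only the `GetManagerProperty("Version")` call is taken from the snippet above.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// managerPropertyGetter is a hypothetical interface mirroring the
// GetManagerProperty call in the snippet above, so the sketch can be
// exercised without a real D-Bus connection.
type managerPropertyGetter interface {
	GetManagerProperty(prop string) (string, error)
}

// getterFunc adapts a plain function to managerPropertyGetter (illustration only).
type getterFunc func(prop string) (string, error)

func (f getterFunc) GetManagerProperty(prop string) (string, error) { return f(prop) }

// versionOverrideEnv is an assumed variable name, not an existing runsc option.
const versionOverrideEnv = "RUNSC_SYSTEMD_VERSION"

// errFakeTimeout stands in for whatever error the D-Bus call really returns.
var errFakeTimeout = errors.New("dbus: connection timed out")

// parseMajorVersion extracts the leading integer from a value such as
// `"252.4-1"` or `v245` (surrounding quotes and a leading "v" are dropped).
func parseMajorVersion(raw string) (int, error) {
	s := strings.TrimPrefix(strings.Trim(raw, `"`), "v")
	end := 0
	for end < len(s) && s[end] >= '0' && s[end] <= '9' {
		end++
	}
	if end == 0 {
		return -1, fmt.Errorf("no leading version number in %q", raw)
	}
	return strconv.Atoi(s[:end])
}

// systemdVersion prefers the environment override when it is set, and
// otherwise asks systemd, wrapping the underlying error instead of hiding it.
func systemdVersion(conn managerPropertyGetter) (int, error) {
	if override := os.Getenv(versionOverrideEnv); override != "" {
		v, err := parseMajorVersion(override)
		if err != nil {
			return -1, fmt.Errorf("parsing %s: %w", versionOverrideEnv, err)
		}
		return v, nil
	}
	vStr, err := conn.GetManagerProperty("Version")
	if err != nil {
		return -1, fmt.Errorf("unable to get systemd version: %w", err)
	}
	return parseMajorVersion(vStr)
}
```

Because the error is wrapped with `%w`, the original message text is preserved as a prefix and callers can still inspect the cause with `errors.Is` or `errors.As`.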

We don't really want to retry the container creation in this situation. We'd prefer a solution that retries internally, if that's necessary.

Steps to reproduce

This is a sporadic error so we can't provide a reproduction.

runsc version

-version
runsc version fb842aab7730
spec: 1.2.0

docker version (if using docker)


uname

Linux ip-10-110-45-137.sa-east-1.compute.internal 5.15.0-309.180.4.el9uek.x86_64 #2 SMP Wed May 21 06:56:22 PDT 2025 x86_64 x86_64 x86_64 GNU/Linux

kubectl (if using Kubernetes)

n/a

repo state (if built from source)

No response

runsc debug logs (if available)


thundergolfer, Jul 01 '25 00:07

Have you tried adding retries or using an environment variable as an override, and has either worked in practice? My guess would be that if this D-Bus request fails, retries and later D-Bus requests will fail as well.

EtiennePerot, Jul 01 '25 00:07

My guess would be that if this dbus request fails...

I think that's likely too.

We haven't tried any intervention yet. We can start with:

Logging the actual err instead of the hardcoded message.

thundergolfer, Jul 01 '25 01:07

Hi! We (@22aronl and I) are students at UT taking a virtualization course, and we'd like to take this on.

We wanted to confirm the actual intention/plan mentioned in the discussion. We would replace the hardcoded "unable to get systemd version" error with one that wraps the actual underlying D-Bus error so users can diagnose intermittent failures. Beyond that, maybe we can add an optional retry (with a small, bounded backoff) since host load might correlate with this lookup occasionally failing.
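A rough sketch of the bounded retry described above, under stated assumptions: `managerPropertyGetter`, `getterFunc`, `getVersionWithRetry`, and `errTransient` are hypothetical names, and the attempt count and base delay are placeholders rather than tuned values. Whether a retry actually helps is still an open question in this thread.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// managerPropertyGetter is a hypothetical interface mirroring the
// GetManagerProperty call quoted in the issue description.
type managerPropertyGetter interface {
	GetManagerProperty(prop string) (string, error)
}

// getterFunc adapts a plain function to managerPropertyGetter (illustration only).
type getterFunc func(prop string) (string, error)

func (f getterFunc) GetManagerProperty(prop string) (string, error) { return f(prop) }

// errTransient stands in for an intermittent D-Bus failure.
var errTransient = errors.New("dbus: connection timed out")

// getVersionWithRetry asks for the "Version" property up to `attempts` times,
// doubling the delay between tries. The last underlying error is wrapped so
// it shows up in the final message.
func getVersionWithRetry(conn managerPropertyGetter, attempts int, baseDelay time.Duration) (string, error) {
	if attempts < 1 {
		attempts = 1 // always try at least once
	}
	var lastErr error
	delay := baseDelay
	for i := 0; i < attempts; i++ {
		v, err := conn.GetManagerProperty("Version")
		if err == nil {
			return v, nil
		}
		lastErr = err
		if i < attempts-1 {
			time.Sleep(delay) // no sleep after the final attempt
			delay *= 2
		}
	}
	return "", fmt.Errorf("unable to get systemd version after %d attempts: %w", attempts, lastErr)
}
```

A caller might use something like `getVersionWithRetry(conn, 3, 10*time.Millisecond)`, which bounds the total added latency to roughly 30 ms in the worst case (10 ms + 20 ms of sleep).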

Let us know if this direction aligns with what you expect. If so, I’ll proceed with a patch.

Thanks!

hexatedjuice, Dec 12 '25 00:12