
Failed to start ContainerManager" err="failed to get rootfs info: failed to get mount point for device..."

Open rnnr opened this issue 10 months ago • 11 comments

I'm getting the error described in the known issues page (https://kind.sigs.k8s.io/docs/user/known-issues/), but creating and using the cluster config file did not change anything:

Jan 05 23:26:32 kind-control-plane kubelet[1763]: E0105 23:26:32.106420 1763 kubelet.go:1649] "Failed to start ContainerManager" err="failed to get rootfs info: failed to get mount point for device \"/dev/nvme0n1p2\": no partition info for device \"/dev/nvme0n1p2\""
Jan 05 23:26:32 kind-control-plane systemd[1]: kubelet.service: Main process exited, code=exited, status=1/FAILURE

My cluster YAML looks like this; the partition has file system F2FS (I start the cluster with kind create cluster --config ~/.kind/cluster.yaml):

kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
  extraMounts:
    - hostPath: /dev/nvme0n1p2
      containerPath: /dev/nvme0n1p2
      propagation: HostToContainer

kind version: kind v0.26.0 go1.23.4 linux/amd64

docker version:

Client:
 Version:           26.1.0
 API version:       1.45
 Go version:        go1.23.1
 Git commit:        9714adc6c797755f63053726c56bc1c17c0c9204
 Built:             Sun Dec  8 21:43:42 2024
 OS/Arch:           linux/amd64
 Context:           default

Server:
 Engine:
  Version:          26.1.0
  API version:      1.45 (minimum version 1.24)
  Go version:       go1.23.3
  Git commit:       061aa95809be396a6b5542618d8a34b02a21ff77
  Built:            Thu Dec 12 15:02:12 2024
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          v1.7.15
  GitCommit:        926c9586fe4a6236699318391cd44976a98e31f1
 runc:
  Version:          1.1.12
  GitCommit:        51d5e94601ceffbbd85688df1c928ecccbfa4685
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad007797e0dcd8b7126f27bb87401d224240

Is there something else I should check or another workaround?

rnnr avatar Jan 05 '25 23:01 rnnr

I think we need to know a little more about your environment. Can you include the output from docker info?

You can also run kind create cluster --config ~/.kind/cluster.yaml --retain to keep the node container around after failure to inspect it for config issues or look for log messages by exec'ing in and running commands. You can also do kind export logs to collect up the various logs of interest from the node.
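
For reference, that debug flow looks roughly like this (a sketch; the node container name kind-control-plane assumes the default cluster name):

kind create cluster --config ~/.kind/cluster.yaml --retain
# inspect the retained node container after the failure
docker exec -it kind-control-plane journalctl -u kubelet --no-pager | tail -n 50
# collect the logs of interest into a local directory
kind export logs ./kind-logs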

stmcginnis avatar Jan 06 '25 13:01 stmcginnis

the partition has file system F2FS

not familiar with this one, but using a more common partition type, e.g. ext4, will probably fix it.

BenTheElder avatar Jan 06 '25 17:01 BenTheElder

docker info:

Client:
 Version:    26.1.0
 Context:    default
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version:  0.14.0
    Path:     /usr/libexec/docker/cli-plugins/docker-buildx
  compose: Docker Compose (Docker Inc.)
    Version:  v2.28.1
    Path:     /usr/libexec/docker/cli-plugins/docker-compose

Server:
 Containers: 6
  Running: 3
  Paused: 0
  Stopped: 3
 Images: 31
 Server Version: 26.1.0
 Storage Driver: overlay2
  Backing Filesystem: f2fs
  Supports d_type: true
  Using metacopy: false
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: systemd
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 926c9586fe4a6236699318391cd44976a98e31f1
 runc version: 51d5e94601ceffbbd85688df1c928ecccbfa4685
 init version: de40ad007797e0dcd8b7126f27bb87401d224240
 Security Options:
  seccomp
   Profile: builtin
  cgroupns
 Kernel Version: 6.6.62-gentoo-dist
 Operating System: Gentoo Linux
 OSType: linux
 Architecture: x86_64
 CPUs: 24
 Total Memory: 188.5GiB
 Name: shodan
 ID: DLAE:EMXQ:UF4S:N7LR:JXGR:V5YJ:RLBU:FDIJ:C6FZ:C3X5:F7NM:HU5M
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Username: rnnr
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

rnnr avatar Jan 06 '25 22:01 rnnr

the partition has file system F2FS

not familiar with this one, but using a more common partition type, e.g. ext4, will probably fix it.

My rootfs is on this and kind wants to know about rootfs, or no? How should I change the disk it uses - the cluster config file seems to be ignored.

rnnr avatar Jan 06 '25 22:01 rnnr

You can also run kind create cluster --config ~/.kind/cluster.yaml --retain to keep the node container around after failure to inspect it for config issues or look for log messages by exec'ing in and running commands. You can also do kind export logs to collect up the various logs of interest from the node.

I did this (or maybe without the --retain, but it does not matter): the message I posted initially appears repeatedly in kubelet.log, and kind create cluster gets stuck on it for a while.

Attaching the whole file - I will provide any of the other logs too; I just do not want to flood this thread with useless data, so please guide me. kubelet.log

rnnr avatar Jan 06 '25 22:01 rnnr

My rootfs is on this and kind wants to know about rootfs, or no?

kubelet is looking for stats, but from its POV the "rootfs" will be whatever the storage for the "node" container is on.

The logs from kubelet don't make sense in this context because kubelet expects to be running directly on a "real" host (machine, VM), not in a container (which is not technically supported upstream).

So the rootfs in this case would be whatever filesystem docker's data root is on with your volumes and containers.

This code is not in kind, and the filesystem stats need to work inside the container.
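
A quick way to see which filesystem that is in practice (a sketch; DockerRootDir is the data root reported by docker info):

# print docker's data root, typically /var/lib/docker
docker info -f '{{ .DockerRootDir }}'
# show the mount point and filesystem type backing that directory
findmnt -T "$(docker info -f '{{ .DockerRootDir }}')"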

BenTheElder avatar Jan 06 '25 22:01 BenTheElder

How should I change the disk it uses - the cluster config file seems to be ignored.

https://docs.docker.com/engine/daemon/#daemon-data-directory

BenTheElder avatar Jan 06 '25 22:01 BenTheElder

In theory we'd like kind to work with all of these, but in practice the container ecosystem is best tested with ext4 and possibly a few others, definitely not all filesystems (and most of the relevant code is not in kind).

In the future, the stats may come from kubelet and CRI (containerd here) instead of cadvisor.

See also: https://github.com/kubernetes-sigs/kind/pull/1464/files (not sure if this sort of thing is relevant for f2fs)

BenTheElder avatar Jan 06 '25 22:01 BenTheElder

Thanks for the pointers, I'll hopefully look at it more closely soon. I appreciate the info; some more pressing things just came up.

rnnr avatar Jan 07 '25 17:01 rnnr

See also: https://github.com/kubernetes-sigs/kind/pull/1464/files (not sure if this sort of thing is relevant for f2fs)

I've checked the code. Not sure how the function mountDevMapper is supposed to be used, but the command docker info -f "{{.Driver}}" that I see it calls returns "overlay2" on my machine, so the function would return false.

rnnr avatar Jan 15 '25 19:01 rnnr

Yes, we make no attempt to support F2FS specifically (and I'm not sure what is necessary for it), but you could try manually configuring the equivalent /dev/mapper mount on the off chance we have the same problem here.

https://kind.sigs.k8s.io/docs/user/configuration/#extra-mounts
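
For example, a sketch of that manual configuration, mirroring the /dev/mapper mount from the PR linked above via extraMounts (untested on F2FS; the file name is illustrative):

cat <<'EOF' > ~/.kind/cluster-devmapper.yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
  extraMounts:
  # mirror what mountDevMapper does automatically for some storage drivers
  - hostPath: /dev/mapper
    containerPath: /dev/mapper
EOF
kind create cluster --config ~/.kind/cluster-devmapper.yaml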

BenTheElder avatar Jan 15 '25 19:01 BenTheElder

TBH, it's unclear why kind even cares about the backing fs. But here is a small workaround for those who face this issue:

  1. Create an ext4 file system in a file: sudo truncate --size=10G /home/docker.img && sudo mkfs.ext4 /home/docker.img
  2. Mount it; fstab entry: /home/docker.img /home/docker ext4 rw,noatime,nodiratime 0 0
  3. Update docker settings to use /home/docker as its data-root:
{
    "data-root": "/home/docker"
}
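
Spelled out end to end (a sketch assuming a systemd-managed docker daemon; the restart is implied by the steps above):

sudo truncate --size=10G /home/docker.img
sudo mkfs.ext4 /home/docker.img
sudo mkdir -p /home/docker
sudo mount /home/docker.img /home/docker   # or reboot to pick up the fstab entry
# merge "data-root" into any existing /etc/docker/daemon.json instead of overwriting
echo '{ "data-root": "/home/docker" }' | sudo tee /etc/docker/daemon.json
sudo systemctl restart docker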

zhulik avatar Oct 05 '25 22:10 zhulik

I ran into this issue when using kind in a Kata guest, with /var/lib/kubelet hostPath-mounted from the host into the guest. The cadvisor code here will fail, because even when the host block devices are mounted into the guest, they'll have a different major:minor number.

An easy way to get kubelet to start is to turn off localStorageCapacityIsolation in the kubelet config.

  • see that the code is only called when localStorageCapacityIsolation is used: https://github.com/kubernetes/kubernetes/blob/854e67bb51e177b4b9c012928d8271704e9cb80d/pkg/kubelet/cm/container_manager_linux.go#L645
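
For reference, a sketch of what that looks like as a kind config, assuming a kind/Kubernetes version that supports KubeletConfiguration patches and the localStorageCapacityIsolation field (the file name is illustrative):

cat <<'EOF' > ~/.kind/cluster-no-lsci.yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
kubeadmConfigPatches:
- |
  kind: KubeletConfiguration
  # skip the rootfs stats lookup that fails here
  localStorageCapacityIsolation: false
EOF
kind create cluster --config ~/.kind/cluster-no-lsci.yaml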

danielfoehrKn avatar Nov 13 '25 19:11 danielfoehrKn

TBH, it's unclear why kind even cares about the backing fs. But here is a small workaround for those who face this issue:

It doesn't directly, but the container runtime (containerd inside the nodes) and kubelet absolutely do; they need to track filesystem stats and run overlay.

Where possible, kind detects this and employs workarounds (such as using fuse-overlayfs) to enable containerd and kubelet.

In general, containers are sensitive to the backing filesystem. I recommend using common default filesystems from the ecosystem (e.g. ext4), because the kind project cannot be responsible for containerd, runc, podman, and so on having good support and performance for arbitrary filesystems.

BenTheElder avatar Nov 13 '25 19:11 BenTheElder

An easy way to get kubelet to start is to turn off localStorageCapacityIsolation in the kubelet config.

This is, however, GA functionality in Kubernetes. IIRC it is part of conformance. YMMV.

BenTheElder avatar Nov 13 '25 19:11 BenTheElder

Yep, thanks for pointing that out. I use it for temp CI-type clusters so not an issue for me personally.

danielfoehrKn avatar Nov 13 '25 19:11 danielfoehrKn