Running with rootless podman doesn't work as documented

Open cwrau opened this issue 3 years ago • 10 comments

What happened: I tried running kind with rootless podman and followed the documentation, but it didn't work.

What you expected to happen: That it would work

How to reproduce it (as minimally and precisely as possible): Install and configure podman for rootless, install kind. Use a terminal with systemd-scopes, like gnome-terminal. Use an OS that doesn't Delegate everything, like Arch Linux. (Seems to be done on Fedora, https://gitlab.gnome.org/GNOME/gnome-terminal/-/issues/7914#note_1523590)

Anything else we need to know?: @benzea stated in this ticket that tools that depend on cgroups, as kind does, should either wrap themselves in a unit or switch to a different scope themselves. (https://gitlab.gnome.org/GNOME/gnome-terminal/-/issues/7914#note_1523646)

Environment:

  • kind version: (use kind version): 0.14.0
  • Kubernetes version: (use kubectl version): 1.24.3
  • Docker version: (use docker info): command not found 😉
  • Podman version: (use podman version): 4.1.1
  • OS (e.g. from /etc/os-release): Arch Linux

cwrau avatar Aug 09 '22 08:08 cwrau

Docker version: (use docker info): command not found 😉

This is not helpful … podman attempts to be docker compatible and contains docker sub commands like podman info. This command is rich with host environment debug info.

I wonder if the kind authors just never got the message that units are now put into app.slice by default, and delegation also needs to be enabled in intermediate slices. And if they are e.g. on Fedora, that is done by default ...

rootless podman support is contributed by @AkihiroSuda, nominally kind is developed for rootful docker.

Akihiro contributed some CI for rootless docker and podman, which does happen to be Fedora-based.

This is the first time we've encountered a rootless podman user who wasn't on Fedora, as you can tell. Most of our users use docker, which has mature support. Podman support is experimental (the tool should be printing a warning when you use podman) and is fundamentally limited by some of the significant differences where it is not drop-in compatible; we have entirely separate code paths for podman behavior.

No idea what exactly your problem is. But if something wants to use cgroups, it really should run in its own systemd unit and enable delegation by setting the appropriate options there. As you know, that is possible to do by writing the .service file or using systemd-run --user.

The kind process doesn't touch cgroups. Our tool invokes docker or podman to spawn a container; inside the container we do touch cgroups.

BenTheElder avatar Aug 09 '22 10:08 BenTheElder

Host cgroups configuration is not something we currently plan to touch from the kind process. For example, the podman we're invoking may actually be talking to a remote instance, which is difficult to detect reliably (see discussions linked to #2233).

BenTheElder avatar Aug 09 '22 10:08 BenTheElder

Docker version: (use docker info): command not found 😉

This is not helpful … podman attempts to be docker compatible and contains docker sub commands like podman info. This command is rich with host environment debug info.

I thought you only needed the version, podman info:

host:
  arch: amd64
  buildahVersion: 1.26.1
  cgroupControllers:
  - io
  - memory
  - pids
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: /usr/bin/conmon is owned by conmon 1:2.1.3-1
    path: /usr/bin/conmon
    version: 'conmon version 2.1.3, commit: ab52a597278b20173440140cd810dc9fa8785c93'
  cpuUtilization:
    idlePercent: 68.52
    systemPercent: 11.05
    userPercent: 20.44
  cpus: 16
  distribution:
    distribution: arch
    version: unknown
  eventLogger: journald
  hostname: steve
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 10000
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 10000
      size: 65536
  kernel: 5.18.16-zen1-1-zen
  linkmode: dynamic
  logDriver: journald
  memFree: 3378966528
  memTotal: 33353601024
  networkBackend: netavark
  ociRuntime:
    name: crun
    package: /usr/bin/crun is owned by crun 1.5-1
    path: /usr/bin/crun
    version: |-
      crun version 1.5
      commit: 54ebb8ca8bf7e6ddae2eb919f5b82d1d96863dea
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +YAJL
  os: linux
  remoteSocket:
    exists: true
    path: /run/user/1000/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: true
    seccompEnabled: true
    seccompProfilePath: /etc/containers/seccomp.json
    selinuxEnabled: false
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: /usr/bin/slirp4netns is owned by slirp4netns 1.2.0-1
    version: |-
      slirp4netns version 1.2.0
      commit: 656041d45cfca7a4176f6b7eed9e4fe6c11e8383
      libslirp: 4.7.0
      SLIRP_CONFIG_VERSION_MAX: 4
      libseccomp: 2.5.4
  swapFree: 0
  swapTotal: 0
  uptime: 23h 32m 1.98s (Approximately 0.96 days)
plugins:
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  volume:
  - local
registries:
  search:
  - docker.io
  - hub.4allportal.net
store:
  configFile: /home/cwr/.config/containers/storage.conf
  containerStore:
    number: 0
    paused: 0
    running: 0
    stopped: 0
  graphDriverName: btrfs
  graphOptions: {}
  graphRoot: /home/cwr/.local/share/containers/storage
  graphRootAllocated: 1023691194368
  graphRootUsed: 481339727872
  graphStatus:
    Build Version: Btrfs v5.18.1
    Library Version: "102"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 0
  runRoot: /run/user/1000/containers
  volumePath: /home/cwr/.local/share/containers/storage/volumes
version:
  APIVersion: 4.1.1
  Built: 1659559968
  BuiltTime: Wed Aug  3 22:52:48 2022
  GitCommit: f73d8f8875c2be7cd2049094c29aff90b1150241-dirty
  GoVersion: go1.19
  Os: linux
  OsArch: linux/amd64
  Version: 4.1.1
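
As a rough sketch, the cgroupControllers list in the output above can be compared against a wanted set of controllers. The function name `missing_controllers` and the wanted set (cpu, cpuset, io, memory, pids) are assumptions for illustration; nothing here is taken from kind's own check.

```shell
# Print the controllers from an assumed wanted set that are absent
# from the space-separated list passed as the first argument.
missing_controllers() {
  have=" $1 "
  missing=""
  # The wanted set below is an assumption, not kind's actual requirement.
  for c in cpu cpuset io memory pids; do
    case "$have" in
      *" $c "*) ;;                 # controller is present
      *) missing="$missing $c" ;;  # controller is absent
    esac
  done
  printf '%s\n' "${missing# }"
}

# Values copied from the cgroupControllers section of the podman info output:
missing_controllers "io memory pids"
```

On a live cgroup v2 system with systemd, the list for the current user can typically be read from /sys/fs/cgroup/user.slice/user-$(id -u).slice/user@$(id -u).service/cgroup.controllers and passed in the same way.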

I wonder if the kind authors just never got the message that units are now put into app.slice by default, and delegation also needs to be enabled in intermediate slices. And if they are e.g. on Fedora, that is done by default ...

rootless podman support is contributed by @AkihiroSuda, nominally kind is developed for rootful docker.

Akihiro contributed some CI for rootless docker and podman, which does happen to be Fedora-based.

This is the first time we've encountered a rootless podman user who wasn't on Fedora, as you can tell. Most of our users use docker, which has mature support. Podman support is experimental (the tool should be printing a warning when you use podman) and is fundamentally limited by some of the significant differences where it is not drop-in compatible; we have entirely separate code paths for podman behavior.

Huh, interesting, I thought podman would be more widespread 😅 I've been using it for years now and only on Arch Linux

No idea what exactly your problem is. But if something wants to use cgroups, it really should run in its own systemd unit and enable delegation by setting the appropriate options there. As you know, that is possible to do by writing the .service file or using systemd-run --user.

The kind process doesn't touch cgroups. Our tool invokes docker or podman to spawn a container; inside the container we do touch cgroups.

Mh, then I guess I'll have to continue to run kind in its own scope.

cwrau avatar Aug 09 '22 10:08 cwrau

This is the first time we've encountered a rootless podman user who wasn't on Fedora, as you can tell. Most of our users use docker, which has mature support. Podman support is experimental (the tool should be printing a warning when you use podman) and is fundamentally limited by some of the significant differences where it is not drop-in compatible; we have entirely separate code paths for podman behavior.

I am also using kind on Arch/Garuda, and we have encountered each other before, @BenTheElder. In fact, I remember raising an issue about this and giving up: #2684, which makes this issue a duplicate of mine. @maciekmm is also on Arch and uses rootless, and he opened an issue just a few days ago.

Podman is FOSS, and in the spirit of kind also being FOSS, it is my humble opinion that podman should have a little more attention. I completely understand that it requires a lot of time, and have endless respect for the work the kind team has done; this is my code of ethics, not yours (no harm if you don't want to do it). kind has been very helpful to me, again, thank you.

Please make kind distro and podman/docker agnostic. There aren't that many: Arch, Fedora, RedHat, and Debian (off the top of my head).

This is not helpful … podman attempts to be docker compatible and contains docker sub commands like podman info. This command is rich with host environment debug info.

In fairness, I think he was trying to be funny.

caniko avatar Aug 10 '22 09:08 caniko

  • Does it work with KDE or XFCE?
  • Does it work with Rootless Docker?
  • Can it be worked around with systemd-run? If so, could you open a PR to update the docs?

AkihiroSuda avatar Aug 10 '22 14:08 AkihiroSuda

In fairness, I think he was trying to be funny.

Actually, I interpreted Docker version: literally and thought docker info was just a suggestion on how to get the version.

Even if I had docker I would have still just inserted the version there 😅

Does it work with KDE or XFCE?

What do you mean by that? I would say it's not coupled to anything DE related.

Does it work with Rootless Docker?

I will try this tomorrow 👍

Can it be worked around with systemd-run? If so, could you open a PR to update the docs?

Yeah, you can wrap the kind call in systemd-run;

systemd-run --user --scope --property=Delegate=yes kind create cluster

If that's the recommended way to run kind with rootless podman, then yes, I can open a PR
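
For repeated use, the call above could be wrapped in a small shell function. This is only a sketch: the name `kind_delegated` and the `DRY_RUN` switch are made up for illustration, and the systemd-run flags are exactly the ones from the command above.

```shell
# Hypothetical wrapper: run kind inside a transient user scope with
# cgroup delegation enabled, as in the command quoted above.
kind_delegated() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    # DRY_RUN=1 (an illustrative switch) prints the command instead of running it.
    printf '%s\n' "systemd-run --user --scope --property=Delegate=yes kind $*"
  else
    systemd-run --user --scope --property=Delegate=yes kind "$@"
  fi
}
```

With that in place, `kind_delegated create cluster` behaves like the one-liner, and `DRY_RUN=1 kind_delegated create cluster` only shows what would be run.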

cwrau avatar Aug 10 '22 14:08 cwrau

Actually, I interpreted Docker version: literally and thought docker info was just a suggestion on how to get the version.

We should clarify the template, this command provides necessary debug info.

Podman is FOSS, and in the spirit of kind also being FOSS, it is my humble opinion that podman should have a little more attention. I completely understand that it requires a lot of time, and have endless respect for the work the kind team has done; this is my code of ethics, not yours (no harm if you don't want to do it). kind has been very helpful to me, again, thank you.

Please make kind distro and podman/docker agnostic. There aren't that many: Arch, Fedora, RedHat, and Debian (off the top of my head).

This is not about ethics and podman receives a lot of attention.

There are not simply "Arch, Fedora, RedHat, and Debian", there are infinite linux distros and configurations, new ones are created every day. We cannot support all of them equally (and RHEL requires a license ...). We have limited time and resources to run them locally and in CI. So far that means podman and rootless are tested primarily in CI on Fedora, we already have a large CI matrix for this small project.

KIND is already supporting podman to a more than reasonable extent, at a relatively outsized cost. Unlike docker, podman does not provide a stable, mature interface. Both are FOSS.

It is mostly docker compatible except when it isn't, which is fine; we've already developed separate implementations to support podman and set up CI ... However, podman also makes breaking changes against its own behavior.

Off the top of my head: https://github.com/kubernetes-sigs/kind/pull/2257, https://github.com/kubernetes-sigs/kind/issues/2085#issuecomment-784804927, ...

Docker has made exactly one small breaking change for the duration of this project (#2046) even though it has been supported for far longer.

We support podman anyhow, even though the primary purpose of this project is to develop Kubernetes (see: https://kind.sigs.k8s.io/docs/contributing/project-scope/). Developing Kubernetes requires docker, not podman, because it leverages buildx for multi-arch, and all those users therefore must have docker installed.

There are additional limitations to using podman (mainly around restart support) because podman simply does not handle these things due to difference in approach.

Please remember that @aojea and I are already lending our free time or cutting into work time to support this and we could instead be improving / fixing Kubernetes (which relates to our actual current day jobs) or shipping a new KIND release at impact to far more users.

I have recently gone way out of my way to prevent rootless from being broken in particular (https://github.com/kubernetes-sigs/kind/pull/2846, https://github.com/kubernetes/enhancements/issues/361#issuecomment-1172435157, and the less visible work from myself and others meeting to find and ship a last-minute fix to Kubernetes) because I do care about our users, but my time is bounded.

I will review PRs to fix this, but it's simply not a priority for me to debug rootless podman x Arch. Kubernetes only works fully on rootful, and docker is a perfectly acceptable free and open source alternative to podman. I don't personally use Arch and I cannot run it on my employer-provided machines.

BenTheElder avatar Aug 10 '22 16:08 BenTheElder

Another consideration here: projects like Kubernetes, podman, docker, runc, containerd, etc. also only run CI or develop for a limited set of environments. We carry a higher cost to keep these things working together because the things we're integrating with are not developed or tested in these ways. As in https://github.com/kubernetes/enhancements/issues/361#issuecomment-1172435157, we have to turn around and proactively convince them to support these things and fix them to enable support in KIND.

I'll go a step further and say I'm willing to write docs changes or code patches to fix these environments myself if we receive sufficient information about how to fix them, but we're not going to stretch our CI matrix even further or locally develop on additional environments. It's already a lot.

BenTheElder avatar Aug 10 '22 16:08 BenTheElder

Uh, but if podman is the one needing Delegation, then shouldn't podman have the corresponding configuration and documentation on how to get it up and running (which can then be directly linked by kind).

benzea avatar Aug 10 '22 21:08 benzea

Uh, but if podman is the one needing Delegation, then shouldn't podman have the corresponding configuration and documentation on how to get it up and running (which can then be directly linked by kind).

The documentation is already there: https://kind.sigs.k8s.io/docs/user/rootless/ https://github.com/containers/podman/blob/main/docs/tutorials/rootless_tutorial.md

It's comprehensive, and I'd go as far as saying that these two combined form a complete guide on how to run kind on rootless podman/docker.

On top of that kind already links to that document if it detects missing Delegate.

maciekmm avatar Aug 10 '22 21:08 maciekmm

https://github.com/kubernetes-sigs/kind/pull/2981 adds a hint for systemd-run --scope --user kind create cluster, thanks @VannTen

BenTheElder avatar Nov 08 '22 06:11 BenTheElder

https://github.com/kubernetes-sigs/kind/pull/3032 will clarify the bug template re: docker info / podman info.

BenTheElder avatar Dec 19 '22 07:12 BenTheElder

#2981 adds a hint for systemd-run --scope --user kind create cluster, thanks @VannTen

Took me a while to find this issue. I followed the hint and did all the cgroups v2 checks, along with adding the /etc/systemd/system/user@.service.d/delegate.conf file... none of it worked, sadly.
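
For reference, a minimal sketch of writing the delegate.conf drop-in that the kind rootless docs describe. The function name `write_delegate_dropin` and its optional directory argument are assumptions added here so the sketch can be tried against a scratch directory first; the file content is the `Delegate=yes` setting named in kind's error message.

```shell
# Hypothetical helper: write a delegate.conf drop-in enabling delegation.
# The first argument is the target directory; it defaults to the systemd
# drop-in directory for user@.service and exists mainly for dry runs.
write_delegate_dropin() {
  dir="${1:-/etc/systemd/system/user@.service.d}"
  mkdir -p "$dir" &&
    printf '[Service]\nDelegate=yes\n' > "$dir/delegate.conf"
}
```

Writing to the default location needs root, and systemd only picks the file up after `sudo systemctl daemon-reload`.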

On my system (Ubuntu 22.04 LTS + Podman 4.3.1) that hint doesn't work. I initially thought @cwrau's method did not error out, though I couldn't find any created clusters (see update):

$ systemd-run --scope --user kind create cluster
Running scope as unit: run-r39bac831e38c4a4fad72f425230b9030.scope
enabling experimental podman provider
ERROR: failed to create cluster: running kind with rootless provider requires setting systemd property "Delegate=yes", see https://kind.sigs.k8s.io/docs/user/rootless/

$ systemd-run --user --property=Delegate=yes kind create cluster
Running as unit: run-r7137d1a0db4f46a5a1c6d6fbcf7225eb.service

$ kind get clusters
enabling experimental podman provider
No kind clusters found.

$ systemd-run --scope --user kind get clusters
Running scope as unit: run-r923df78bf2bc4b2bae955892be078d3c.scope
enabling experimental podman provider
No kind clusters found.

Is there a corresponding issue in podman's issues that links to this?


Update: Upon further inspection, @cwrau's invocation fails exactly the same way, except the error gets dumped in journalctl -f --user instead of stderr...

Jan 12 23:35:13 razer-neon systemd[841]: Started /usr/local/bin/kind create cluster.
Jan 12 23:35:13 razer-neon kind[9015]: enabling experimental podman provider
Jan 12 23:35:13 razer-neon kind[9015]: ERROR: failed to create cluster: running kind with rootless provider requires setting systemd property "Delegate=yes", see https://kind.sigs.k8s.io/docs/user/rootless/
Jan 12 23:35:13 razer-neon systemd[841]: run-r85f05a467a5b4292b8af42fbeda81917.service: Main process exited, code=exited, status=1/FAILURE
Jan 12 23:35:13 razer-neon systemd[841]: run-r85f05a467a5b4292b8af42fbeda81917.service: Failed with result 'exit-code'.

deftdawg avatar Jan 13 '23 00:01 deftdawg

Update: Upon further inspection, @cwrau's invocation fails exactly the same way, except the error gets dumped in journalctl -f --user instead of stderr...

You can run it with systemd-run --user >--scope< --property=Delegate=yes kind create cluster to run it synchronously and with direct output, I adjusted my above comment.

I just wanted to try this again to check if it's working on my end, but I was getting different errors;

λ sru --property=Delegate=yes --scope kind create cluster
Running scope as unit: run-r8d8a8ba153014351b0e1c5199d4f5edc.scope
enabling experimental podman provider
Creating cluster "kind" ...
ERROR: failed to create cluster: failed to ensure podman network: command "podman network create -d=bridge --ipv6 --subnet fc00:f853:ccd:e793::/64 kind" failed with error: exit status 125
Command Output: Error: could not find free subnet from subnet pools

I fixed that by adding {"base" = "11.0.0.0/24", "size" = 24} as an additional subnet_pool in my containers.conf;

[network]
default_subnet_pools = [
  {"base" = "11.0.0.0/24", "size" = 24},
  {"base" = "10.89.0.0/16", "size" = 24},
  {"base" = "10.90.0.0/15", "size" = 24},
  {"base" = "10.92.0.0/14", "size" = 24},
  {"base" = "10.96.0.0/11", "size" = 24},
  {"base" = "10.128.0.0/9", "size" = 24},
]

Then I was getting the following error;

λ sru --property=Delegate=yes --scope kind create cluster
Running scope as unit: run-rdce61f37b2fe4c978281deff3e2c6697.scope
enabling experimental podman provider
Creating cluster "kind" ...
 ✓ Ensuring node image (kindest/node:v1.25.3) 🖼
 ✗ Preparing nodes 📦  
ERROR: failed to create cluster: command "podman run --name kind-control-plane --hostname kind-control-plane --label io.x-k8s.kind.role=control-plane --privileged --tmpfs /tmp --tmpfs /run --volume bf13142d953a4c24f351bff1f96bbbd0e82381cc93edb6f49b475e8abc5da707:/var:suid,exec,dev --volume /lib/modules:/lib/modules:ro -e KIND_EXPERIMENTAL_CONTAINERD_SNAPSHOTTER --detach --tty --net kind --label io.x-k8s.kind.cluster=kind -e container=podman --volume /dev/mapper:/dev/mapper --device /dev/fuse --publish=127.0.0.1:35889:6443/tcp -e KUBECONFIG=/etc/kubernetes/admin.conf docker.io/kindest/node@sha256:f52781bc0d7a19fb6c405c2af83abfeb311f130707a0e219175677e366cc45d1" failed with error: exit status 126
Command Output: time="2023-01-13T12:03:15+01:00" level=warning msg="aardvark-dns binary not found, container dns will not be enabled"
Error: netavark: code: 3, msg: modprobe: ERROR: could not insert 'ip6_tables': Operation not permitted
ip6tables v1.8.8 (legacy): can't initialize ip6tables table `nat': Table does not exist (do you need to insmod?)
Perhaps ip6tables or your kernel needs to be upgraded.

Which I fixed by running sudo modprobe ip6_tables
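
A small sketch for checking this up front, assuming the module shows up in /proc/modules when loaded (a module built into the kernel would not be listed there). The function name `module_loaded` and its optional second argument are made up for illustration.

```shell
# Return success if the named kernel module appears in the modules list.
# The second argument (default /proc/modules) exists only so the check
# can be tried against a sample file.
module_loaded() {
  awk -v m="$1" '$1 == m { found = 1 } END { exit !found }' "${2:-/proc/modules}"
}

module_loaded ip6_tables || echo "ip6_tables is not loaded; try: sudo modprobe ip6_tables"
```

modprobe only lasts until the next reboot. On systemd-based systems, a file under /etc/modules-load.d/ containing the module name is the usual way to load it at boot.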

After that it's working 😁

cwrau avatar Jan 13 '23 11:01 cwrau