fatal: Missing /dev/kvm on Debian testing with rootless podman

Open jmpolom opened this issue 3 years ago • 11 comments

Bug Report

Environment

What operating system is being used to run coreos-assembler? Debian testing

What operating system is being assembled? Fedora CoreOS

Is coreos-assembler running in Podman or Docker? Podman 3.3.1

If Podman, is coreos-assembler running privileged or unprivileged? rootless; privileged

Expected Behavior

cosa should work with rootless podman when /dev/kvm is present on the host, since it is advertised to do so.

Actual Behavior

cosa reports that /dev/kvm is missing because it doesn't get mounted into the container. This appears to be due to a permissions issue when podman mounts a device into a container:

--device=host-device[:container-device][:permissions]

Add a host device to the container. Optional permissions parameter can be used to specify device permissions, it is combination of r for read, w for write, and m for mknod(2).

Example: --device=/dev/sdc:/dev/xvdc:rwm.

Note: if host_device is a symbolic link then it will be resolved first. The container will only store the major and minor numbers of the host device.

Note: if the user only has access rights via a group, accessing the device from inside a rootless container will fail. Use the --group-add keep-groups flag to pass the user’s supplementary group access into the container.

Podman may load kernel modules required for using the specified device. The devices that Podman will load modules when necessary are: /dev/fuse.

However, adding --group-add keep-groups does not resolve this issue and cosa continues to complain that /dev/kvm is missing inside the container (because it is).

Reproduction Steps

  1. Obtain a system with Debian testing and podman installed. Ensure virtualization is enabled and /dev/kvm is present on the booted Debian testing system.
  2. Add the recommended bash alias, modified with --group-add keep-groups per the podman docs:
cosa() {
   env | grep COREOS_ASSEMBLER
   local -r COREOS_ASSEMBLER_CONTAINER_LATEST="quay.io/coreos-assembler/coreos-assembler:latest"
   if [[ -z ${COREOS_ASSEMBLER_CONTAINER} ]] && $(podman image exists ${COREOS_ASSEMBLER_CONTAINER_LATEST}); then
       local -r cosa_build_date_str="$(podman inspect -f "{{.Created}}" ${COREOS_ASSEMBLER_CONTAINER_LATEST} | awk '{print $1}')"
       local -r cosa_build_date="$(date -d ${cosa_build_date_str} +%s)"
       if [[ $(date +%s) -ge $((cosa_build_date + 60*60*24*7)) ]] ; then
         echo -e "\e[0;33m----" >&2
         echo "The COSA container image is more than a week old and likely outdated." >&2
         echo "You should pull the latest version with:" >&2
         echo "podman pull ${COREOS_ASSEMBLER_CONTAINER_LATEST}" >&2
         echo -e "----\e[0m" >&2
         sleep 10
       fi
   fi
   set -x
   podman run --rm -ti --security-opt label=disable --privileged                                    \
              --uidmap=1000:0:1 --uidmap=0:1:1000 --uidmap 1001:1001:64536                          \
              -v ${PWD}:/srv/ --device /dev/kvm --device /dev/fuse --group-add keep-groups          \
              --tmpfs /tmp -v /var/tmp:/var/tmp --name cosa                                         \
              ${COREOS_ASSEMBLER_CONFIG_GIT:+-v $COREOS_ASSEMBLER_CONFIG_GIT:/srv/src/config/:ro}   \
              ${COREOS_ASSEMBLER_GIT:+-v $COREOS_ASSEMBLER_GIT/src/:/usr/lib/coreos-assembler/:ro}  \
              ${COREOS_ASSEMBLER_CONTAINER_RUNTIME_ARGS}                                            \
              ${COREOS_ASSEMBLER_CONTAINER:-$COREOS_ASSEMBLER_CONTAINER_LATEST} "$@"
   rc=$?; set +x; return $rc
}
  3. Run cosa build
  4. Observe the following output:
+ podman run --rm -ti --security-opt label=disable --privileged --uidmap=1000:0:1 --uidmap=0:1:1000 --uidmap 1001:1001:64536 -v /home/jon/fedora-coreos-config:/srv/ --device /dev/kvm --device /dev/fuse --group-add keep-groups --tmpfs /tmp -v /var/tmp:/var/tmp --name cosa quay.io/coreos-assembler/coreos-assembler:latest build
fatal: Missing /dev/kvm
+ rc=1
+ set +x

cc: @storrgie @jkl92

jmpolom avatar Oct 19 '21 00:10 jmpolom

Is /dev/kvm o+rw on Debian?

$ ls -alh /dev/kvm
crw-rw-rw-. 1 root kvm 10, 232 Oct 18 19:52 /dev/kvm

travier avatar Oct 19 '21 12:10 travier

Looks like others don't have permission to /dev/kvm on Debian:

jon@arc:~$ ls -lh /dev/kvm
crw-rw---- 1 jon kvm 10, 232 Oct 16 23:16 /dev/kvm

I suspect this is the cause, as Fedora gives others rw- permission to /dev/kvm and cosa seems to work without issue on Fedora. Most curious, though: why would this occur given that I changed ownership of the device to the user (jon) that I was executing cosa as? Default ownership is obviously root, but I thought chown might be my savior.
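
For anyone comparing hosts, the "others" bits can be checked from the octal mode rather than read by eye. This is only a minimal sketch: others_can_rw is a hypothetical helper name (not part of cosa or podman), and it assumes the mode comes from something like GNU stat -c '%a'.

```shell
# Hypothetical helper, not part of cosa or podman.
# Succeeds when an octal permission mode (e.g. 660 or 666) grants
# both read and write to "others".
others_can_rw() {
    local mode="$1"
    # A leading 0 makes the shell read the value as octal.
    # The lowest octal digit holds the "others" bits: r=4, w=2.
    [ "$(( 0${mode} & 6 ))" -eq 6 ]
}

# Example use on a real host (assumes GNU stat is available):
#   if others_can_rw "$(stat -c '%a' /dev/kvm)"; then
#       echo "/dev/kvm is o+rw"
#   else
#       echo "/dev/kvm is not o+rw"
#   fi
```

With the modes shown in this thread, 666 (Fedora) passes the check and 660 (Debian) does not.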

jmpolom avatar Oct 19 '21 18:10 jmpolom

Confirming that the following allows cosa to run on Debian:

jon@arc:~$ ls -lh /dev/kvm
crw-rw-rw- 1 root kvm 10, 232 Oct 19 18:21 /dev/kvm

Why are group permissions and outright device ownership insufficient for podman to be able to map in the kvm device? Is this an upstream podman issue? Would like some thoughts on this since it is puzzling to me that neither ownership nor group permissions worked.

jmpolom avatar Oct 19 '21 22:10 jmpolom

When running a rootless container with podman, the root user inside the container will be mapped to the current user outside the container, and non-root users inside will be mapped according to /etc/subuid and /etc/subgid. Here QEMU will run as non-root inside a rootless container and will thus not be using the user UID/GID.

I don't remember the details about why Debian changed the mode on /dev/kvm. Do you think we can close this one?

travier avatar Oct 20 '21 18:10 travier

When running a rootless container with podman, the root user inside the container will be mapped to the current user outside the container, and non-root users inside will be mapped according to /etc/subuid and /etc/subgid. Here QEMU will run as non-root inside a rootless container and will thus not be using the user UID/GID.

Is that statement accurate when doing --uidmap=1000:0:1 --uidmap=0:1:1000 --uidmap 1001:1001:64536 in the podman invocation? Please see my initial post for context. My understanding is that maps UID 1000 in the container to UID 1000 on the host which is the UID of the user I ran cosa as rootless-ly. What user/UID is qemu supposed to run as inside cosa? Is this documented anywhere?

Once I was able to get this working, I noticed some odd behavior when invoking cosa (via bash alias with podman as I documented in my initial post). Specifically, I was prompted to enter a sudo password for the 'builder' user in the container by whatever was attempting to run inside cosa. I do not understand this, and it certainly was not mentioned in any of the documentation I had read before attempting to use cosa.

In the end I added a --user=root flag to the podman command in the bash alias. This stopped the sudo password prompts for the builder user when running cosa, but files on disk now end up owned by 100000:100000 because of the UID/GID mapping that was applied. Obviously I can remove this mapping, but again, I took the information from the available documentation and implemented things exactly as described, assuming that was the best path to get cosa functioning on any platform.
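
To make the mapping question concrete, here is a rough sketch of how the three --uidmap options could resolve to host UIDs. It rests on assumptions that are not confirmed anywhere in this thread for the Debian host: that rootless podman treats the middle --uidmap field as an intermediate ID, that intermediate ID 0 is the invoking user (UID 1000), and that intermediate IDs from 1 upward come from a subuid range starting at 100000. The function and variable names are made up for illustration.

```shell
# Assumed values, not read from the real host:
HOST_UID=1000          # UID of the user invoking podman
SUBUID_START=100000    # first subordinate UID from /etc/subuid
# --uidmap entries from the alias, as container_id:intermediate_id:size
UIDMAPS="1000:0:1 0:1:1000 1001:1001:64536"

# Hypothetical helper: print the host UID that a container UID resolves to.
container_to_host_uid() {
    local cuid="$1" map cstart rest istart size iid
    for map in $UIDMAPS; do
        cstart=${map%%:*}      # first field
        rest=${map#*:}
        istart=${rest%%:*}     # second field
        size=${rest#*:}        # third field
        if [ "$cuid" -ge "$cstart" ] && [ "$cuid" -lt $((cstart + size)) ]; then
            iid=$((istart + cuid - cstart))
            if [ "$iid" -eq 0 ]; then
                # Intermediate ID 0 is the invoking user.
                echo "$HOST_UID"
            else
                # Intermediate IDs 1.. map onto the subuid range.
                echo $((SUBUID_START + iid - 1))
            fi
            return 0
        fi
    done
    echo "unmapped"
    return 1
}

container_to_host_uid 1000   # UID 1000 inside the container
container_to_host_uid 0      # root inside the container
```

Under these assumptions, container UID 1000 resolves to host UID 1000 and container root resolves to 100000, which would be consistent with the 100000:100000 ownership seen when running with --user=root.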

jmpolom avatar Oct 20 '21 18:10 jmpolom

Is that statement accurate when doing --uidmap=1000:0:1 --uidmap=0:1:1000 --uidmap 1001:1001:64536 in the podman invocation? Please see my initial post for context. My understanding is that maps UID 1000 in the container to UID 1000 on the host which is the UID of the user I ran cosa as rootless-ly.

Oh, true, I had forgotten about that. I will have to take another look then.

travier avatar Oct 21 '21 17:10 travier

@travier any progress on this? I have a workaround but it is still puzzling to me why it didn't work when everything seemed to agree as far as permissions/users went. Any pointers for additional root causing?

jmpolom avatar Oct 28 '21 23:10 jmpolom

Update: the following function works on Silverblue 34.2021.1027.1 without any issue. There is no prompt for a sudo password when coreos-assembler runs and no need to tweak permissions on /dev/kvm. I understand the need for o+rw permissions to some extent, but I really am perplexed by the prompt for a sudo password on Debian.

There's a slight difference in tested podman versions (3.3.1 on Debian, 3.4.0 on Silverblue) but I can't believe that would cause this issue.

Definition:

cosa () {
    env | grep COREOS_ASSEMBLER
    local -r COREOS_ASSEMBLER_CONTAINER_LATEST="quay.io/coreos-assembler/coreos-assembler:latest"
    if [[ -z ${COREOS_ASSEMBLER_CONTAINER} ]] && $(podman image exists ${COREOS_ASSEMBLER_CONTAINER_LATEST}); then
        local -r cosa_build_date_str="$(podman inspect -f "{{.Created}}" ${COREOS_ASSEMBLER_CONTAINER_LATEST} | awk '{print $1}')"
        local -r cosa_build_date="$(date -d ${cosa_build_date_str} +%s)"
        if [[ $(date +%s) -ge $((cosa_build_date + 60*60*24*30)) ]] ; then
            echo -e "\e[0;33m----" >&2
            echo "The COSA container image is more than 30 days old and likely outdated." >&2
            echo "You should pull the latest version with:" >&2
            echo "podman pull ${COREOS_ASSEMBLER_CONTAINER_LATEST}" >&2
            echo -e "----\e[0m" >&2
            sleep 10
        fi
    fi
    set -x
    podman run --rm -ti --security-opt label=disable --privileged                                    \
               --uidmap=1000:0:1 --uidmap=0:1:1000 --uidmap 1001:1001:64536                          \
               -v ${PWD}:/srv/ --device /dev/kvm --device /dev/fuse                                  \
               --tmpfs /tmp -v /var/tmp:/var/tmp --name cosa                                         \
               ${COREOS_ASSEMBLER_CONFIG_GIT:+-v $COREOS_ASSEMBLER_CONFIG_GIT:/srv/src/config/:ro}   \
               ${COREOS_ASSEMBLER_GIT:+-v $COREOS_ASSEMBLER_GIT/src/:/usr/lib/coreos-assembler/:ro}  \
               ${COREOS_ASSEMBLER_CONTAINER_RUNTIME_ARGS}                                            \
               ${COREOS_ASSEMBLER_CONTAINER:-$COREOS_ASSEMBLER_CONTAINER_LATEST} "$@"
    rc=$?; set +x; return $rc
}

Output from podman info on Silverblue:

host:
  arch: amd64
  buildahVersion: 1.23.1
  cgroupControllers: []
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: conmon-2.0.30-2.fc34.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.0.30, commit: '
  cpus: 32
  distribution:
    distribution: fedora
    variant: silverblue
    version: "34"
  eventLogger: journald
  hostname: beast
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
  kernel: 5.14.13-200.fc34.x86_64
  linkmode: dynamic
  logDriver: k8s-file
  memFree: 25133887488
  memTotal: 67361275904
  ociRuntime:
    name: crun
    package: crun-1.2-1.fc34.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 1.2
      commit: 4f6c8e0583c679bfee6a899c05ac6b916022561b
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +YAJL
  os: linux
  remoteSocket:
    path: /run/user/1000/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: true
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: true
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.1.12-2.fc34.x86_64
    version: |-
      slirp4netns version 1.1.12
      commit: 7a104a101aa3278a2152351a082a6df71f57c9a3
      libslirp: 4.4.0
      SLIRP_CONFIG_VERSION_MAX: 3
      libseccomp: 2.5.0
  swapFree: 8589930496
  swapTotal: 8589930496
  uptime: 4h 26m 48.76s (Approximately 0.17 days)
plugins:
  log:
  - k8s-file
  - none
  - journald
  network:
  - bridge
  - macvlan
  volume:
  - local
registries:
  search:
  - registry.fedoraproject.org
  - registry.access.redhat.com
  - docker.io
  - quay.io
store:
  configFile: /var/home/jon/.config/containers/storage.conf
  containerStore:
    number: 1
    paused: 0
    running: 1
    stopped: 0
  graphDriverName: overlay
  graphOptions: {}
  graphRoot: /var/home/jon/.local/share/containers/storage
  graphStatus:
    Backing Filesystem: btrfs
    Native Overlay Diff: "true"
    Supports d_type: "true"
    Using metacopy: "false"
  imageStore:
    number: 4
  runRoot: /run/user/1000/containers
  volumePath: /var/home/jon/.local/share/containers/storage/volumes
version:
  APIVersion: 3.4.0
  Built: 1633030821
  BuiltTime: Thu Sep 30 15:40:21 2021
  GitCommit: ""
  GoVersion: go1.16.8
  OsArch: linux/amd64
  Version: 3.4.0

jmpolom avatar Nov 03 '21 03:11 jmpolom

Hello everyone, sorry to reply to this old issue, but I think I found a solution that works rootless:

Verify that your user can use KVM acceleration:

$ kvm-ok
INFO: /dev/kvm exists
KVM acceleration can be used

Verify that your user is in the kvm group:

id -nG $USER | grep kvm
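
Note that a plain grep also matches group names that merely contain "kvm". A stricter check could look like the sketch below; has_group is a hypothetical helper name, and it assumes the group list is space-separated as printed by id -nG.

```shell
# Hypothetical helper: succeeds only if the exact group name appears in
# a space-separated group list such as the output of `id -nG "$USER"`.
has_group() {
    local groups="$1" name="$2"
    # Pad both sides with spaces so only whole words match.
    case " $groups " in
        *" $name "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example use on a real host:
#   has_group "$(id -nG "$USER")" kvm && echo "in kvm group"
```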

Then modify the command that runs the cosa container by adding a volume bound to the /dev/kvm device on your host:

   podman run --rm -ti --security-opt label=disable --privileged  \
              --uidmap=1000:0:1 --uidmap=0:1:1000 --uidmap 1001:1001:64536  \
              -v ${PWD}:/srv/ --device /dev/kvm --device /dev/fuse \
              --tmpfs /tmp -v /var/tmp:/var/tmp  --name cosa \
              -v /dev/kvm:/dev/kvm \
              ${COREOS_ASSEMBLER_CONFIG_GIT:+-v $COREOS_ASSEMBLER_CONFIG_GIT:/srv/src/config/:ro} \
              ${COREOS_ASSEMBLER_GIT:+-v $COREOS_ASSEMBLER_GIT/src/:/usr/lib/coreos-assembler/:ro} \
              ${COREOS_ASSEMBLER_CONTAINER_RUNTIME_ARGS} \
              ${COREOS_ASSEMBLER_CONTAINER:-$COREOS_ASSEMBLER_CONTAINER_LATEST} shell

This way you might be able to run cosa fetch && cosa build inside the container.

Can anyone tell me if this works for them and if it is the proper way to do it? If it is validated, I think we should fix the documentation.

IceManGreen avatar Jul 19 '22 09:07 IceManGreen

From podman-run(1):

Note:  if  the user only has access rights via a group, accessing the device from inside a rootless
container will fail. Use the --group-add keep-groups flag to pass the user's supplementary group
access into  the container.

Can you give --group-add keep-groups a try?

travier avatar Jul 19 '22 10:07 travier

Yes, I tried the --group-add keep-groups option but it didn't change anything in my case, so I encountered the same issue. I checked inside the container and no groups from the host user were kept:

[coreos-assembler]$ id
uid=1000(builder) gid=1000(builder) groups=1000(builder),65534(nobody)

I think this is related to each user's podman version or environment rather than to coreos-assembler directly, as mentioned in this podman issue: https://github.com/containers/podman/issues/10166

I checked my OCI runtime with podman info and I have crun as expected:

$ podman info | yq eval '.host.ociRuntime' -

name: crun
package: 'crun: /usr/bin/crun'
path: /usr/bin/crun
version: |-
  crun version UNKNOWN
  commit: ea1fe3938eefa14eb707f1d22adff4db670645d6
  spec: 1.0.0
  +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +YA

so --group-add keep-groups should work, which leaves me a little confused right now.

IceManGreen avatar Jul 19 '22 12:07 IceManGreen

Podman on Ubuntu/Debian has no maintainer, so the only version available is an older podman 3.4.2. It turns out there's a bug in podman 3.4.2 with how --device works; it won't check group memberships on devices when used in rootless containers. Instead it will silently fail to add a device to the container if the device is not owned by the user directly and the device doesn't have o+rw permissions (which is the case for /dev/kvm on Debian/Ubuntu).

The solution, when using the affected versions of podman, is to add -v /dev/kvm:/dev/kvm in addition to --group-add keep-groups. Alternatively, you can install podman some other way that provides an updated version where this bug is fixed (e.g. nixpkgs from Nix).

I personally just created my cosa wrapper alias to use -v instead of --device for mounting all the devices and avoid the problem entirely.
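
Roughly, the idea is to generate a bind mount per device instead of a --device flag. The sketch below is illustrative only: device_mount_args is a made-up helper name, it assumes device paths without spaces, and the podman invocation in the comment uses placeholder image and argument names.

```shell
# Hypothetical helper: turn device paths into "-v host:container"
# bind-mount arguments, one pair per device.
device_mount_args() {
    local args="" dev
    for dev in "$@"; do
        # Append "-v dev:dev", separated by a space after the first entry.
        args="${args:+$args }-v ${dev}:${dev}"
    done
    printf '%s\n' "$args"
}

# Example use in a wrapper (IMAGE and the trailing arguments are placeholders):
#   podman run --rm -ti --privileged --group-add keep-groups \
#       $(device_mount_args /dev/kvm /dev/fuse) "$IMAGE" "$@"
```

The command substitution is left unquoted on purpose so that the shell splits the output into separate arguments.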

mtalexan avatar Jan 24 '23 17:01 mtalexan

Thanks for debugging this. I'd recommend using a newer podman release via the packages listed in https://podman.io/getting-started/installation.

Will close this issue as there is not much to fix on our side here.

travier avatar Jan 27 '23 10:01 travier