coreos-assembler
fatal: Missing /dev/kvm on Debian testing with rootless podman
Bug Report
Environment
What operating system is being used to run coreos-assembler? Debian testing
What operating system is being assembled? Fedora CoreOS
Is coreos-assembler running in Podman or Docker? Podman 3.3.1
If Podman, is coreos-assembler running privileged or unprivileged? rootless; privileged
Expected Behavior
cosa should work with rootless podman when /dev/kvm is present on the host, since it is advertised to do so.
Actual Behavior
cosa reports that /dev/kvm is missing because it doesn't get mounted into the container. This appears to be due to a permissions issue when podman mounts a device into a container:
--device=host-device[:container-device][:permissions]
Add a host device to the container. Optional permissions parameter can be used to specify device permissions, it is combination of r for read, w for write, and m for mknod(2).
Example: --device=/dev/sdc:/dev/xvdc:rwm.
Note: if host_device is a symbolic link then it will be resolved first. The container will only store the major and minor numbers of the host device.
Note: if the user only has access rights via a group, accessing the device from inside a rootless container will fail. Use the --group-add keep-groups flag to pass the user’s supplementary group access into the container.
Podman may load kernel modules required for using the specified device. The devices that Podman will load modules when necessary are: /dev/fuse.
However, adding --group-add keep-groups does not resolve this issue, and cosa continues to complain that /dev/kvm is missing inside the container (because it is).
Reproduction Steps
- Obtain a system with Debian testing and podman installed. Ensure virtualization is enabled and /dev/kvm is present on the booted Debian testing system.
- Add the recommended bash alias, modified with --group-add keep-groups per the podman docs:
cosa() {
   env | grep COREOS_ASSEMBLER
   local -r COREOS_ASSEMBLER_CONTAINER_LATEST="quay.io/coreos-assembler/coreos-assembler:latest"
   if [[ -z ${COREOS_ASSEMBLER_CONTAINER} ]] && podman image exists "${COREOS_ASSEMBLER_CONTAINER_LATEST}"; then
       local -r cosa_build_date_str="$(podman inspect -f "{{.Created}}" "${COREOS_ASSEMBLER_CONTAINER_LATEST}" | awk '{print $1}')"
       local -r cosa_build_date="$(date -d "${cosa_build_date_str}" +%s)"
       if [[ $(date +%s) -ge $((cosa_build_date + 60*60*24*7)) ]] ; then
           echo -e "\e[0;33m----" >&2
           echo "The COSA container image is more than a week old and likely outdated." >&2
           echo "You should pull the latest version with:" >&2
           echo "podman pull ${COREOS_ASSEMBLER_CONTAINER_LATEST}" >&2
           echo -e "----\e[0m" >&2
           sleep 10
       fi
   fi
   set -x
   podman run --rm -ti --security-opt label=disable --privileged \
              --uidmap=1000:0:1 --uidmap=0:1:1000 --uidmap 1001:1001:64536 \
              -v "${PWD}":/srv/ --device /dev/kvm --device /dev/fuse --group-add keep-groups \
              --tmpfs /tmp -v /var/tmp:/var/tmp --name cosa \
              ${COREOS_ASSEMBLER_CONFIG_GIT:+-v $COREOS_ASSEMBLER_CONFIG_GIT:/srv/src/config/:ro} \
              ${COREOS_ASSEMBLER_GIT:+-v $COREOS_ASSEMBLER_GIT/src/:/usr/lib/coreos-assembler/:ro} \
              ${COREOS_ASSEMBLER_CONTAINER_RUNTIME_ARGS} \
              ${COREOS_ASSEMBLER_CONTAINER:-$COREOS_ASSEMBLER_CONTAINER_LATEST} "$@"
   rc=$?; set +x; return $rc
}
- Run cosa build
- Observe the following output:
+ podman run --rm -ti --security-opt label=disable --privileged --uidmap=1000:0:1 --uidmap=0:1:1000 --uidmap 1001:1001:64536 -v /home/jon/fedora-coreos-config:/srv/ --device /dev/kvm --device /dev/fuse --group-add keep-groups --tmpfs /tmp -v /var/tmp:/var/tmp --name cosa quay.io/coreos-assembler/coreos-assembler:latest build
fatal: Missing /dev/kvm
+ rc=1
+ set +x
cc: @storrgie @jkl92
Is /dev/kvm o+rw on Debian?
$ ls -alh /dev/kvm
crw-rw-rw-. 1 root kvm 10, 232 Oct 18 19:52 /dev/kvm
It looks like others don't have permission to /dev/kvm on Debian:
jon@arc:~$ ls -lh /dev/kvm
crw-rw---- 1 jon kvm 10, 232 Oct 16 23:16 /dev/kvm
I suspect this is the cause, as Fedora gives others rw- permission to /dev/kvm and cosa seems to work without issue on Fedora. Most curious, though: why would this occur given that I changed ownership of the device to the user (jon) that I was executing cosa as? Default ownership is obviously root, but I thought chown might be my savior.
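For comparing hosts like this, a small check of the "other" permission bits can be handier than eyeballing ls output. This is a minimal sketch, not part of cosa: the function name other_has_rw is hypothetical, and it assumes GNU coreutils stat (for -c '%a').

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if PATH grants read AND write to "other"
# (o+rw), i.e. the crw-rw-rw- case seen on Fedora, not crw-rw---- on Debian.
# Assumes GNU coreutils `stat -c`.
other_has_rw() {
  local path=$1 mode
  # %a prints the permission bits in octal, e.g. 660 or 666.
  mode=$(stat -c '%a' -- "$path") || return 2
  # The last octal digit holds the "other" bits: 4 = read, 2 = write.
  local other=$(( 8#$mode % 8 ))
  (( (other & 6) == 6 ))
}

if other_has_rw /dev/kvm; then
  echo "/dev/kvm is o+rw"
else
  echo "/dev/kvm is NOT o+rw (or does not exist)"
fi
```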
Confirming that the following allows cosa to run on Debian:
jon@arc:~$ ls -lh /dev/kvm
crw-rw-rw- 1 root kvm 10, 232 Oct 19 18:21 /dev/kvm
Why are group permissions and outright device ownership insufficient for podman to be able to map in the kvm device? Is this an upstream podman issue? I would like some thoughts on this, since it is puzzling to me that neither ownership nor group permissions worked.
When running a rootless container with podman, the root user inside the container is mapped to the current user outside the container, and non-root users inside are mapped according to /etc/subuid and /etc/subgid. Here QEMU runs as non-root inside a rootless container and will thus not be using the user's UID/GID.
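The default rootless mapping described above can be written out as a tiny function. This is a sketch under stated assumptions: no --uidmap flags are passed, and /etc/subuid contains an entry like jon:100000:65536 (the same range that shows up under idMappings in the podman info output further down). The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of podman's default rootless UID mapping (no --uidmap given),
# assuming an /etc/subuid entry such as "jon:100000:65536":
#   container UID 0         -> the invoking user's UID
#   container UID 1..65536  -> sub_start .. sub_start + 65535
default_rootless_host_uid() {
  local cuid=$1 user_uid=$2 sub_start=$3 sub_size=$4
  if (( cuid == 0 )); then
    echo "$user_uid"
  elif (( cuid >= 1 && cuid <= sub_size )); then
    echo $(( sub_start + cuid - 1 ))
  else
    return 1   # not mapped at all
  fi
}

# A non-root process (e.g. UID 1000) inside the container:
default_rootless_host_uid 1000 1000 100000 65536   # prints 100999
```

Under these assumptions a non-root QEMU lands on a subordinate UID such as 100999, so neither the device owner bits (jon) nor jon's group membership apply to it.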
I don't remember the details about why Debian changed the mode on /dev/kvm. Do you think we can close this one?
When running rootless container with podman, the root user inside the container will be mapped to the current user outside the container and non-root users inside will be mapped according to /etc/subuid and /etc/subgid. Here QEMU will run as non-root inside a rootless container and will thus not be using the user UID/GID.
Is that statement accurate when using --uidmap=1000:0:1 --uidmap=0:1:1000 --uidmap 1001:1001:64536 in the podman invocation? Please see my initial post for context. My understanding is that this maps UID 1000 in the container to UID 1000 on the host, which is the UID of the user I ran cosa as (rootless). What user/UID is QEMU supposed to run as inside cosa? Is this documented anywhere?
Once I was able to get this working, I noticed some odd behavior when invoking cosa (via bash alias with podman as I documented in my initial post). Specifically, I was prompted to enter a sudo password for the 'builder' user in the container by whatever was attempting to run inside cosa. I do not understand this, and it certainly was not mentioned in any of the documentation I had read before attempting to use cosa.
In the end I added a --user=root flag to the podman command in the bash alias. This ended the sudo prompting for the builder user's password when running cosa, but it currently results in on-disk files being owned by 100000:100000 due to the UID/GID mapping that was applied. Obviously I can remove this mapping, but again, I basically took information from the available documentation and implemented things exactly as described, assuming that was the best path to get cosa functioning on any platform.
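The two observations above (builder maps back to jon, container root ends up as 100000) both fall out of composing two mappings. The sketch below assumes that, in rootless mode, the middle field of each --uidmap triple refers to the rootless user namespace rather than the real host, and that this namespace maps 0 to the user's UID and 1..65536 to an /etc/subuid range of 100000:65536. The function and variable names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: resolve a container UID to a host UID under rootless podman with
#   --uidmap=1000:0:1 --uidmap=0:1:1000 --uidmap 1001:1001:64536
# Assumption: the middle field refers to the rootless user namespace, which
# maps 0 -> the user's UID and 1..65536 -> the /etc/subuid range.
UIDMAPS=("1000:0:1" "0:1:1000" "1001:1001:64536")
USER_UID=1000 SUB_START=100000 SUB_SIZE=65536

host_uid_for() {
  local cuid=$1 m c i n mid
  # Step 1: container UID -> intermediate UID via the --uidmap triples.
  for m in "${UIDMAPS[@]}"; do
    IFS=: read -r c i n <<<"$m"
    if (( cuid >= c && cuid < c + n )); then
      mid=$(( i + cuid - c ))
      break
    fi
  done
  [[ -n $mid ]] || return 1
  # Step 2: intermediate UID -> host UID via the default rootless mapping.
  if (( mid == 0 )); then
    echo "$USER_UID"
  elif (( mid <= SUB_SIZE )); then
    echo $(( SUB_START + mid - 1 ))
  else
    return 1
  fi
}

host_uid_for 1000   # builder -> 1000, the invoking user
host_uid_for 0      # container root (--user=root) -> 100000
```

If the assumption holds, this matches both the expectation that builder (UID 1000) is jon on the host and the 100000:100000 ownership seen with --user=root.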
Is that statement accurate when doing --uidmap=1000:0:1 --uidmap=0:1:1000 --uidmap 1001:1001:64536 in the podman invocation? Please see my initial post for context. My understanding is that maps UID 1000 in the container to UID 1000 on the host which is the UID of the user I ran cosa as rootless-ly.
Oh, true, I had forgotten about that. I will have to take another look then.
@travier any progress on this? I have a workaround, but it is still puzzling to me why it didn't work when everything seemed to agree as far as permissions/users went. Any pointers for further root-causing?
Update: the following function works on Silverblue 34.2021.1027.1 without any issue: no prompt for a sudo password when coreos-assembler runs, and no need to tweak permissions on /dev/kvm. I understand the issue with needing other +rw permissions to some extent, but I am really perplexed by the prompt for a sudo password on Debian.
There's a slight difference in the tested podman versions (3.3.1 on Debian vs. 3.4.0 on Silverblue), but I can't believe that would cause this issue.
Definition:
cosa () {
   env | grep COREOS_ASSEMBLER
   local -r COREOS_ASSEMBLER_CONTAINER_LATEST="quay.io/coreos-assembler/coreos-assembler:latest"
   if [[ -z ${COREOS_ASSEMBLER_CONTAINER} ]] && podman image exists "${COREOS_ASSEMBLER_CONTAINER_LATEST}"; then
       local -r cosa_build_date_str="$(podman inspect -f "{{.Created}}" "${COREOS_ASSEMBLER_CONTAINER_LATEST}" | awk '{print $1}')"
       local -r cosa_build_date="$(date -d "${cosa_build_date_str}" +%s)"
       if [[ $(date +%s) -ge $((cosa_build_date + 60*60*24*30)) ]] ; then
           echo -e "\e[0;33m----" >&2
           echo "The COSA container image is more than 30 days old and likely outdated." >&2
           echo "You should pull the latest version with:" >&2
           echo "podman pull ${COREOS_ASSEMBLER_CONTAINER_LATEST}" >&2
           echo -e "----\e[0m" >&2
           sleep 10
       fi
   fi
   set -x
   podman run --rm -ti --security-opt label=disable --privileged \
              --uidmap=1000:0:1 --uidmap=0:1:1000 --uidmap 1001:1001:64536 \
              -v "${PWD}":/srv/ --device /dev/kvm --device /dev/fuse \
              --tmpfs /tmp -v /var/tmp:/var/tmp --name cosa \
              ${COREOS_ASSEMBLER_CONFIG_GIT:+-v $COREOS_ASSEMBLER_CONFIG_GIT:/srv/src/config/:ro} \
              ${COREOS_ASSEMBLER_GIT:+-v $COREOS_ASSEMBLER_GIT/src/:/usr/lib/coreos-assembler/:ro} \
              ${COREOS_ASSEMBLER_CONTAINER_RUNTIME_ARGS} \
              ${COREOS_ASSEMBLER_CONTAINER:-$COREOS_ASSEMBLER_CONTAINER_LATEST} "$@"
   rc=$?; set +x; return $rc
}
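The two cosa() functions in this thread differ in their staleness threshold: the first uses 60*60*24*7 (7 days), the second 60*60*24*30 (30 days). One way to keep the check and the warning text in sync is to pass the threshold in. This is a hypothetical helper, not something cosa ships, and it assumes GNU date for date -d.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if an image created on CREATED (any string
# GNU `date -d` understands, e.g. 2021-10-01) is at least MAX_DAYS old.
# An optional third argument overrides "now" (seconds since the epoch).
image_is_stale() {
  local created=$1 max_days=$2 now=${3:-$(date +%s)}
  local created_epoch
  created_epoch=$(date -d "$created" +%s) || return 2
  (( now >= created_epoch + 60*60*24*max_days ))
}

# Usage with podman (awk keeps only the date part of .Created):
#   created=$(podman inspect -f '{{.Created}}' "$image" | awk '{print $1}')
#   image_is_stale "$created" 7 && echo "image is more than 7 days old" >&2
```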
Output from podman info on Silverblue:
host:
  arch: amd64
  buildahVersion: 1.23.1
  cgroupControllers: []
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: conmon-2.0.30-2.fc34.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.0.30, commit: '
  cpus: 32
  distribution:
    distribution: fedora
    variant: silverblue
    version: "34"
  eventLogger: journald
  hostname: beast
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
  kernel: 5.14.13-200.fc34.x86_64
  linkmode: dynamic
  logDriver: k8s-file
  memFree: 25133887488
  memTotal: 67361275904
  ociRuntime:
    name: crun
    package: crun-1.2-1.fc34.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 1.2
      commit: 4f6c8e0583c679bfee6a899c05ac6b916022561b
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +YAJL
  os: linux
  remoteSocket:
    path: /run/user/1000/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: true
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: true
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.1.12-2.fc34.x86_64
    version: |-
      slirp4netns version 1.1.12
      commit: 7a104a101aa3278a2152351a082a6df71f57c9a3
      libslirp: 4.4.0
      SLIRP_CONFIG_VERSION_MAX: 3
      libseccomp: 2.5.0
  swapFree: 8589930496
  swapTotal: 8589930496
  uptime: 4h 26m 48.76s (Approximately 0.17 days)
plugins:
  log:
  - k8s-file
  - none
  - journald
  network:
  - bridge
  - macvlan
  volume:
  - local
registries:
  search:
  - registry.fedoraproject.org
  - registry.access.redhat.com
  - docker.io
  - quay.io
store:
  configFile: /var/home/jon/.config/containers/storage.conf
  containerStore:
    number: 1
    paused: 0
    running: 1
    stopped: 0
  graphDriverName: overlay
  graphOptions: {}
  graphRoot: /var/home/jon/.local/share/containers/storage
  graphStatus:
    Backing Filesystem: btrfs
    Native Overlay Diff: "true"
    Supports d_type: "true"
    Using metacopy: "false"
  imageStore:
    number: 4
  runRoot: /run/user/1000/containers
  volumePath: /var/home/jon/.local/share/containers/storage/volumes
version:
  APIVersion: 3.4.0
  Built: 1633030821
  BuiltTime: Thu Sep 30 15:40:21 2021
  GitCommit: ""
  GoVersion: go1.16.8
  OsArch: linux/amd64
  Version: 3.4.0
Hello everyone, sorry to revive this old issue, but I found a solution that works rootless:
Verify that your user can use KVM acceleration:
$ kvm-ok
INFO: /dev/kvm exists
KVM acceleration can be used
Verify that your user is in the kvm group:
id -nG $USER | grep kvm
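A plain grep kvm also matches any group whose name merely contains "kvm". The sketch below matches the name exactly; user_in_group is a hypothetical helper. Note that id -nG USER reads the group database, whereas the groups podman can pass through with --group-add keep-groups are those of the running process, so after adding yourself to kvm a fresh login session may be needed before the shell actually has the group.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if USER is a member of GROUP, matching the
# group name exactly rather than as a substring.
user_in_group() {
  local user=$1 group=$2
  id -nG -- "$user" | tr ' ' '\n' | grep -qxF -- "$group"
}

me=${USER:-$(id -un)}
if user_in_group "$me" kvm; then
  echo "$me is in the kvm group"
else
  echo "$me is NOT in the kvm group"
fi
```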
Then modify the command that runs the cosa container by adding a volume bound to the /dev/kvm device on your host:
podman run --rm -ti --security-opt label=disable --privileged \
--uidmap=1000:0:1 --uidmap=0:1:1000 --uidmap 1001:1001:64536 \
-v ${PWD}:/srv/ --device /dev/kvm --device /dev/fuse \
--tmpfs /tmp -v /var/tmp:/var/tmp --name cosa \
-v /dev/kvm:/dev/kvm \
${COREOS_ASSEMBLER_CONFIG_GIT:+-v $COREOS_ASSEMBLER_CONFIG_GIT:/srv/src/config/:ro} \
${COREOS_ASSEMBLER_GIT:+-v $COREOS_ASSEMBLER_GIT/src/:/usr/lib/coreos-assembler/:ro} \
${COREOS_ASSEMBLER_CONTAINER_RUNTIME_ARGS} \
${COREOS_ASSEMBLER_CONTAINER:-$COREOS_ASSEMBLER_CONTAINER_LATEST} shell
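Since the command above drops you into a shell, a quick sanity check before running cosa fetch && cosa build can tell apart "the device node was never added" from "it is there but not accessible". This is a hypothetical helper, not part of cosa; it only uses bash file tests.

```shell
#!/usr/bin/env bash
# Hypothetical check to run inside the container: is DEV present, is it a
# character device, and can the current user open it for read and write?
check_device() {
  local dev=${1:-/dev/kvm}
  if [[ ! -e $dev ]]; then
    echo "missing: $dev"; return 1
  elif [[ ! -c $dev ]]; then
    echo "not a character device: $dev"; return 1
  elif [[ ! -r $dev || ! -w $dev ]]; then
    echo "no read/write access: $dev"; return 1
  fi
  echo "ok: $dev"
}

check_device /dev/kvm || echo "fix the podman invocation before building" >&2
```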
This way you might be able to run cosa fetch && cosa build inside the container.
Can someone confirm whether this works and whether it is the proper way to do it? If it is validated, I think we should fix the documentation.
From podman-run(1):
Note: if the user only has access rights via a group, accessing the device from inside a rootless
container will fail. Use the --group-add keep-groups flag to pass the user's supplementary group
access into the container.
Can you give --group-add keep-groups a try?
Yes, I tried the option --group-add keep-groups, but it didn't change anything in my case; I encountered the same issue.
I checked inside the container and none of the host user's groups were kept:
[coreos-assembler]$ id
uid=1000(builder) gid=1000(builder) groups=1000(builder),65534(nobody)
I think it is related to each user's podman usage or environment, and not to coreos-assembler directly, as mentioned in this podman issue: https://github.com/containers/podman/issues/10166
I checked my OCI runtime with podman info, and I have crun as expected:
$ podman info | yq eval '.host.ociRuntime' -
name: crun
package: 'crun: /usr/bin/crun'
path: /usr/bin/crun
version: |-
crun version UNKNOWN
commit: ea1fe3938eefa14eb707f1d22adff4db670645d6
spec: 1.0.0
+SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +YA
so --group-add keep-groups should work, which leaves me a little confused right now.
Podman on Ubuntu/Debian has no maintainer, so the only version available is an older podman 3.4.2. It turns out there's a bug in podman 3.4.2 in how --device works: it won't check group memberships on devices when used in rootless containers. Instead, it silently fails to add a device to the container if the device is not owned by the user directly and the device doesn't have o+rw permissions (which is the case for /dev/kvm on Debian/Ubuntu).
The solution, when using the affected versions of podman, is to use -v /dev/kvm:/dev/kvm as well as --group-add keep-groups. Alternatively, you can find some other way to install podman on your system that provides an updated version where this bug is fixed (e.g. nixpkgs from Nix).
I personally just created my cosa wrapper alias to use -v instead of --device for mounting all the devices, which avoids the problem entirely.
Thanks for debugging this. I'd recommend using a newer podman release via the packages listed in https://podman.io/getting-started/installation.
Will close this issue as there is not much to fix on our side here.