amd64 images built in a Debian 12 Docker container end up with no BIOS bootloader because `lsblk` does not report partition types
In `chroot-script`, the function `is_grub_bios_compatible` decides whether a GRUB BIOS bootloader can be installed to the target drive. For GPT disks, it runs `lsblk -pnlo NAME,PARTTYPE` to list all partitions on the system along with their partition types. It then searches that output for a partition on the target device whose partition type GUID is `21686148-6449-6e6f-744e-656564454649` (the BIOS boot partition type). This works fine on physical and virtual machines, but fails badly under Docker (at least with an image based on the official Debian 12 image), because there `lsblk -pnlo NAME,PARTTYPE` produces this:
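The check described above boils down to something like the following. This is a minimal Bash sketch, not the actual `is_grub_bios_compatible` code from `chroot-script`. The function name `has_bios_boot_partition` is hypothetical, and so is the device-matching rule (a simple name-prefix match). The function reads `lsblk -pnlo NAME,PARTTYPE` output on stdin, which makes it easy to see why an empty PARTTYPE column causes it to fail.

```shell
#!/bin/bash
# Hypothetical sketch (not grml-debootstrap code): succeed if DEVICE has a
# GPT partition whose type GUID is the BIOS boot partition type.
# Input on stdin: lines as printed by `lsblk -pnlo NAME,PARTTYPE`.

BIOS_BOOT_GUID='21686148-6449-6e6f-744e-656564454649'

has_bios_boot_partition() {
  local device="$1" name parttype
  while read -r name parttype; do
    # Simple prefix match: /dev/nvme0n1p1 belongs to /dev/nvme0n1.
    # The whole-disk line itself (no suffix) is skipped.
    [[ "${name}" == "${device}"?* ]] || continue
    # Under Docker PARTTYPE is empty, so this comparison never matches.
    if [[ "${parttype,,}" == "${BIOS_BOOT_GUID}" ]]; then
      return 0
    fi
  done
  return 1
}
```

Usage: `lsblk -pnlo NAME,PARTTYPE | has_bios_boot_partition /dev/nvme0n1 && echo "BIOS boot partition found"`. A naive prefix match like this would also treat `/dev/nvme0n10` as part of `/dev/nvme0n1`, so a real implementation would want a stricter rule.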
```
lsblk -pnlo NAME,PARTTYPE
/dev/loop1
/dev/loop2
/dev/loop3
/dev/loop4
/dev/loop5
/dev/loop6
/dev/loop7
/dev/loop8
/dev/loop9
/dev/loop10
/dev/loop11
/dev/loop12
/dev/loop13
/dev/loop14
/dev/zram0
/dev/nvme0n1
/dev/nvme1n1
/dev/nvme0n1p1
/dev/nvme0n1p2
/dev/nvme0n1p3
/dev/nvme0n1p4
/dev/nvme1n1p1
```
All of the partition type UUIDs are missing, so `is_grub_bios_compatible` concludes that the disk does NOT support a BIOS bootloader and installs an EFI-only bootloader.
I'll add reproduction steps later; this is a bit of a hurried brain dump for now.
It seems to me this will need a similar solution to Debian bug #1108311.
Steps to reproduce:

- `docker image pull debian:12`
- `docker run --name debian-grml-test --interactive --tty --rm --privileged debian:12 /bin/bash`
- `lsblk -pnlo NAME,PARTTYPE`

Expected result: Partition types should be shown next to some of the partition devices displayed.
Actual result: Only partition devices are shown; no partition type UUIDs are shown.
- `apt update`
- `apt install dpkg-dev debhelper build-essential git`
- `apt install --no-install-recommends asciidoc docbook-xsl shunit2 xsltproc`
- `git clone https://github.com/grml/grml-debootstrap.git`
- `cd grml-debootstrap`
- `dpkg-buildpackage -i -us -uc -b`
- `cd ..`
- `apt install ./grml-debootstrap_0.121_all.deb`
- `grml-debootstrap --release bookworm --arch amd64 --target ./vm.img --force --vmfile --vmefi --password x`
- On the host, NOT in the container: `sudo docker cp debian-grml-test:/vm.img ./`
- On the host: `sudo chown "$USER:$USER" ./vm.img`
- On the host: `qemu-system-x86_64 -drive file=./vm.img,format=raw,if=virtio -m 2G -smp 2 -enable-kvm`

Expected result: VM boots.
Actual result: VM hangs at "Booting from Hard Disk..."
- On the host: `qemu-system-x86_64 -drive file=./vm.img,format=raw,if=virtio -m 2G -smp 2 -enable-kvm -bios /usr/share/ovmf/OVMF.fd`

Expected and actual result: VM boots.
This is using an Ubuntu 24.04 LTS host for Docker.
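A quicker, heuristic way to see whether a BIOS bootloader was written, without booting QEMU, is to look for GRUB's `boot.img` in the image's first sector. This sketch assumes that GRUB's `boot.img` embeds the ASCII string `GRUB` (it prints "GRUB " while loading) and that an EFI-only image's protective MBR does not contain that string. The function name `mbr_has_grub` is hypothetical.

```shell
#!/bin/bash
# Heuristic sketch: check whether the first 512-byte sector of a raw disk
# image contains GRUB's boot.img, detected via its embedded "GRUB" string.
# Assumption: an EFI-only image has a protective MBR without that string.

mbr_has_grub() {
  local image="$1"
  # Read only the MBR; -a makes grep treat the binary data as text.
  head -c 512 -- "${image}" | LC_ALL=C grep -qa 'GRUB'
}
```

Usage: `mbr_has_grub ./vm.img && echo "BIOS bootloader present" || echo "no BIOS bootloader (EFI-only?)"`.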
Worth noting: `blkid` can read the partition type UUIDs, so I have no idea what `lsblk` is doing wrong that keeps it from reading them under Docker. It might be possible to use `blkid` instead of `lsblk` to work around this, but I've had race condition issues doing that, so I'm not sure whether that would be better in the long run.
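A `blkid`-based fallback could look roughly like this. It is only a sketch. It relies on `blkid -p -o export`, which probes the device directly (bypassing the udev database) and prints `KEY=value` lines, including `PART_ENTRY_TYPE` for GPT partitions. The helper names `part_type_from_export` and `part_type_blkid` are hypothetical, and the sketch does nothing about the race conditions mentioned above.

```shell
#!/bin/bash
# Sketch of a blkid-based way to read a partition's type GUID.

# Extract PART_ENTRY_TYPE from `blkid -o export`-style KEY=value lines.
part_type_from_export() {
  local line
  while IFS= read -r line; do
    if [[ "${line}" == PART_ENTRY_TYPE=* ]]; then
      printf '%s\n' "${line#PART_ENTRY_TYPE=}"
      return 0
    fi
  done
  return 1
}

# Low-level probe of a partition device; typically needs root.
part_type_blkid() {
  blkid -p -o export "$1" | part_type_from_export
}
```

Usage: `part_type_blkid /dev/nvme0n1p1`.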
> It seems to me this will need a similar solution to Debian bug #1108311.
hmm... not sure if that's related or not.
> hmm... not sure if that's related or not.
Yes, because what you are seeing here is caused by reusing /dev from the outside. What we really need is a grml-debootstrap-managed /dev.
My reproduction steps are actually flawed: they don't take into account that Debian Trixie is required for `parted` to be able to change the partition type UUIDs. To truly reproduce this, you'd have to partition a raw image yourself and set the partition type UUIDs with `fdisk` or similar. Still, you get the idea, and the `lsblk` example shows the issue.
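For a reproduction that doesn't depend on Trixie's `parted`, a raw image with a BIOS boot partition can be prepared with `sfdisk` (util-linux), which can operate on image files directly. This is a sketch. The function name `make_bios_gpt_image`, the default image size, and the two-partition layout are arbitrary choices for illustration.

```shell
#!/bin/bash
# Sketch: create a sparse raw image with a GPT label containing a 1 MiB
# BIOS boot partition followed by a Linux filesystem partition.

make_bios_gpt_image() {
  local image="$1" size="${2:-2G}"
  truncate -s "${size}" "${image}" || return 1
  # sfdisk script input; type= takes a GPT partition type GUID.
  sfdisk --quiet "${image}" <<'EOF'
label: gpt
size=1MiB, type=21686148-6449-6E6F-744E-656564454649
type=0FC63DAF-8483-4772-8E79-3D69D8477DE4
EOF
}
```

Usage: `make_bios_gpt_image ./test.img`, then `sfdisk --part-type ./test.img 1` prints the type of partition 1.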
> Yes, because what you are seeing here is caused by reusing /dev from the outside. What we really need is a grml-debootstrap-managed /dev.
OK, I guess I don't really understand how things work in this context then. `mount | grep /dev` reveals that `/dev` in the container is just a tmpfs, so I don't know what Docker is doing there, but it would make sense that it causes issues. Do you have suggestions on how to implement what you're mentioning? I could give it a shot once I understand the idea.
> Do you have suggestions on how to implement what you're mentioning? I could give it a shot once I understand the idea.
So what I expect we need to do for the other bug:
- stop mounting the existing /dev, /dev/pts, /run/udev things into the chroot
- mount a new tmpfs as $chroot/dev
- mknod a few basic things so installing packages works (null, zero, console, ...)
- after packages are installed, `udevadm settle` ...or whatever is the right command to have udev populate the $chroot/dev
- only then run bootloader install etc.
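The first steps of that plan (a fresh tmpfs on `$chroot/dev` plus a handful of nodes) might look like the sketch below. The function name `setup_chroot_dev` and the `DRY_RUN` switch are hypothetical. The major/minor numbers are the standard Linux ones for these nodes. The udev step is left out because how to do it is still the open question.

```shell
#!/bin/bash
# Sketch: give a chroot its own minimal /dev on a fresh tmpfs instead of
# bind-mounting the host's. DRY_RUN=1 prints the commands instead.

run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

setup_chroot_dev() {
  local dev="$1/dev" entry name kind major minor mode
  run mount -t tmpfs -o mode=0755,nosuid tmpfs "${dev}"
  # Each entry: name, kind (c = character device), major, minor, mode.
  for entry in 'null c 1 3 666' 'zero c 1 5 666' 'full c 1 7 666' \
      'random c 1 8 666' 'urandom c 1 9 666' 'tty c 5 0 666' \
      'console c 5 1 600'; do
    read -r name kind major minor mode <<< "${entry}"
    run mknod -m "${mode}" "${dev}/${name}" "${kind}" "${major}" "${minor}"
  done
  run ln -s /proc/self/fd "${dev}/fd"
  # A private devpts instance; /dev/ptmx points into it.
  run mkdir -p "${dev}/pts"
  run mount -t devpts -o newinstance,ptmxmode=0666 devpts "${dev}/pts"
  run ln -s pts/ptmx "${dev}/ptmx"
}
```

Usage: `setup_chroot_dev "$chroot"` as root, or `DRY_RUN=1 setup_chroot_dev "$chroot"` to review the commands first.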
That makes sense. I'll give that a shot.
Running `udevadm settle` in a chroot resulted in the message `Running in chroot, ignoring request.` I don't think going through udev is going to work; it doesn't seem to be designed to create a second device tree.
What may work though is "copying" the /dev directory from the host to the chroot, by determining the major and minor device numbers for each device under /dev and mknod'ing new devices with those same numbers in the chroot. That would only take a snapshot of the host's device state, but it would be better than nothing. If we need it to stay up-to-date during the build, we can probably use something like inotify or fanotify.
Just to make sure it was actually possible, I wrote a proof-of-concept script that can make a "copy" of /dev. It seems to work well enough. The final implementation will most likely have to deal with files that are already present in the target /dev, but that shouldn't be a problem. The script:
```bash
#!/bin/bash
# dev-clone.sh - recursively copies a device directory tree such as /dev
main() {
  local src_dir tgt_dir src_entry_list src_dir_list src_dir_item \
    tgt_dir_item src_entry tgt_entry file_type_char dev_id dev_permissions \
    link_target
  src_dir="${1:-}"
  tgt_dir="${2:-}"
  if [ -z "${src_dir}" ] || [ -z "${tgt_dir}" ]; then
    printf '%s\n' 'ERROR: Source and destination must both be specified.' >&2
    exit 1
  fi

  readarray -t src_dir_list < <(find "${src_dir}" -type d)
  readarray -t src_entry_list < <(find "${src_dir}")

  for src_dir_item in "${src_dir_list[@]}"; do
    # Replace only the leading source prefix (/#), not a later match.
    tgt_dir_item="${src_dir_item/#"${src_dir}"/"${tgt_dir}"}"
    mkdir -p "${tgt_dir_item}" || {
      printf '%s\n' "ERROR: Failed to create directory ${tgt_dir_item}!" >&2
      exit 1
    }
  done

  for src_entry in "${src_entry_list[@]}"; do
    # Skip real directories only. [ -d ] follows symlinks, so without the
    # -L check a link to a directory (e.g. /dev/fd -> /proc/self/fd) would
    # be skipped and never recreated.
    [ -d "${src_entry}" ] && [ ! -L "${src_entry}" ] && continue
    tgt_entry="${src_entry/#"${src_dir}"/"${tgt_dir}"}"
    file_type_char="$(stat --format='%A' "${src_entry}")"
    file_type_char="${file_type_char:0:1}"
    case "${file_type_char}" in
      -) # normal file
        cp -a "${src_entry}" "${tgt_entry}"
        ;;
      l) # symbolic link
        link_target="$(readlink "${src_entry}")"
        ln -s "${link_target}" "${tgt_entry}"
        ;;
      c|b) # character or block device
        dev_id="$(stat --format='%Hr %Lr' "${src_entry}")"
        dev_permissions="$(stat --format='%a' "${src_entry}")"
        # ${dev_id} is intentionally unquoted: it expands to "major minor".
        mknod -m "${dev_permissions}" "${tgt_entry}" "${file_type_char}" \
          ${dev_id}
        # Preserve owner and group (e.g. root:disk for block devices).
        chown --reference="${src_entry}" "${tgt_entry}"
        ;;
    esac
  done
}
main "$@"
```
I'll probably be integrating this into grml-debootstrap, but in case someone else sees this and needs it, consider it licensed as GPLv2 or later since that's the license of grml-debootstrap.
@zeha Do you see any problems using this approach?
Yes, I assume this will not work for VM images, as the by-uuid symlinks etc will not be there.
Good point. I guess an inotify-based mechanism or similar will be needed then... either that or else grml-debootstrap will need to explicitly re-sync the cloned dev filesystem at certain critical points, which might be less race-prone.
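A re-sync at fixed points could start by re-creating any block device node that appeared since the last sync. The sketch below treats the kernel's `/sys/class/block/<name>/dev` files (which hold `major:minor`) as the source of truth. The function name `resync_block_devs`, its optional sysfs-directory argument (there only to make it testable), and `DRY_RUN` are hypothetical. It only adds missing nodes; it does not create the `by-uuid` symlinks mentioned above, which udev normally provides.

```shell
#!/bin/bash
# Sketch: add block device nodes that exist in sysfs but are missing from a
# cloned dev directory. DRY_RUN=1 prints the mknod commands instead.

resync_block_devs() {
  local tgt_dir="$1" sysfs_dir="${2:-/sys/class/block}" entry name majmin
  for entry in "${sysfs_dir}"/*; do
    [ -r "${entry}/dev" ] || continue
    name="${entry##*/}"
    # Skip nodes that already exist in the target tree.
    [ -e "${tgt_dir}/${name}" ] && continue
    read -r majmin < "${entry}/dev"
    if [ "${DRY_RUN:-0}" = 1 ]; then
      printf 'mknod -m 660 %s b %s %s\n' "${tgt_dir}/${name}" \
        "${majmin%%:*}" "${majmin##*:}"
    else
      mknod -m 660 "${tgt_dir}/${name}" b "${majmin%%:*}" "${majmin##*:}"
    fi
  done
}
```

Usage: `resync_block_devs "$chroot/dev"` before each critical step, such as the bootloader install.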
Cloning /dev will not fix the Docker issue. I just tested it:

- Enter a Debian 12 Docker environment: `docker run --name debian-grml-test --interactive --tty --rm --privileged debian:12 /bin/bash`
- Install `mmdebstrap` and `vim`
- Copy the `dev-clone.sh` script above and paste it into a script `/dev-clone.sh` inside the container, using Vim to edit
- `chmod +x /dev-clone.sh`
- Spin up a Debian 12 chroot within the container: `mmdebstrap bookworm test-chroot`
- Delete `test-chroot/dev`
- Re-create it using the clone script: `/dev-clone.sh /dev test-chroot/dev`
- Recreate `/dev/fd` within the chroot because somehow the script missed it: `ln -s /proc/self/fd test-chroot/dev/fd`
- Bind mount proc and sys: `mount --bind /proc test-chroot/proc; mount --bind /sys test-chroot/sys`
- Enter the chroot: `chroot test-chroot`
- Run `lsblk -pnlo NAME,PARTTYPE`

No partition type UUIDs are displayed.
I think Docker itself is at fault here. Other people seem to have had problems using Docker and lsblk together as well: https://forums.docker.com/t/why-does-lsblk-report-partition-as-null-from-within-a-docker-container/139393
Reported as a bug in Docker: https://github.com/moby/moby/issues/50304
I'm somewhat sure lsblk needs the udev database to read the UUIDs from. Should probably check the lsblk source.
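One way to test that theory without reading the source is to look at what the udev database holds for a partition. When udev is running, the database lives under `/run/udev/data/`, with one file per device named `b<major>:<minor>` for block devices and properties stored as `E:KEY=value` lines. The sketch below assumes that `ID_PART_ENTRY_TYPE` is the property `lsblk` uses for PARTTYPE; that is worth confirming against the source. The function name and its optional database-directory argument are hypothetical.

```shell
#!/bin/bash
# Sketch: print the ID_PART_ENTRY_TYPE property udev recorded for a block
# device given as MAJOR:MINOR. An optional second argument overrides the
# database directory (default: /run/udev/data).

udev_part_type() {
  local majmin="$1" db_dir="${2:-/run/udev/data}" line
  local db_file="${db_dir}/b${majmin}"
  [ -r "${db_file}" ] || return 1  # udev has no record of this device
  while IFS= read -r line; do
    if [[ "${line}" == E:ID_PART_ENTRY_TYPE=* ]]; then
      printf '%s\n' "${line#E:ID_PART_ENTRY_TYPE=}"
      return 0
    fi
  done < "${db_file}"
  return 1
}
```

Usage: `udev_part_type "$(stat -L --format='%Hr:%Lr' /dev/nvme0n1p1)"`. Inside the Docker container, you would expect this to fail for every device if `/run/udev` is empty.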
the following works for me (without docker, haven't tried). copying the block devices is necessary probably because i'm doing something else wrong...
```bash
#!/bin/bash
set -ex
R="$PWD/chroot"
rm -rf "$R"
mmdebstrap --mode=unshare --include=udev,systemd-sysv,grub-cloud-arm64,linux-image-arm64 trixie "$R"
for x in /dev/block/*; do
  realdev=$(readlink -f "$x")
  echo "creating $realdev"
  cp -a "$realdev" "$R/dev/"
done
cat >"$R/init" <<EOT
#!/bin/bash
set -ex
mkdir /dev/pts
mount -t devpts devpts /dev/pts
mount -t tmpfs run /run
mount -t sysfs sysfs /sys
mount
ls -la /dev
mknod /dev/console c 5 1
mknod /dev/null c 1 3
mknod /dev/zero c 1 5
mknod /dev/full c 1 7
ln -s /proc/self/fd/ /dev/fd
echo starting udevd
/usr/lib/systemd/systemd-udevd &
sleep 2
echo trigger
udevadm trigger --type=all --action=add --prioritized-subsystem=module,block,tty -w
echo done
ls -la /dev
ls -la /dev/block
mkdir /boot/grub
mkdir /boot/efi
#grub-install
update-grub
grep root= /boot/grub/grub.cfg
set +ex
exec /bin/bash -i
EOT
chmod a+rx "$R/init"
unshare --pid --ipc --mount --mount-proc -R "$R" --fork --kill-child /init
umount --recursive "$R/sys" "$R/proc" "$R/dev" "$R/run"
```
```
...
+ grep root= /boot/grub/grub.cfg
linux /boot/vmlinuz-6.12.33+deb13-arm64 root=UUID=fdba34a7-77cf-4724-8e90-d9069fa6ce9e ro console=tty0 console=ttyAMA0
linux /boot/vmlinuz-6.12.33+deb13-arm64 root=UUID=fdba34a7-77cf-4724-8e90-d9069fa6ce9e ro console=tty0 console=ttyAMA0
linux /boot/vmlinuz-6.12.33+deb13-arm64 root=UUID=fdba34a7-77cf-4724-8e90-d9069fa6ce9e ro single console=tty0 console=ttyAMA0
```
my current copy of this script is https://salsa.debian.org/zeha/chr/-/blob/main/chr.sh?ref_type=heads
> copying the block devices is necessary probably because i'm doing something else wrong...
after checking how this works in a normal install, i think this is the right approach.
It looks like you're right that udev is needed for this (see https://github.com/moby/moby/issues/50304#issuecomment-3029716931). Furthermore, it appears that udev can be used within Docker, and when it is, `lsblk` behaves itself. If so, this may not be a bug in grml-debootstrap at all, but rather a problem with my Docker environment. I'll do some more testing; if that turns out to be the case, we can close this and worry about /dev cloning for the other bug.
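For anyone wanting to try this, starting udev inside a privileged container before running grml-debootstrap might look like the sketch below. It mirrors the `systemd-udevd` / `udevadm trigger` sequence from the chroot script above. However, the function name `start_container_udev` and the `DRY_RUN` switch are hypothetical, and the sketch assumes the `udev` package is installed in the container. Whether this is enough for grml-debootstrap under Docker is exactly what still needs testing.

```shell
#!/bin/bash
# Sketch: start udevd inside a privileged container and let it populate
# its database, so that lsblk can report PARTTYPE.
# DRY_RUN=1 prints the commands instead of running them.

run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

start_container_udev() {
  # Same binary path as in the chroot script above.
  run /usr/lib/systemd/systemd-udevd --daemon
  # Replay "add" events for existing devices, then wait for processing.
  run udevadm trigger --action=add
  run udevadm settle
}
```

Usage: `start_container_udev && lsblk -pnlo NAME,PARTTYPE`.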