blockdev: use 'blkid' for reading device's UUID

Open nikita-dubrovskii opened this issue 2 years ago • 6 comments

First boot of RHCOS on IBM zKVM intermittently fails during "File System Check". This happens because the systemd unit uses the old filesystem UUID from the pristine qcow2 image, not the regenerated one:

coreos-boot-edit: + lsblk -o NAME,LABEL,UUID --paths --pairs /dev/disk/by-label/boot
coreos-boot-edit: NAME="/dev/mapper/crypt_bootfs" LABEL="boot" UUID="96d15588-3596-4b3c-adca-a2ff7279ea63"
coreos-boot-edit: + blkid /dev/disk/by-label/boot
coreos-boot-edit: /dev/disk/by-label/boot: LABEL="boot" UUID="eee55c4f-c2df-47e9-a284-992e9e122a97" BLOCK_SIZE="1024" TYPE="ext4"
coreos-boot-edit: + rdcore bind-boot /sysroot /mnt/boot_partition
.....
coreos-boot-mount-generator: ++ cat /run/coreos/bootfs_uuid
coreos-boot-mount-generator: + bootdev=/dev/disk/by-uuid/96d15588-3596-4b3c-adca-a2ff7279ea63
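
The mismatch in the log above can be checked mechanically. What follows is a minimal sketch in Python, assuming only that both tools print `KEY="VALUE"` pairs as shown in the log. The helper names `parse_pairs` and `uuids_disagree` are hypothetical and not part of coreos-installer or rdcore.

```python
import re

# Matches KEY="VALUE" pairs as printed by `lsblk --pairs` and by `blkid`.
PAIR_RE = re.compile(r'([A-Z_]+)="([^"]*)"')


def parse_pairs(line):
    """Return the KEY="VALUE" pairs found in one output line as a dict."""
    return dict(PAIR_RE.findall(line))


def uuids_disagree(lsblk_line, blkid_line):
    """Return True if both lines carry a UUID and the two values differ."""
    a = parse_pairs(lsblk_line).get("UUID")
    b = parse_pairs(blkid_line).get("UUID")
    return a is not None and b is not None and a != b


# The two lines below are copied from the log in this report.
lsblk_line = (
    'NAME="/dev/mapper/crypt_bootfs" LABEL="boot" '
    'UUID="96d15588-3596-4b3c-adca-a2ff7279ea63"'
)
blkid_line = (
    '/dev/disk/by-label/boot: LABEL="boot" '
    'UUID="eee55c4f-c2df-47e9-a284-992e9e122a97" BLOCK_SIZE="1024" TYPE="ext4"'
)
print(uuids_disagree(lsblk_line, blkid_line))
```

A check like this only detects the disagreement. It does not say which of the two values is the stale one.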

nikita-dubrovskii avatar Jul 12 '22 12:07 nikita-dubrovskii

Hmm, so the bootfs UUID reported by lsblk is stale? Do we know why?

jlebon avatar Jul 12 '22 19:07 jlebon

> Hmm, so the bootfs UUID reported by lsblk is stale? Do we know why?

I guess the issue lies somewhere between the old RHEL kernel and udev on zKVM. FCOS works just fine. Maybe I'm wrong.

nikita-dubrovskii avatar Jul 13 '22 06:07 nikita-dubrovskii

Hmm. I think this may be because lsblk uses the kernel's cached view of things by reading from /sys, whereas blkid opens the block device directly.

(Comparing e.g. strace -f lsblk /dev/vda vs strace -f blkid /dev/vda in a cosa run shell)

This to me signals that the real problem is likely that we need to synchronously wait for a partprobe.
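
One way to picture the "wait, then read from the device" idea is the rough Python sketch below. It rests on assumptions: `udevadm settle` is used as a stand-in for the synchronous wait (whether that is sufficient on zKVM is not established in this thread), and the function name `read_uuid_after_settle` and its `run` parameter are hypothetical. It is not how rdcore does it.

```python
import subprocess


def read_uuid_after_settle(device, run=subprocess.run, timeout=30):
    """Wait for pending udev events, then probe the UUID from the device itself."""
    # `udevadm settle` blocks until the udev event queue is empty
    # (or the timeout, in seconds, expires).
    run(["udevadm", "settle", f"--timeout={timeout}"], check=True)
    # `blkid -p` does low-level superblock probing and bypasses the blkid cache;
    # `-s UUID -o value` prints only the UUID value.
    result = run(
        ["blkid", "-p", "-s", "UUID", "-o", "value", device],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()
```

On a real system this would be called as `read_uuid_after_settle("/dev/disk/by-label/boot")` with sufficient privileges. The `run` parameter exists only so the command runner can be swapped out for testing.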

cgwalters avatar Jul 13 '22 14:07 cgwalters

Do we still need this now that we're using an Ignition config for the reprovisioning in https://github.com/coreos/fedora-coreos-config/pull/1819?

jlebon avatar Jul 14 '22 13:07 jlebon

> Do we still need this now that we're using an Ignition config for the reprovisioning in coreos/fedora-coreos-config#1819?

I'd prefer to have this. I wasn't able to test Ignition+LUKS on RHCOS because it has again switched to an old kernel (or hasn't picked up a fixed one): https://bugzilla.redhat.com/show_bug.cgi?id=2075085. Switching to /dev/vda instead of coreos-boot-disks doesn't help much, so I'm still debugging why /dev/disk/by-*/ are partially empty after Ignition.

nikita-dubrovskii avatar Jul 14 '22 14:07 nikita-dubrovskii

Right, my concern with this is that this feels like it's working around what could possibly be a deeper issue. We're fixing it for rdcore but other code (present and future) may still be using the wrong information. If Secure Execution is triggering this, let's try to find out why that is and fix it.

jlebon avatar Aug 10 '22 20:08 jlebon