dm-verity: How to fix incorrect hash?
Issue Report
Bug
Container Linux Version
Unfortunately, there are no instructions on how to extract the version from the CoreOS rescue system if USR-A/B cannot be mounted. /etc/os-release from the rescue system only shows the dracut details.
Environment
Bare metal server from Hetzner.
Expected Behavior
The system should boot.
Actual Behavior
The system does not boot.
Reproduction Steps
A faulty disk was replaced, and the following commands were used to copy the first partitions to the new disk.
# Copy boot partitions
dd if=/dev/nvme0n1 of=/dev/nvme1n1 bs=512 count=270335
# Update partition types …
sfdisk --part-type /dev/nvme1n1 1 5DFBF5F4-2848-4BAC-AA5E-0D9A20B745A6
sfdisk --part-type /dev/nvme1n1 2 5DFBF5F4-2848-4BAC-AA5E-0D9A20B745A6
sfdisk --part-type /dev/nvme1n1 6 5DFBF5F4-2848-4BAC-AA5E-0D9A20B745A6
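A note on the sector arithmetic (a sketch, using the start and size values from the cgpt output below): dd's count is a number of blocks, not a last-sector index, so count=270335 copies sectors 0 through 270334 and leaves out sector 270335, which is the last sector of BIOS-BOOT. Covering everything before USR-A would need count=270336.

```shell
# Sanity check of the dd count against the GPT layout (values from cgpt).
# Copying sectors 0..N inclusive needs count=N+1.
bios_boot_start=266240   # first sector of BIOS-BOOT
bios_boot_size=4096      # size of BIOS-BOOT in sectors
last_sector=$((bios_boot_start + bios_boot_size - 1))
needed_count=$((last_sector + 1))
echo "last BIOS-BOOT sector: $last_sector"   # prints 270335
echo "required dd count: $needed_count"      # prints 270336
```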
The layout looks like the following, so the values for dd should be correct.
$ cgpt show /dev/nvme0n1
start size part contents
0 1 PMBR
1 1 Pri GPT header
2 32 Pri GPT table
4096 262144 1 Label: "EFI-SYSTEM"
Type: EFI System Partition
UUID: 36757B18-F990-4FD6-847C-CF8FDC4F97F2
266240 4096 2 Label: "BIOS-BOOT"
Type: 21686148-6449-6E6F-744E-656564454649
UUID: 679BD591-1D9E-4896-881B-FDDEA79EA6C6
270336 2097152 3 Label: "USR-A"
Type: 5DFBF5F4-2848-4BAC-AA5E-0D9A20B745A6
UUID: 7130C94A-213A-4E5A-8E26-6CCE9662F132
2367488 2097152 4 Label: "USR-B"
Type: 5DFBF5F4-2848-4BAC-AA5E-0D9A20B745A6
UUID: E03DD35C-7C2D-4A47-B3FE-27F15780A57C
4464640 262144 6 Label: "OEM"
Type: 0FC63DAF-8483-4772-8E79-3D69D8477DE4
UUID: 6A2925C8-FFA8-455B-A72C-15470412A3BB
4726784 131072 7 Label: "OEM-CONFIG"
Type: C95DC21A-DF0E-4340-8D7B-26CBFA9A03E0
UUID: 7B524685-0362-4B8B-8B2F-A1F58C06ABFF
4857856 4427776 9 Label: "ROOT"
Type: 3884DD41-8582-4404-B9A8-E9B84F2DF50E
UUID: 7704DCC6-6B62-4B14-9BD3-73148C9A0AC4
9285632 990929551 10 Label: "raid.1.1"
Type: 0FC63DAF-8483-4772-8E79-3D69D8477DE4
UUID: EDB7BF9D-16E0-49B1-9277-08600D574A9B
1000215183 32 Sec GPT table
1000215215 1 Sec GPT header
$ cgpt show /dev/nvme1n1
start size part contents
0 1 PMBR
1 1 Pri GPT header
2 32 Pri GPT table
4096 262144 1 Label: "EFI-SYSTEM"
Type: 5DFBF5F4-2848-4BAC-AA5E-0D9A20B745A6
UUID: 36757B18-F990-4FD6-847C-CF8FDC4F97F2
266240 4096 2 Label: "BIOS-BOOT"
Type: 5DFBF5F4-2848-4BAC-AA5E-0D9A20B745A6
UUID: 679BD591-1D9E-4896-881B-FDDEA79EA6C6
270336 2097152 3 Label: "USR-A"
Type: 5DFBF5F4-2848-4BAC-AA5E-0D9A20B745A6
UUID: 7130C94A-213A-4E5A-8E26-6CCE9662F132
2367488 2097152 4 Label: "USR-B"
Type: 5DFBF5F4-2848-4BAC-AA5E-0D9A20B745A6
UUID: E03DD35C-7C2D-4A47-B3FE-27F15780A57C
4464640 262144 6 Label: "OEM"
Type: 5DFBF5F4-2848-4BAC-AA5E-0D9A20B745A6
UUID: 6A2925C8-FFA8-455B-A72C-15470412A3BB
4726784 131072 7 Label: "OEM-CONFIG"
Type: C95DC21A-DF0E-4340-8D7B-26CBFA9A03E0
UUID: 7B524685-0362-4B8B-8B2F-A1F58C06ABFF
4857856 4427776 9 Label: "ROOT"
Type: 3884DD41-8582-4404-B9A8-E9B84F2DF50E
UUID: 7704DCC6-6B62-4B14-9BD3-73148C9A0AC4
9285632 990929551 10 Label: "raid.1.1"
Type: 0FC63DAF-8483-4772-8E79-3D69D8477DE4
UUID: EDB7BF9D-16E0-49B1-9277-08600D574A9B
1000215183 32 Sec GPT table
1000215215 1 Sec GPT header
After restarting the system, the boot fails because a verity hash mismatch causes read errors:
device-mapper: verity: metadata block XXX is corrupted
After removing the verity argument from the Linux kernel command line in GRUB, the mount unit hangs (as expected).
The system should give more useful feedback, even if it is only that CoreOS needs to be reinstalled from scratch, as there is no other way to fix this. (That is what has to be done now, so further debugging or log gathering will prove difficult.)
CL doesn't support having multiple USR-A / USR-B / EFI-SYSTEM / OEM partitions. I suspect that on update the kernel in one EFI-SYSTEM partition is updated while the USR-A/B on the other disk is updated, so the kernel carries the verity hash but checks it against the wrong USR-A/B. You might be able to salvage it by deleting everything but raid.1.1 on one of the disks and seeing whether the auto-rollback works. If it doesn't, you'll probably need to reinstall. Basically, make sure you have only one of each of the named partitions.
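The salvage attempt described above could be sketched like this. The sketch is non-destructive: it only prints the sgdisk commands rather than running them. The choice of /dev/nvme1n1 and the partition numbers are assumptions based on the cgpt output above; double-check both before running anything, since these commands destroy partitions.

```shell
# Print (do not run) the sgdisk commands that would delete every duplicate
# named partition on the second disk, keeping only raid.1.1 (partition 10).
disk=/dev/nvme1n1                # assumption: the disk to clean up
for part in 1 2 3 4 6 7 9; do    # partition numbers from the cgpt output
    echo sgdisk -d "$part" "$disk"
done
```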
What is 1.1 in “raid.1.1”?
The partitions were copied with dd, so they matched exactly, and there was no update in between. Does that make sense in any way?
Reinstallation was the only option.
Just to clarify on this (I work with @paulmenzel): a reinstallation only worked after we wiped both disks with wipefs. In the end, we also left out copying the partitions.
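For reference, the wipe step can be sketched as follows. This prints the commands instead of running them; wipefs -a erases all filesystem, RAID, and partition-table signatures, and the assumption that both NVMe disks were targets comes from the comment above.

```shell
# Non-destructive sketch: print the wipe command for each disk.
# wipefs -a erases all filesystem, RAID, and partition-table signatures.
for disk in /dev/nvme0n1 /dev/nvme1n1; do
    echo "wipefs -a $disk"
done
```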
We first tried to just reinstall as normal and copied the partition table and contents with sgdisk and dd as follows:
sgdisk /dev/nvme0n1 -R /dev/nvme1n1 && \
sgdisk -G /dev/nvme1n1
dd if=/dev/nvme0n1 of=/dev/nvme1n1 bs=512 count=4427776 # (first 9 partitions)
but whilst the server then booted up, restarting it seemed to trigger the same verity error.
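One possible factor, judging from the sector arithmetic alone (a guess based on the cgpt output above): count=4427776 equals the size of ROOT, not its last sector, so the dd above stops well before the ninth partition ends. Copying through the end of ROOT would need count=9285632.

```shell
# ROOT start and size in 512-byte sectors, from the cgpt output above.
root_start=4857856
root_size=4427776
# dd count needed to copy sectors 0 through the end of ROOT:
needed_count=$((root_start + root_size))
echo "required dd count: $needed_count"   # prints 9285632
echo "count actually used: $root_size (ROOT's size, not its end sector)"
```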