cockpit

Fedora Rawhide Storage test failures

Open • jelly opened this issue 7 months ago • 4 comments

Explain what happens

TestStorageRaid1.testMetadataAtEnd fails on rawhide:

[    3.630490] EXT4-fs (vda3): mounted filesystem 0fdd1a25-7a62-4739-a0b4-dfbe221b2cd3 r/w with ordered data mode. Quota mode: none.
[   13.720305] loop0: detected capacity change from 0 to 97656
[   14.004566] loop1: detected capacity change from 0 to 97656
[   17.234154] md/raid1:md127: not clean -- starting background reconstruction
[   17.234511] md/raid1:md127: active with 2 out of 2 mirrors
[   17.234792] md127: detected capacity change from 0 to 97536
[   17.235389] md: resync of RAID array md127
[   17.366810] md: md127: resync done.
[   18.152986] md127: detected capacity change from 97536 to 0
[   18.153270] md: md127 stopped.
[   18.168695] GPT:Primary header thinks Alt. header is not at the end of the disk.
[   18.169399] GPT:97535 != 97655
[   18.169684] GPT:Alternate GPT header not at the end of the disk.
[   18.171088] GPT:97535 != 97655
[   18.171418] GPT: Use GNU Parted to correct GPT errors.
[   18.171886]  loop1:
[   18.176426] GPT:Primary header thinks Alt. header is not at the end of the disk.
[   18.176709] GPT:97535 != 97655
[   18.176816] GPT:Alternate GPT header not at the end of the disk.
[   18.179000] GPT:97535 != 97655
[   18.179126] GPT: Use GNU Parted to correct GPT errors.
[   18.179311]  loop0:

This reproduces after upgrading to 6.15-rc3, by pausing the test with testlib.sit() at this point:

 # Delete the mdraid device.  Both disks should go back to "Unformatted data"
 testlib.sit()

Then, over ssh, stop the mdraid array and run systemctl daemon-reload. This hangs the whole virtual machine. In a different, already open ssh session, ps hangs too; stracing it gives:

openat(AT_FDCWD, "/proc/1016/cmdline", O_RDONLY) = 4
read(4, "/sbin/agetty\0-o\0-- \\u\0--noreset\0"..., 131072) = 86
read(4, "", 130986)                     = 0
close(4)                                = 0
openat(AT_FDCWD, "/proc/1016/ctty", O_RDONLY) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/dev/ttyS64",

Posted by jelly, Apr 24 '25 14:04

It does not reproduce after updating everything except the kernel:

sudo dnf update --exclude='kernel*'

So this is a kernel regression.

Posted by jelly, Apr 24 '25 14:04

Reproducer:

truncate --size=50MB /var/tmp/member1 /var/tmp/member2
# use whichever loop devices losetup actually allocates instead of assuming loop0/loop1
dev1=$(losetup -P --show --find /var/tmp/member1)
dev2=$(losetup -P --show --find /var/tmp/member2)
mdadm --create SOMERAID --run --level=1 --metadata=1.0 --raid-devices=2 "$dev1" "$dev2"
mdadm --stop /dev/md/SOMERAID
systemctl daemon-reload
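The GPT warnings in the kernel log from the test follow from --metadata=1.0, which keeps the md superblock at the end of each member, so the array's data starts at sector 0 of each loop device. When the array is stopped and the kernel rescans the members, it finds the GPT that the test apparently created on the array, whose alternate header sits at the array's last sector rather than the member's last sector. A minimal sketch of that arithmetic, assuming the 50MB size from the reproducer (coreutils reads MB as 10^6 bytes), 512-byte sectors as used in the capacity messages, and the array size of 97536 sectors copied from the log:

```shell
#!/bin/sh
# Member size: truncate --size=50MB creates 50,000,000-byte files.
member_bytes=50000000
sector=512
member_sectors=$((member_bytes / sector))   # 97656, matches "loop0: ... 0 to 97656"
member_last_lba=$((member_sectors - 1))     # 97655, where the kernel expects the alt. GPT header

# Array size from the log line "md127: detected capacity change from 0 to 97536".
array_sectors=97536
array_last_lba=$((array_sectors - 1))       # 97535, where the GPT on the array put its alt. header

echo "member: $member_sectors sectors, last LBA $member_last_lba"
echo "array:  $array_sectors sectors, last LBA $array_last_lba"
# Space at the end of each member that is not part of the array data
# (assumed to hold the 1.0 superblock plus mdadm's rounding).
echo "outside the array at the end of each member: $((member_sectors - array_sectors)) sectors"
```

This reproduces the "GPT:97535 != 97655" lines: the headers are valid for the array, just not for the bare member, so these warnings appear expected for end-of-device metadata and are separate from the hang.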

Posted by jelly, Apr 25 '25 08:04

Reported at https://bugzilla.redhat.com/show_bug.cgi?id=2362273

For bisecting, the last good commit is fc96b232f8e7c0a6c282f47726b2ff6a5fb341d2, so the range to search is:

git log v6.15-rc3...fc96b232f8e7c0a6c282f47726b2ff6a5fb341d2

Posted by jelly, Apr 25 '25 09:04

Should be fixed by https://lkml.org/lkml/2025/4/23/188 or in 6.15-rc4.

Posted by jelly, Apr 28 '25 08:04

Seems fixed now.

Posted by martinpitt, Jul 14 '25 15:07