btrfs-progs
Unable to set seeding status on host-managed SMR drive
Dear all,
I'm on Linux 6.5.0 with btrfs-progs v6.3.3. I was trying to set the seeding status on a host-managed SMR drive (HGST HSH721414ALE6M4) that is encrypted with dm-crypt.
root@nas:~# btrfstune -S 1 -f /dev/mapper/diskp-old
Error reading 39131861975040, -1
Error reading 39131861975040, -1
ERROR: cannot read chunk root
ERROR: open ctree failed
But btrfs check didn't report any error.
lsblk shows:
root@nas:~# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
......
sde 8:64 0 12.7T 0 disk
└─diskp-old 253:0 0 12.7T 0 crypt
......
It looks like something went wrong when reading the drive, so I ran btrfstune under strace to see what was happening.
root@nas:~# strace btrfstune -S 1 -f /dev/mapper/diskp-old
execve("/usr/local/bin/btrfstune", ["btrfstune", "-S", "1", "-f", "/dev/mapper/diskp-old"], 0x7ffe76be2e10 /* 19 vars */) = 0
......
openat(AT_FDCWD, "/sys/block/dm-0/queue/zoned", O_RDONLY) = 4
read(4, "host-managed\n", 32) = 13
close(4) = 0
openat(AT_FDCWD, "/dev/mapper/diskp-old", O_RDWR|O_DIRECT) = 4
fadvise64(4, 0, 0, POSIX_FADV_DONTNEED) = 0
newfstatat(3, "", {st_mode=S_IFBLK|0660, st_rdev=makedev(0xfd, 0), ...}, AT_EMPTY_PATH) = 0
ioctl(3, BLKGETZONESZ, [524288]) = 0
ioctl(3, BLKREPORTZONE, 0x1bcf720) = 0
newfstatat(3, "", {st_mode=S_IFBLK|0660, st_rdev=makedev(0xfd, 0), ...}, AT_EMPTY_PATH) = 0
ioctl(3, BLKSSZGET, [512]) = 0
pread64(3, "\266\314\1\376z\25\257\255\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096, 0) = 4096
fcntl(3, F_GETFL) = 0x8002 (flags O_RDWR|O_LARGEFILE)
pread64(4, 0x1bd1ad0, 16384, 13955032121344) = -1 EINVAL (Invalid argument)
write(2, "Error reading 39131861975040, -1"..., 33Error reading 39131861975040, -1
) = 33
pread64(4, 0x1bd1ad0, 16384, 13955300556800) = -1 EINVAL (Invalid argument)
write(2, "Error reading 39131861975040, -1"..., 33Error reading 39131861975040, -1
) = 33
write(2, "ERROR: ", 7ERROR: ) = 7
write(2, "cannot read chunk root", 22cannot read chunk root) = 22
write(2, "\n", 1
) = 1
......
The device was opened with O_DIRECT (fd 4), and both pread64 calls on that fd failed with EINVAL. It is possible that the buffer address or the offset isn't aligned properly. In this example, the buffer 0x1bd1ad0 passed to pread64 may not be aligned properly, because 0x1bd1ad0 % 4096 = 2768 and 0x1bd1ad0 % 512 = 208.
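The remainders quoted above can be re-checked with a few lines of C. This is only a sketch: the helper name addr_remainder is made up for illustration and is not part of btrfs-progs.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not from btrfs-progs): remainder of an address
 * modulo an alignment. A result of 0 means the address is aligned.
 */
unsigned long addr_remainder(uintptr_t addr, unsigned long align)
{
    assert(align != 0);   /* a zero alignment is meaningless */
    return (unsigned long)(addr % align);
}
```

Calling addr_remainder(0x1bd1ad0, 4096) gives 2768 and addr_remainder(0x1bd1ad0, 512) gives 208, so the buffer from the trace is aligned to neither the 4096-byte page size nor the 512-byte sector size reported by BLKSSZGET.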
I wrote a simple program that issues a pread with the same size and offset as above, once with an aligned buffer (aligned_alloc with 16384-byte alignment) and once with an unaligned buffer (malloc). The read into the aligned buffer succeeds, but the read into the unaligned buffer fails with EINVAL.
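A minimal sketch of such a test is below. It is not the original program: the function names, the READ_SIZE and SKEW constants, and the choice to open read-only are assumptions made for illustration. Since malloc gives no particular sector alignment either way, the sketch skews an aligned allocation by 208 bytes so that the misalignment is deterministic and matches the 0x1bd1ad0 % 512 remainder from the trace.

```c
#define _GNU_SOURCE             /* exposes O_DIRECT in <fcntl.h> */
#define _FILE_OFFSET_BITS 64    /* the offsets in the trace need a 64-bit off_t */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define READ_SIZE 16384   /* length of the failing pread64 calls */
#define SKEW      208     /* assumed skew, same as 0x1bd1ad0 % 512 */

/* Returns 1 if p is a multiple of align, 0 otherwise. */
int is_aligned(const void *p, size_t align)
{
    return ((uintptr_t)p % align) == 0;
}

/* Opens path with O_DIRECT (read-only) and does one pread.
 * Returns 0 on success, or the errno value of the failing call. */
int direct_pread_errno(const char *path, void *buf, size_t len, off_t off)
{
    int fd = open(path, O_RDONLY | O_DIRECT);
    if (fd < 0)
        return errno;
    ssize_t n = pread(fd, buf, len, off);
    int err = (n < 0) ? errno : 0;   /* save errno before close() */
    close(fd);
    return err;
}

/* Reads the same range into an aligned and a skewed buffer and prints
 * the outcome of each read. Returns -1 only if allocation fails. */
int compare_buffers(const char *path, off_t off)
{
    /* Twice READ_SIZE so the skewed read still fits in the allocation. */
    char *base = aligned_alloc(READ_SIZE, 2 * READ_SIZE);
    if (base == NULL)
        return -1;
    char *skewed = base + SKEW;
    assert(is_aligned(base, 512) && !is_aligned(skewed, 512));

    int e1 = direct_pread_errno(path, base, READ_SIZE, off);
    int e2 = direct_pread_errno(path, skewed, READ_SIZE, off);
    printf("aligned buffer %p: %s\n", (void *)base, e1 ? strerror(e1) : "ok");
    printf("skewed buffer  %p: %s\n", (void *)skewed, e2 ? strerror(e2) : "ok");
    free(base);
    return 0;
}
```

Calling compare_buffers("/dev/mapper/diskp-old", 13955032121344) from a small main, as root, would repeat the first failing read from the trace. Whether the skewed read is rejected depends on the device and the kernel, so a result from one machine should not be taken as general.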
I believe this issue only affects host-managed SMR drives, because I tested btrfstune -S 1 on normal drives and it works properly. I also checked the code in tune/main.c and disk-io.c, but I'm still not sure how this happens or how to fix it.
Any idea on this issue?
I think seeding+zoned hasn't been tested as a use case. It might work if the superblock update is done in a zoned-friendly way.
The btrfstune commands make some changes directly to the superblock, which might not use the log-style write and would then violate the sequential write constraint. We need full test coverage of all zoned + btrfstune features too.