
Storing qemu images on bcachefs breaks xfs in VM

Open jpf91 opened this issue 1 year ago • 15 comments

Hi there,

thanks for fixing #717 so quickly. I only recently upgraded my kernel and I have not seen the issue anymore :+1:

Now for this bug report: even with the latest mainline kernel I can still reproduce my issue with storing VM images on bcachefs. This is probably a fringe use case, but bcachefs should ultimately support it to be a full-featured FS.

Summary: The Proxmox hypervisor currently has no native driver for bcachefs, but it would still be nice to use normal QEMU file storage on a bcachefs filesystem. So I tried to set this up in Proxmox and install a CentOS VM onto bcachefs storage, but the installation fails. I then tried to construct a slightly simpler reproduction case.

VM Host

OS: Proxmox VE 8.2.7 / Debian 12 Bookworm
Kernel: Ubuntu Mainline PPA 6.12.1 (6.12.1-061201-generic)
bcachefs-tools: v1.13.0 tag, built from source

bcachefs show-super:

Device:                                     HGST HDN728080AL
External UUID:                             cca5bc65-fe77-409d-a9fa-465a6e7f4eae
Internal UUID:                             ca668445-d05c-47f8-8b05-92c30245a167
Magic number:                              c68573f6-66ce-90a9-d96a-60cf803df7ef
Device index:                              0
Label:                                     NAS_DATA
Version:                                   1.13: inode_has_child_snapshots
Version upgrade complete:                  1.13: inode_has_child_snapshots
Oldest version on disk:                    1.4: member_seq
Created:                                   Fri Jul  5 14:09:12 2024
Sequence number:                           128
Time of last write:                        Sat Nov 30 20:34:55 2024
Superblock size:                           7.45 KiB/1.00 MiB
Clean:                                     0
Devices:                                   5
Sections:                                  members_v1,crypt,replicas_v0,disk_groups,clean,journal_seq_blacklist,journal_v2,counters,members_v2,errors,ext,downgrade
Features:                                  zstd,journal_seq_blacklist_v3,reflink,new_siphash,inline_data,new_extent_overwrite,btree_ptr_v2,extents_above_btree_updates,btree_updates_journalled,reflink_inline_data,new_varint,journal_no_flush,alloc_v2,extents_across_btree_nodes
Compat features:                           alloc_info,alloc_metadata,extents_above_btree_updates_done,bformat_overflow_done

Options:
  block_size:                              4.00 KiB
  btree_node_size:                         256 KiB
  errors:                                  continue [fix_safe] panic ro 
  metadata_replicas:                       2
  data_replicas:                           2
  metadata_replicas_required:              1
  data_replicas_required:                  1
  encoded_extent_max:                      64.0 KiB
  metadata_checksum:                       none [crc32c] crc64 xxhash 
  data_checksum:                           none [crc32c] crc64 xxhash 
  compression:                             zstd
  background_compression:                  none
  str_hash:                                crc32c crc64 [siphash] 
  metadata_target:                         none
  foreground_target:                       ssd
  background_target:                       hdd
  promote_target:                          ssd
  erasure_code:                            0
  inodes_32bit:                            1
  shard_inode_numbers:                     1
  inodes_use_key_cache:                    1
  gc_reserve_percent:                      8
  gc_reserve_bytes:                        0 B
  root_reserve_percent:                    0
  wide_macs:                               0
  promote_whole_extents:                   1
  acl:                                     1
  usrquota:                                0
  grpquota:                                0
  prjquota:                                0
  journal_flush_delay:                     1000
  journal_flush_disabled:                  0
  journal_reclaim_delay:                   100
  journal_transaction_names:               1
  allocator_stuck_timeout:                 30
  version_upgrade:                         [compatible] incompatible none 
  nocow:                                   0

members_v2 (size 736):
Device:                                    0
  Label:                                   hdd1 (1)
  UUID:                                    141032c8-2583-4306-b4c1-412696d46be5
  Size:                                    7.28 TiB
  read errors:                             0
  write errors:                            0
  checksum errors:                         0
  seqread iops:                            0
  seqwrite iops:                           0
  randread iops:                           0
  randwrite iops:                          0
  Bucket size:                             256 KiB
  First bucket:                            0
  Buckets:                                 30523541
  Last mount:                              Sat Nov 30 20:27:20 2024
  Last superblock write:                   128
  State:                                   rw
  Data allowed:                            journal,btree,user
  Has data:                                journal,user,cached
  Btree allocated bitmap blocksize:        1.00 B
  Btree allocated bitmap:                  0000000000000000000000000000000000000000000000000000000000000000
  Durability:                              1
  Discard:                                 0
  Freespace initialized:                   1
Device:                                    1
  Label:                                   hdd2 (2)
  UUID:                                    d038124b-d4a5-4deb-bdd1-eb423c9189c8
  Size:                                    7.33 TiB
  read errors:                             0
  write errors:                            0
  checksum errors:                         0
  seqread iops:                            0
  seqwrite iops:                           0
  randread iops:                           0
  randwrite iops:                          0
  Bucket size:                             256 KiB
  First bucket:                            0
  Buckets:                                 30758228
  Last mount:                              Sat Nov 30 20:27:20 2024
  Last superblock write:                   128
  State:                                   rw
  Data allowed:                            journal,btree,user
  Has data:                                journal,user,cached
  Btree allocated bitmap blocksize:        1.00 B
  Btree allocated bitmap:                  0000000000000000000000000000000000000000000000000000000000000000
  Durability:                              1
  Discard:                                 0
  Freespace initialized:                   1
Device:                                    2
  Label:                                   hdd3 (3)
  UUID:                                    09811319-852f-4ac1-a1a9-8aef619df346
  Size:                                    7.28 TiB
  read errors:                             0
  write errors:                            0
  checksum errors:                         0
  seqread iops:                            0
  seqwrite iops:                           0
  randread iops:                           0
  randwrite iops:                          0
  Bucket size:                             256 KiB
  First bucket:                            0
  Buckets:                                 30523541
  Last mount:                              Sat Nov 30 20:27:20 2024
  Last superblock write:                   128
  State:                                   rw
  Data allowed:                            journal,btree,user
  Has data:                                journal,user,cached
  Btree allocated bitmap blocksize:        1.00 B
  Btree allocated bitmap:                  0000000000000000000000000000000000000000000000000000000000000000
  Durability:                              1
  Discard:                                 0
  Freespace initialized:                   1
Device:                                    3
  Label:                                   ssd1 (5)
  UUID:                                    074844ac-70c4-4cd7-a302-fa1946985849
  Size:                                    631 GiB
  read errors:                             0
  write errors:                            0
  checksum errors:                         0
  seqread iops:                            0
  seqwrite iops:                           0
  randread iops:                           0
  randwrite iops:                          0
  Bucket size:                             256 KiB
  First bucket:                            0
  Buckets:                                 2582576
  Last mount:                              Sat Nov 30 20:27:20 2024
  Last superblock write:                   128
  State:                                   rw
  Data allowed:                            journal,btree,user
  Has data:                                journal,btree,user,cached
  Btree allocated bitmap blocksize:        32.0 MiB
  Btree allocated bitmap:                  0000000000000000000000001111111111111111111111111111111111111111
  Durability:                              1
  Discard:                                 1
  Freespace initialized:                   1
Device:                                    4
  Label:                                   ssd2 (6)
  UUID:                                    4dd47f69-b955-4de5-b9b9-2a6dc60ca16c
  Size:                                    165 GiB
  read errors:                             0
  write errors:                            0
  checksum errors:                         0
  seqread iops:                            0
  seqwrite iops:                           0
  randread iops:                           0
  randwrite iops:                          0
  Bucket size:                             256 KiB
  First bucket:                            0
  Buckets:                                 674860
  Last mount:                              Sat Nov 30 20:27:20 2024
  Last superblock write:                   128
  State:                                   rw
  Data allowed:                            journal,btree,user
  Has data:                                journal,btree,user,cached
  Btree allocated bitmap blocksize:        8.00 MiB
  Btree allocated bitmap:                  0000000000000000000000111111111111111111111111111111111111111111
  Durability:                              1
  Discard:                                 1
  Freespace initialized:                   1

errors (size 40):
fs_usage_cached_wrong                       1               Mon Oct  7 16:09:57 2024
fs_usage_replicas_wrong                     2               Mon Oct  7 16:09:57 2024

The VM was created in Proxmox using default settings for storage and image. This runs QEMU/KVM like this:

/usr/bin/kvm -id 103 -name test,debug-threads=on -no-shutdown -chardev socket,id=qmp,path=/var/run/qemu-server/103.qmp,server=on,wait=off -mon chardev=qmp,mode=control -chardev socket,id=qmp-event,path=/var/run/qmeventd.sock,reconnect=5 -mon chardev=qmp-event,mode=control -pidfile /var/run/qemu-server/103.pid -daemonize -smbios type=1,uuid=073b59e0-198d-4896-afae-9e1982164f4a -smp 4,sockets=1,cores=4,maxcpus=4 -nodefaults -boot menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg -vnc unix:/var/run/qemu-server/103.vnc,password=on -cpu qemu64,+aes,enforce,+kvm_pv_eoi,+kvm_pv_unhalt,+pni,+popcnt,+sse4.1,+sse4.2,+ssse3 -m 2048 -object iothread,id=iothread-virtioscsi0 -device pci-bridge,id=pci.1,chassis_nr=1,bus=pci.0,addr=0x1e -device pci-bridge,id=pci.2,chassis_nr=2,bus=pci.0,addr=0x1f -device pci-bridge,id=pci.3,chassis_nr=3,bus=pci.0,addr=0x5 -device vmgenid,guid=3160b218-4ba2-42e6-bfb7-5ef0e4df3131 -device piix3-usb-uhci,id=uhci,bus=pci.0,addr=0x1.0x2 -device usb-tablet,id=tablet,bus=uhci.0,port=1 -device VGA,id=vga,bus=pci.0,addr=0x2 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3,free-page-reporting=on -iscsi initiator-name=iqn.1993-08.org.debian:01:907ae15e667 -drive file=/var/lib/pve/local-btrfs/template/iso/Fedora-Workstation-Live-x86_64-41-1.4.iso,if=none,id=drive-ide2,media=cdrom,aio=io_uring -device ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=100 -device virtio-scsi-pci,id=virtioscsi0,bus=pci.3,addr=0x1,iothread=iothread-virtioscsi0 -drive file=/mnt/data/services/pve//images/103/vm-103-disk-0.qcow2,if=none,id=drive-scsi0,format=qcow2,cache=none,aio=io_uring,detect-zeroes=on -device scsi-hd,bus=virtioscsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0,bootindex=101 -netdev type=tap,id=net0,ifname=tap103i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on -device 
virtio-net-pci,mac=BC:24:11:74:0B:4F,netdev=net0,bus=pci.0,addr=0x12,id=net0,rx_queue_size=1024,tx_queue_size=256,bootindex=102 -machine type=pc+pve0

VM

OS: Fedora 41 Workstation Live CD
Kernel: 6.11.4-301.fc41.x86_64

  1. Format /dev/sda using fdisk and create one partition. This works fine.
  2. Run mkfs.xfs. This fails:
root@localhost-live:~# mkfs.xfs /dev/sda1
meta-data=/dev/sda1              isize=512    agcount=4, agsize=2097024 blks
       =                       sectsz=512   attr=2, projid32bit=1
       =                       crc=1        finobt=1, sparse=1, rmapbt=1
       =                       reflink=1    bigtime=1 inobtcount=1 nrext64=1
data     =                       bsize=4096   blocks=8388096, imaxpct=25
       =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=16384, version=2
       =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
Discarding blocks...Done.
mkfs.xfs: pwrite failed: Remote I/O error
libxfs_bwrite: write failed on xfs_sb bno 0x0/0x1, err=121
mkfs.xfs: Releasing dirty buffer to free list!
found dirty buffer (bulk) on free list!
mkfs.xfs: pwrite failed: Remote I/O error
libxfs_bwrite: write failed on (unknown) bno 0x1fff838/0x2, err=121
mkfs.xfs: Releasing dirty buffer to free list!
found dirty buffer (bulk) on free list!
mkfs.xfs: pwrite failed: Remote I/O error
libxfs_bwrite: write failed on xfs_sb bno 0x0/0x1, err=121
mkfs.xfs: pwrite failed: Remote I/O error
libxfs_bwrite: write failed on xfs_agf bno 0x1/0x1, err=121
mkfs.xfs: pwrite failed: Remote I/O error
libxfs_bwrite: write failed on xfs_agfl bno 0x3/0x1, err=121
mkfs.xfs: pwrite failed: Remote I/O error
libxfs_bwrite: write failed on xfs_agi bno 0x2/0x1, err=121
mkfs.xfs: writing AG headers failed, err=121

After this, the following errors can be found in the VM dmesg:

[  740.241536] sd 2:0:0:0: [sda] tag#212 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
[  740.241540] sd 2:0:0:0: [sda] tag#212 Sense Key : Illegal Request [current] 
[  740.241542] sd 2:0:0:0: [sda] tag#212 Add. Sense: Invalid field in cdb
[  740.241544] sd 2:0:0:0: [sda] tag#212 CDB: Write(10) 2a 00 00 00 08 00 00 00 01 00
[  740.241545] critical target error, dev sda, sector 2048 op 0x1:(WRITE) flags 0x8800 phys_seg 1 prio class 2
[  740.242534] sd 2:0:0:0: [sda] tag#62 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
[  740.242538] sd 2:0:0:0: [sda] tag#62 Sense Key : Illegal Request [current] 
[  740.242540] sd 2:0:0:0: [sda] tag#62 Add. Sense: Invalid field in cdb
[  740.242542] sd 2:0:0:0: [sda] tag#62 CDB: Write(10) 2a 00 02 00 00 38 00 00 02 00
[  740.242543] critical target error, dev sda, sector 33554488 op 0x1:(WRITE) flags 0x8800 phys_seg 1 prio class 2
[  740.242740] sd 2:0:0:0: [sda] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
[  740.242742] sd 2:0:0:0: [sda] tag#0 Sense Key : Illegal Request [current] 
[  740.242752] sd 2:0:0:0: [sda] tag#0 Add. Sense: Invalid field in cdb
[  740.242754] sd 2:0:0:0: [sda] tag#0 CDB: Write(10) 2a 00 00 00 08 00 00 00 01 00
[  740.242755] critical target error, dev sda, sector 2048 op 0x1:(WRITE) flags 0x8800 phys_seg 1 prio class 2
[  740.244685] sd 2:0:0:0: [sda] tag#214 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
[  740.244687] sd 2:0:0:0: [sda] tag#214 Sense Key : Illegal Request [current] 
[  740.244689] sd 2:0:0:0: [sda] tag#214 Add. Sense: Invalid field in cdb
[  740.244690] sd 2:0:0:0: [sda] tag#214 CDB: Write(10) 2a 00 00 00 08 01 00 00 01 00
[  740.244691] critical target error, dev sda, sector 2049 op 0x1:(WRITE) flags 0x8800 phys_seg 1 prio class 2
[  740.244842] sd 2:0:0:0: [sda] tag#215 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
[  740.244843] sd 2:0:0:0: [sda] tag#215 Sense Key : Illegal Request [current] 
[  740.244844] sd 2:0:0:0: [sda] tag#215 Add. Sense: Invalid field in cdb
[  740.244845] sd 2:0:0:0: [sda] tag#215 CDB: Write(10) 2a 00 00 00 08 03 00 00 01 00
[  740.244846] critical target error, dev sda, sector 2051 op 0x1:(WRITE) flags 0x8800 phys_seg 1 prio class 2
[  740.244980] sd 2:0:0:0: [sda] tag#216 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
[  740.244981] sd 2:0:0:0: [sda] tag#216 Sense Key : Illegal Request [current] 
[  740.244983] sd 2:0:0:0: [sda] tag#216 Add. Sense: Invalid field in cdb
[  740.244984] sd 2:0:0:0: [sda] tag#216 CDB: Write(10) 2a 00 00 00 08 02 00 00 01 00
[  740.244984] critical target error, dev sda, sector 2050 op 0x1:(WRITE) flags 0x8800 phys_seg 1 prio class 2

There are no messages in the VM Host dmesg.

Interestingly, other filesystems seem to work better. Manually creating an ext4 fs, mounting it, and creating/deleting files worked. So I tried a default Fedora installation, which uses only btrfs and ext4. That works as well: it installs just fine and the installed OS boots. I did not do any further testing, though.

So this seems to be somewhat xfs-specific. If there's any additional info that could help debug this further, please let me know.

jpf91 avatar Nov 30 '24 22:11 jpf91

I experience this kind of trouble when using qemu with qcow2 on many different filesystems. When using converted raw images, all problems go away. But then of course I lose all the qcow2 goodies, the worst loss being the online backup options.

I've tried using qcow2 images for VMs with the nocow option set in bcachefs at the filesystem and file level, but that made the whole filesystem crash (well, bugs there are), so things got even worse.

I'm searching for an option here; I will try whether more sophisticated images with a raw backing file will perhaps solve the issue for now.

elmystico avatar Jan 21 '25 20:01 elmystico

anyone know if this bug is still happening?

koverstreet avatar Aug 03 '25 17:08 koverstreet

Hey Kent, thanks for looking into this. I can still reproduce the issue with kernel 6.14.0-2-pve. Error messages, dmesg etc. are all exactly the same. If it helps, I can update to 6.16 from the Ubuntu mainline PPA and test it next weekend.

jpf91 avatar Aug 04 '25 19:08 jpf91

Please do let me know the 6.16 results; if it still happens there I'll see what I can see

koverstreet avatar Aug 04 '25 20:08 koverstreet

Still happens on 6.16.0-061600-generic Ubuntu PPA kernel.

So far, I have only tested via the Proxmox GUI. When trying to produce a simple qemu reproduction case, I at first couldn't reproduce the bug. Some more debugging then showed this only happens with the cache=none option. Here's a 'simple' test case:

# In a folder on a bcachefs mount
# Use SystemRescue, as we can boot everything using the serial console there
wget https://fastly-cdn.system-rescue.org/releases/12.01/systemrescue-12.01-amd64.iso -O boot.iso
qemu-img create -f qcow2 disk.qcow2 1G

qemu-system-x86_64 \
  -enable-kvm \
  -m 2048 \
  -drive file=disk.qcow2,format=qcow2,cache=none \
  -cdrom boot.iso \
  -boot d \
  -serial mon:stdio \
  -display none

# Select "Boot SystemRescue with serial console"
mkfs.xfs /dev/sda

This produces the following output:

meta-data=/dev/sda               isize=512    agcount=4, agsize=65536 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=1
         =                       reflink=1    bigtime=1 inobtcount=1 nrext64=1
         =                       exchange=0  
data     =                       bsize=4096   blocks=262144, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1, parent=0
log      =internal log           bsize=4096   blocks=16384, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
Discarding blocks...Done.
mkfs.xfs: pwrite failed: Input/output error
libxfs_bwrite: write failed on (unknown) bno 0x1fff00/0x100, err=5
mkfs.xfs: Releasing dirty buffer to free list!
found dirty buffer (bulk) on free list!
mkfs.xfs: pwrite failed: Input/output error
libxfs_bwrite: write failed on (unknown) bno 0x0/0x100, err=5
mkfs.xfs: Releasing dirty buffer to free list!
found dirty buffer (bulk) on free list!
mkfs.xfs: pwrite failed: Input/output error
libxfs_bwrite: write failed on xfs_sb bno 0x0/0x1, err=5
mkfs.xfs: Releasing dirty buffer to free list!
mkfs.xfs: libxfs_device_zero write failed: Input/output error

The QEMU documentation does not say much about cache modes (search for cache=cache). The Proxmox Wiki says cache=none makes qemu use O_DIRECT semantics. So maybe it's an issue with direct IO?
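If O_DIRECT really is the culprit, a host-side probe could narrow it down without booting a VM. This is only a sketch, not part of the original report: probe.bin is an arbitrary scratch filename, and the expected failure mode is an assumption pending confirmation.

```shell
# Sketch: try O_DIRECT writes of 512 and 4096 bytes to a scratch file on
# the filesystem under test. If a 4k-block bcachefs rejects sub-block
# direct writes, the 512-byte case would be expected to fail while the
# 4096-byte one succeeds; on other filesystems both may succeed.
for bs in 512 4096; do
  if dd if=/dev/zero of=probe.bin bs="$bs" count=1 oflag=direct 2>/dev/null; then
    echo "bs=$bs: direct write ok"
  else
    echo "bs=$bs: direct write failed"
  fi
done
rm -f probe.bin
```

Run it from a directory on the bcachefs mount; the outcome on other filesystems depends on the underlying device's logical sector size.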

jpf91 avatar Aug 09 '25 12:08 jpf91

I'd also like to add that this problem does not seem to be XFS-specific. mkfs.ext4 hangs on "Writing superblocks and filesystem accounting information:" and mkfs.fat exits with "unable to synchronize /dev/sda: Input/output error".

Side note: replacing mkfs calls with

echo 1 > /dev/sda
cat /dev/sda

also seems to highlight the problem. When the qcow2 image is stored on a different filesystem, cat prints back the 1 that was written to the fresh image file. When it is stored on bcachefs, nothing is printed.

EDIT: The mkfs.xfs command also fails when using a raw image instead of a qcow2 one in the VM. Additionally, the whole problem does not occur when bcachefs is used in a single-device setup. I didn't use nocow when testing.

dpieczynski avatar Aug 12 '25 21:08 dpieczynski

bcachefs commit: 02f551db4b1d5a845382bb5d9b3ca29344fd7fa3
bcachefs-tools: 1.25.3+1c551b0

I can reproduce this, but it only happens with 4k block sizes, not 512, and only with the cache=none flag:

dd if=/dev/zero of=b0.img bs=1M count=4096
losetup --sector-size 4096 /dev/loop0 b0.img
bcachefs format /dev/loop0
mkdir /mnt/bcachefs_test
mount /dev/loop0 /mnt/bcachefs_test
pushd /mnt/bcachefs_test
wget https://fastly-cdn.system-rescue.org/releases/12.01/systemrescue-12.01-amd64.iso -O boot.iso

qemu-img create -f qcow2 disk.qcow2 1G

qemu-system-x86_64 \
  -enable-kvm \
  -m 2048 \
  -drive file=disk.qcow2,format=qcow2,cache=none \
  -cdrom boot.iso \
  -boot d \
  -serial mon:stdio \
  -display none

Then run mkfs.xfs /dev/sda in the VM.

qubitnano avatar Sep 08 '25 00:09 qubitnano

VM disks with an NTFS guest filesystem are also affected.

I tried to run winapps (a wrapper around dockur/windows) with an NTFS qcow2 disk on a bcachefs filesystem. The Windows installer could create neither the partition table nor the NTFS partition during installation.

Moving the qcow2 disk image out to an xfs partition immediately solved the problem.

My kernel version is 6.16.4 from the NixOS package.

ttimasdf avatar Sep 10 '25 02:09 ttimasdf

For what it's worth, I can report that I've been running VMs without qcow2, just raw .img files backed by bcachefs, for a fairly long time and have not seen any corruption, with 4k block sizes too. Perhaps qcow2 is the defining factor here.

RX14 avatar Sep 23 '25 19:09 RX14

lots of things competing for top of the todo list, but this is getting up there

koverstreet avatar Sep 24 '25 00:09 koverstreet

lots of things competing for top of the todo list, but this is getting up there

This sounds quite curious. This issue is data corruption, which I would have thought is as serious as it gets for a filesystem. But you're saying there are lots of potentially more serious things. Maybe this is just unfortunate wording

khumarahn avatar Sep 24 '25 02:09 khumarahn

That doesn't look like corruption to me; I see I/O errors, and corruption is silent.

The other thing that makes it less concerning is that it only occurs under very specific circumstances, meaning users are unlikely to be taken by surprise in the middle of something. It's also not a regression; those do get jumped on right away.

yes, everyone wants their bug addressed right away, but if every single known bug were already fixed then it wouldn't be marked experimental anymore :)

koverstreet avatar Sep 24 '25 03:09 koverstreet

@jpf91

The Proxmox Wiki says cache=none makes qemu use O_DIRECT semantics. So maybe it's an issue with direct IO?

yes, i think so.

cache=none, i.e. O_DIRECT, is also known to be problematic with btrfs when datacow is enabled:

https://bugzilla.redhat.com/show_bug.cgi?id=1914433 https://bugzilla.kernel.org/show_bug.cgi?id=99171#c16

devZer0 avatar Oct 09 '25 23:10 devZer0

@koverstreet, I would like to confirm this error on the latest Proxmox with pve kernel 6.17 and the latest bcachefs dkms package + tools.

I reformatted my bcachefs mirror on two ordinary HDDs (512-byte sectors) with a 4k bcachefs block size (as https://github.com/koverstreet/bcachefs/issues/791#issuecomment-3264226556 reports it is 4k-block-size specific) and migrated back my virtual machine with a 5 GB xfs-formatted virtual disk (qcow2 with cache=none, io_uring and iothread=1).

Deleting a file from there immediately resulted in this I/O error in the VM kernel:

[ 2122.583393] sd 1:0:0:1: [sdb] tag#74 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
[ 2122.583403] sd 1:0:0:1: [sdb] tag#74 Sense Key : Illegal Request [current]
[ 2122.583408] sd 1:0:0:1: [sdb] tag#74 Add. Sense: Invalid field in cdb
[ 2122.583411] sd 1:0:0:1: [sdb] tag#74 CDB: Write(10) 2a 08 00 50 04 fb 00 00 0a 00
[ 2122.583413] critical target error, dev sdb, sector 5244155 op 0x1:(WRITE) flags 0x29800 phys_seg 1 prio class 2
[ 2122.583482] critical target error, dev sdb, sector 5244155 op 0x1:(WRITE) flags 0x29800 phys_seg 1 prio class 2
[ 2122.583550] XFS (sdb1): log I/O error -121
[ 2122.583601] XFS (sdb1): Filesystem has been shut down due to log error (0x2).
[ 2122.583652] XFS (sdb1): Please unmount the filesystem and rectify the problem(s).

This indeed only seems to affect xfs. I also cannot reformat the virtual disk; I get "Illegal Request" immediately.

It indeed seems xfs-specific: I have a second identical virtual 5 GB disk mounted with ext4, which does NOT show this behaviour.

No errors on the host level in dmesg.

bcachefs checked with:

bcachefs data scrub /bcachefs
bcachefs fsck /dev/sda2
bcachefs fsck /dev/sdb2

I had no problems with a 512-byte bcachefs block size before, and I tried hard to break it without success, for example via O_DIRECT testing tools (see https://lore.kernel.org/linux-bcachefs/[email protected]/T/#u ). By the way, any hint how we can check mirror consistency? Does scrub show when there is different data on disk 1 and disk 2?

devZer0 avatar Oct 15 '25 20:10 devZer0

I guess this has to do with direct I/O and sector/block size alignment.

ChatGPT brought me to this one:

https://bugs.launchpad.net/fuel/+bug/1316266

and

# mkfs.xfs -b size=4k -s size=4k -f /dev/sdb1

works

whereas

# mkfs.xfs -b size=4k -s size=512 -f /dev/sdb1

fails with the errors reported
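The arithmetic behind this is straightforward. The following sketch (the aligned helper is hypothetical, added only for illustration) checks whether a write's byte offset and length are multiples of a 4k block size, using the failing write from the dmesg above (10 sectors of 512 bytes at sector 5244155):

```shell
# Hypothetical helper, for illustration only: a direct-I/O request must
# have both its byte offset and its length aligned to the filesystem
# block size (4096 here) to go through.
blk=4096
aligned() {
  if [ $(( $1 % blk )) -eq 0 ] && [ $(( $2 % blk )) -eq 0 ]; then
    echo aligned
  else
    echo unaligned
  fi
}

# The failing write from the dmesg: 10 sectors of 512 bytes at sector
# 5244155. Neither the offset nor the length is a multiple of 4096.
aligned $(( 5244155 * 512 )) $(( 10 * 512 ))   # prints "unaligned"

# The same amount of I/O issued on 4k boundaries (sector number and
# sector count both divisible by 8) passes the check.
aligned $(( 5244160 * 512 )) $(( 8 * 512 ))    # prints "aligned"
```

With 4k filesystem sectors in the guest, every request xfs issues satisfies this check, which would explain why -s size=4k works.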

I guess xfs uses a 512-byte sector size in the VM by default, because qemu emulates the virtual disk with 512-byte sectors by default:

root@debian13-1:/# cat /sys/block/sdb/queue/logical_block_size
512

root@debian13-1:/# cat /sys/block/sdb/queue/physical_block_size
512

devZer0 avatar Oct 15 '25 21:10 devZer0