Getting stuck trying to create a new 1 GiB file on an FS that has 41 GiB of free space (needs: allocator to check for ENOSPC aside from disk reservations)
I have a bcachefs filesystem that’s at 90% capacity. According to bcachefs fs usage, I have 41 GiB of free space left. Writing to that filesystem works, but only for very small amounts of data (for example, creating a new plain-text file that contains a single sentence). If I try to create a new file that contains 1 GiB of data, the program creating the file gets stuck seemingly forever. If I then try to shut down the system, it takes a while because systemd gets stuck waiting for sd-sync to finish. Eventually, systemd gives up waiting for sd-sync and forcefully shuts down the system.
Version information
- Linux version: 6.16.9
- bcachefs-tools version: 1.31.3
- I’m using the DKMS module that comes with bcachefs-tools.
Steps to reproduce
- Make sure that you have a problematic bcachefs filesystem. I don’t know how to create a problematic bcachefs filesystem from scratch, but I do have a backup of a problematic bcachefs filesystem that I’ve been using for testing.
- Mount the problematic filesystem by running this command:
  run0 mount UUID=<UUID of problematic filesystem> <mountpoint>
- Wait for that command to finish.
- Change directory into the newly mounted filesystem by running this command:
  cd <path to mountpoint>
- Try to create a new 1 GiB file in the newly mounted filesystem by running this command (the full sequence is also collected into a single script below):
  run0 dd if=/dev/zero of='Test file' bs=1048576 count=1024 status=progress
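For convenience, here are the steps above consolidated into a single script (a sketch; the UUID and mountpoint are placeholders):

  #!/bin/sh
  # Reproduction sketch assembled from the steps above.
  run0 mount UUID='<UUID of problematic filesystem>' '<mountpoint>'
  cd '<mountpoint>'
  run0 dd if=/dev/zero of='Test file' bs=1048576 count=1024 status=progress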
Results
The dd command seemingly never finishes. After a little bit, it gets stuck showing something like this:
570425344 bytes (570 MB, 544 MiB) copied, 23 s, 24.7 MB/s
Here’s a log of kernel messages that were produced after dd got stuck.
Your filesystem somehow got itself very low on actual non-reserved space, and copygc does not seem to make progress (?). Please post bcachefs show-super output, bcachefs fs usage -ha before writing a large file, bcachefs fs usage -ha after the "Allocator stuck?" messages show up, and /sys/fs/bcachefs/*/internal/moving_ctxts contents at the same time.
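For reference, the requested data could be collected along these lines (a sketch; the device and mountpoint paths are placeholders):

  # Before writing the large file:
  bcachefs show-super <device> > show-super.txt
  bcachefs fs usage -ha <mountpoint> > fs-usage-before.txt
  # After the "Allocator stuck?" messages appear in dmesg:
  bcachefs fs usage -ha <mountpoint> > fs-usage-after.txt
  cat /sys/fs/bcachefs/*/internal/moving_ctxts > moving_ctxts.txt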
OK. I reproduced the bug again. This time I used Linux version 6.17.1, bcachefs-tools version 1.31.7, and the DKMS module that comes with that version of bcachefs-tools. Here’s the output from bcachefs show-super <device>:
External UUID: ccd95d13-0ffb-4123-9f77-59bc18232b38
Internal UUID: 5d101165-1b29-4949-9fed-45d8174314ab
Magic number: c68573f6-66ce-90a9-d96a-60cf803df7ef
Device index: 0
Label: (none)
Version: 1.28: inode_has_case_insensitive
Incompatible features allowed: 1.20: directory_size
Incompatible features in use: 0.0: (unknown version)
Version upgrade complete: 1.28: inode_has_case_insensitive
Oldest version on disk: 1.20: directory_size
Created: Fri Jul 11 12:35:30 2025
Sequence number: 517
Time of last write: Fri Sep 19 07:06:52 2025
Superblock size: 5.20 KiB/1.00 MiB
Clean: 0
Devices: 1
Sections: members_v1,replicas_v0,clean,journal_seq_blacklist,journal_v2,counters,members_v2,errors,ext,downgrade,recovery_passes
Features: journal_seq_blacklist_v3,reflink,new_siphash,inline_data,new_extent_overwrite,btree_ptr_v2,extents_above_btree_updates,btree_updates_journalled,reflink_inline_data,new_varint,journal_no_flush,alloc_v2,extents_across_btree_nodes
Compat features: alloc_info,alloc_metadata,extents_above_btree_updates_done,bformat_overflow_done
Options:
  block_size: 512 B
  btree_node_size: 256 KiB
  errors: continue [fix_safe] panic ro
  write_error_timeout: 30
  metadata_replicas: 1
  data_replicas: 1
  metadata_replicas_required: 1
  data_replicas_required: 1
  encoded_extent_max: 64.0 KiB
  metadata_checksum: none [crc32c] crc64 xxhash
  data_checksum: none [crc32c] crc64 xxhash
  checksum_err_retry_nr: 3
  compression: none
  background_compression: none
  str_hash: crc32c crc64 [siphash]
  metadata_target: none
  foreground_target: none
  background_target: none
  promote_target: none
  erasure_code: 0
  casefold: 0
  inodes_32bit: 1
  shard_inode_numbers_bits: 3
  inodes_use_key_cache: 1
  gc_reserve_percent: 8
  gc_reserve_bytes: 0 B
  root_reserve_percent: 0
  wide_macs: 0
  promote_whole_extents: 1
  acl: 1
  usrquota: 0
  grpquota: 0
  prjquota: 0
  degraded: [ask] yes very no
  journal_flush_delay: 1000
  journal_flush_disabled: 0
  journal_reclaim_delay: 100
  journal_transaction_names: 1
  allocator_stuck_timeout: 30
  version_upgrade: [compatible] incompatible none
  nocow: 0
  rebalance_on_ac_only: 0
errors (size 8):
Device 0: /dev/vdb2 (unknown model)
  Label: (none)
  UUID: 2b36a905-92ec-4007-a006-b64096633531
  Size: 441 GiB
  read errors: 0
  write errors: 0
  checksum errors: 0
  seqread iops: 0
  seqwrite iops: 0
  randread iops: 0
  randwrite iops: 0
  Bucket size: 441 KiB
  First bucket: 0
  Buckets: 1048576
  Last mount: Fri Sep 19 07:06:52 2025
  Last superblock write: 517
  State: rw
  Data allowed: journal,btree,user
  Has data: journal,btree,user
  Btree allocated bitmap blocksize: 16.0 MiB
  Btree allocated bitmap: 0000000011111111111111111111100111111001111111101100000000011011
  Durability: 1
  Discard: 1
  Freespace initialized: 1
  Resize on mount: 0
Here’s the output of bcachefs fs usage -ha before I ran the dd command:
Filesystem: ccd95d13-0ffb-4123-9f77-59bc18232b38
Size: 405 GiB
Used: 397 GiB
Online reserved: 512 KiB
Data by durability desired and amount degraded:
undegraded
1x: 397 GiB
reserved: 253 MiB
Data type Required/total Durability Devices
reserved: 1/1 [] 253 MiB
btree: 1/1 1 [vdb2] 9.70 GiB
user: 1/1 1 [vdb2] 387 GiB
Btree usage:
extents: 1.59 GiB
inodes: 3.08 GiB
dirents: 1.13 GiB
xattrs: 256 KiB
alloc: 153 MiB
reflink: 201 MiB
subvolumes: 256 KiB
snapshots: 256 KiB
lru: 2.25 MiB
freespace: 512 KiB
need_discard: 2.00 MiB
backpointers: 1.10 GiB
bucket_gens: 2.25 MiB
snapshot_trees: 256 KiB
deleted_inodes: 256 KiB
logged_ops: 256 KiB
accounting: 2.44 GiB
(no label) (device 0): vdb2 rw 90%
data buckets fragmented
free: 13.3 MiB 31
sb: 2.00 MiB 5 151 KiB
journal: 3.44 GiB 8192
btree: 9.70 GiB 39736 6.99 GiB
user: 387 GiB 926403 2.38 GiB
cached: 0 B 0
parity: 0 B 0
stripe: 0 B 0
need_gc_gens: 0 B 0
need_discard: 31.2 GiB 74209
unstriped: 0 B 0
capacity: 441 GiB 1048576
bucket size: 441 KiB
Here’s the output of bcachefs fs usage -ha after one of the “Allocator stuck?” messages appeared:
Filesystem: ccd95d13-0ffb-4123-9f77-59bc18232b38
Size: 405 GiB
Used: 397 GiB
Online reserved: 644 MiB
Data by durability desired and amount degraded:
undegraded
1x: 397 GiB
reserved: 253 MiB
Data type Required/total Durability Devices
reserved: 1/1 [] 253 MiB
btree: 1/1 1 [vdb2] 9.70 GiB
user: 1/1 1 [vdb2] 387 GiB
Btree usage:
extents: 1.59 GiB
inodes: 3.08 GiB
dirents: 1.13 GiB
xattrs: 256 KiB
alloc: 153 MiB
reflink: 201 MiB
subvolumes: 256 KiB
snapshots: 256 KiB
lru: 2.25 MiB
freespace: 512 KiB
need_discard: 2.00 MiB
backpointers: 1.10 GiB
bucket_gens: 2.25 MiB
snapshot_trees: 256 KiB
deleted_inodes: 256 KiB
logged_ops: 256 KiB
accounting: 2.44 GiB
(no label) (device 0): vdb2 rw 90%
data buckets fragmented
free: 12.9 MiB 30
sb: 2.00 MiB 5 151 KiB
journal: 3.44 GiB 8192
btree: 9.70 GiB 39733 6.99 GiB
user: 387 GiB 926325 2.34 GiB
cached: 0 B 0
parity: 0 B 0
stripe: 0 B 0
need_gc_gens: 0 B 0
need_discard: 31.2 GiB 74291
unstriped: 0 B 0
capacity: 441 GiB 1048576
bucket size: 441 KiB
Here’s the output of cat /sys/fs/bcachefs/*/internal/moving_ctxts after one of the “Allocator stuck?” messages appeared:
rebalance_work:
  data type==user pos=extents:POS_MIN
  keys moved: 0
  keys raced: 0
  bytes seen: 0 B
  bytes moved: 0 B
  bytes raced: 0 B
  reads: ios 0/32 sectors 0/2048
  writes: ios 0/32 sectors 0/2048
copygc:
  data type==user pos=extents:74810271:0:626031552
  keys moved: 97113
  keys raced: 253
  bytes seen: 2.36 TiB
  bytes moved: 1.35 GiB
  bytes raced: 3.31 MiB
  reads: ios 0/32 sectors 0/2048
  writes: ios 0/32 sectors 0/2048
Two things of note: first, you are not actually using the DKMS module (1.31); your FS has version 1.28, which matches the in-tree 6.16 kernel version.
Second, your FS has a non-power-of-2 bucket size, which is suboptimal. It would be great if you could provide information on the way it was initially formatted, most importantly the bcachefs-tools version used for formatting. I believe that all issues leading to non-round bucket sizes being chosen on format were fixed long ago, but maybe you've found another case.
So, what's going on here is that you have a large amount of space (~7 GB) tied up in fragmented btree usage. Usually copygc would be able to pack metadata (btree) more tightly and free up this space, so we do not account it as "used". But due to the bad bucket size, the "bucket tails" are actually unusable, and copygc cannot do anything about them.
So the actual bug here is that the filesystem fails to return ENOSPC due to misaccounting of free space with unaligned bucket sizes. It is still an issue that should be fixed, but maybe not a high-priority one.
On the other hand, if you know how to reproduce bcachefs format choosing such bucket size, that would be a very high-priority issue.
Two things of note: first, you are not actually using the DKMS module (1.31); your FS has version 1.28, which matches the in-tree 6.16 kernel version.
That’s surprising to hear. I thought there were situations where the FS version would not match the latest FS version supported by the bcachefs kernel module in use; apparently there aren’t, which surprises me. Is there anything I can do to force it to use the DKMS bcachefs module instead of the in-tree one?
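For reference, one way to check which bcachefs module modprobe would pick (a sketch; the exact DKMS install path varies by distribution):

  # A DKMS build typically installs under /lib/modules/<kernel>/updates/
  # (or a similar extra/updates path), while the in-tree module lives
  # under .../kernel/fs/bcachefs/.
  modinfo bcachefs | grep -iE '^(filename|version)'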
Second, your FS has a non-power-of-2 bucket size, which is suboptimal. It would be great if you could provide information on the way it was initially formatted, most importantly the bcachefs-tools version used for formatting. I believe that all issues leading to non-round bucket sizes being chosen on format were fixed long ago, but maybe you've found another case.
The filesystem was created when I used this Nix flake to do an unattended installation of NixOS on my laptop. I don’t know for sure which revision of that flake I used, but I’m guessing that it was e034966a907a9f97076a36520acf39d2c42980d9. That commit was made at “Fri Jul 11 11:42:54 2025 -0400”, which is right before the filesystem’s creation time (“Fri Jul 11 12:35:30 2025”). I don’t have any logs from back when I did that unattended installation, but I was able to do a new unattended installation using revision e034966a907a9f97076a36520acf39d2c42980d9 of that flake. The new unattended installation used Linux version 6.14.11 and bcachefs-tools version 1.25.1.
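(For reference, the commit date can be verified from a checkout of that flake repository:)

  # Print the author date of the revision in question
  git log -1 --format=%ad e034966a907a9f97076a36520acf39d2c42980d9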
Here’s a log of what the unattended installer did for disk partitioning and filesystem creation:
umount: /mnt/disko-install-root: not mounted
++ realpath /dev/disk/by-path/pci-0000:02:00.0-nvme-1
+ disk=/dev/nvme0n1
+ lsblk -a -f
NAME FSTYPE FSVER LABEL UUID FSAVAIL FSUSE% MOUNTPOINTS
loop0
loop1
loop2
loop3
loop4
loop5
loop6
loop7
sda
├─sda1 vfat FAT32 AD49-EAA1 995.8M 3% /boot
└─sda2 bcachefs 1.20 d16fab8a-0000-41ce-bd92-73a7ce153580 16.2G 35% /nix/store
/
nvme0n1
├─nvme0n1p1 vfat FAT32 2901-C94E
├─nvme0n1p2 bcachefs 1.28 07ba0b33-eb07-422c-ae09-5ec16bbd938c
└─nvme0n1p3 swap 1 51423e66-8a0f-4ab8-b84b-84394e13010b
+ lsblk --output-all --json
+ bash -x
++ dirname /nix/store/fpwn44vygjj6bfn8s1jj9p8yh6jhfxni-disk-deactivate/disk-deactivate
+ jq -r -f /nix/store/fpwn44vygjj6bfn8s1jj9p8yh6jhfxni-disk-deactivate/zfs-swap-deactivate.jq
+ lsblk --output-all --json
+ bash -x
++ dirname /nix/store/fpwn44vygjj6bfn8s1jj9p8yh6jhfxni-disk-deactivate/disk-deactivate
+ jq -r --arg disk_to_clear /dev/nvme0n1 -f /nix/store/fpwn44vygjj6bfn8s1jj9p8yh6jhfxni-disk-deactivate/disk-deactivate.jq
+ set -fu
+ wipefs --all -f /dev/nvme0n1p1
/dev/nvme0n1p1: 8 bytes were erased at offset 0x00000052 (vfat): 46 41 54 33 32 20 20 20
/dev/nvme0n1p1: 1 byte was erased at offset 0x00000000 (vfat): eb
/dev/nvme0n1p1: 2 bytes were erased at offset 0x000001fe (vfat): 55 aa
+ wipefs --all -f /dev/nvme0n1p2
/dev/nvme0n1p2: 16 bytes were erased at offset 0x00001018 (bcachefs): c6 85 73 f6 66 ce 90 a9 d9 6a 60 cf 80 3d f7 ef
/dev/nvme0n1p2: 16 bytes were erased at offset 0x6e30a00018 (bcachefs): c6 85 73 f6 66 ce 90 a9 d9 6a 60 cf 80 3d f7 ef
+ swapoff /dev/nvme0n1p3
swapoff: /dev/nvme0n1p3: swapoff failed: Invalid argument
+ wipefs --all -f /dev/nvme0n1p3
/dev/nvme0n1p3: 10 bytes were erased at offset 0x00000ff6 (swap): 53 57 41 50 53 50 41 43 45 32
++ type zdb
++ zdb -l /dev/nvme0n1
++ sed -nr 's/ +name: '\''(.*)'\''/\1/p'
+ zpool=
+ [[ -n '' ]]
+ unset zpool
++ lsblk /dev/nvme0n1 -l -p -o type,name
++ awk 'match($1,"raid.*") {print $2}'
+ md_dev=
+ [[ -n '' ]]
+ wipefs --all -f /dev/nvme0n1
/dev/nvme0n1: 8 bytes were erased at offset 0x00000200 (gpt): 45 46 49 20 50 41 52 54
/dev/nvme0n1: 8 bytes were erased at offset 0x7470c05e00 (gpt): 45 46 49 20 50 41 52 54
/dev/nvme0n1: 2 bytes were erased at offset 0x000001fe (PMBR): 55 aa
+ dd if=/dev/zero of=/dev/nvme0n1 bs=440 count=1
1+0 records in
1+0 records out
440 bytes copied, 0.000212851 s, 2.1 MB/s
+ lsblk -a -f
NAME FSTYPE FSVER LABEL UUID FSAVAIL FSUSE% MOUNTPOINTS
loop0
loop1
loop2
loop3
loop4
loop5
loop6
loop7
sda
├─sda1 vfat FAT32 AD49-EAA1 995.8M 3% /boot
└─sda2 bcachefs 1.20 d16fab8a-0000-41ce-bd92-73a7ce153580 16.2G 35% /nix/store
/
nvme0n1
++ mktemp -d
+ disko_devices_dir=/tmp/tmp.JcpTlbs8vt
+ trap 'rm -rf "$disko_devices_dir"' EXIT
+ mkdir -p /tmp/tmp.JcpTlbs8vt
+ destroy=1
+ device=/dev/disk/by-path/pci-0000:02:00.0-nvme-1
+ imageName=main
+ imageSize=2G
+ name=main
+ type=disk
+ device=/dev/disk/by-path/pci-0000:02:00.0-nvme-1
+ efiGptPartitionFirst=1
+ type=gpt
+ blkid /dev/disk/by-path/pci-0000:02:00.0-nvme-1
+ sgdisk --clear /dev/disk/by-path/pci-0000:02:00.0-nvme-1
nvme0n1:
Creating new GPT entries in memory.
The operation has completed successfully.
nvme0n1:
+ sgdisk --align-end --new=1:0:+1G --partition-guid=1:R --change-name=1:disk-main-efiSystemPartiton --typecode=1:C12A7328-F81F-11D2-BA4B-00A0C93EC93B /dev/disk/by-path/pci-0000:02:00.0-nvme-1
The operation has completed successfully.
nvme0n1: p1
+ partprobe /dev/disk/by-path/pci-0000:02:00.0-nvme-1
+ udevadm trigger --subsystem-match=block
+ udevadm settle --timeout 120
+ sgdisk --align-end --new=2:0:-24G --partition-guid=2:R --change-name=2:disk-main-nixosRoot --typecode=2:4F68BCE3-E8CD-4DB1-96E7-FBCAF984B709 /dev/disk/by-path/pci-0000:02:00.0-nvme-1
nvme0n1: p1 p2
The operation has completed successfully.
nvme0n1: p1 p2
+ partprobe /dev/disk/by-path/pci-0000:02:00.0-nvme-1
+ udevadm trigger --subsystem-match=block
+ udevadm settle --timeout 120
+ sgdisk --align-end --new=3:0:-0 --partition-guid=3:R --change-name=3:disk-main-nixosSwap --typecode=3:0657fd6d-a4ab-43c4-84e5-0933c84b4f4f /dev/disk/by-path/pci-0000:02:00.0-nvme-1
nvme0n1: p1 p2 p3
The operation has completed successfully.
nvme0n1: p1 p2 p3
+ partprobe /dev/disk/by-path/pci-0000:02:00.0-nvme-1
+ udevadm trigger --subsystem-match=block
+ udevadm settle --timeout 120
+ device=/dev/disk/by-partlabel/disk-main-efiSystemPartiton
+ extraArgs=()
+ declare -a extraArgs
+ format=vfat
+ mountOptions=('umask=0077')
+ declare -a mountOptions
+ mountpoint=/boot
+ type=filesystem
+ blkid /dev/disk/by-partlabel/disk-main-efiSystemPartiton
+ grep -q TYPE=
+ mkfs.vfat /dev/disk/by-partlabel/disk-main-efiSystemPartiton
mkfs.fat 4.2 (2021-01-31)
+ device=/dev/disk/by-partlabel/disk-main-nixosRoot
+ extraArgs=()
+ declare -a extraArgs
+ format=bcachefs
+ mountOptions=('defaults')
+ declare -a mountOptions
+ mountpoint=/
+ type=filesystem
+ blkid /dev/disk/by-partlabel/disk-main-nixosRoot
+ grep -q TYPE=
+ mkfs.bcachefs /dev/disk/by-partlabel/disk-main-nixosRoot
External UUID: 5ebf454f-1b1c-4c2c-a6c9-feea70714593
Internal UUID: 8f1421a9-dc1a-4301-8edd-9957ed7ceac3
Magic number: c68573f6-66ce-90a9-d96a-60cf803df7ef
Device index: 0
Label: (none)
Version: 1.20: directory_size
Incompatible features allowed: 1.20: directory_size
Incompatible features in use: 0.0: (unknown version)
Version upgrade complete: 0.0: (unknown version)
Oldest version on disk: 1.20: directory_size
Created: Tue Oct 14 16:34:57 2025
Sequence number: 0
Time of last write: Thu Jan 1 00:00:00 1970
Superblock size: 976 B/1.00 MiB
Clean: 0
Devices: 1
Sections: members_v1,members_v2
Features:
Compat features:
Options:
block_size: 512 B
btree_node_size: 256 KiB
errors: continue [fix_safe] panic ro
write_error_timeout: 30
metadata_replicas: 1
data_replicas: 1
metadata_replicas_required: 1
data_replicas_required: 1
encoded_extent_max: 64.0 KiB
metadata_checksum: none [crc32c] crc64 xxhash
data_checksum: none [crc32c] crc64 xxhash
checksum_err_retry_nr: 3
compression: none
background_compression: none
str_hash: crc32c crc64 [siphash]
metadata_target: none
foreground_target: none
background_target: none
promote_target: none
erasure_code: 0
inodes_32bit: 1
shard_inode_numbers_bits: 0
inodes_use_key_cache: 1
gc_reserve_percent: 8
gc_reserve_bytes: 0 B
root_reserve_percent: 0
wide_macs: 0
promote_whole_extents: 1
acl: 1
usrquota: 0
grpquota: 0
prjquota: 0
journal_flush_delay: 1000
journal_flush_disabled: 0
journal_reclaim_delay: 100
journal_transaction_names: 1
allocator_stuck_timeout: 30
version_upgrade: [compatible] incompatible none
nocow: 0
members_v2 (size 160):
Device: 0
Label: (none)
UUID: 35e06f43-c9ed-4be8-bf86-7f33ec28003f
Size: 441 GiB
read errors: 0
write errors: 0
checksum errors: 0
seqread iops: 0
seqwrite iops: 0
randread iops: 0
randwrite iops: 0
Bucket size: 441 KiB
First bucket: 0
Buckets: 1048576
Last mount: (never)
Last superblock write: 0
State: rw
Data allowed: journal,btree,user
Has data: (none)
Btree allocated bitmap blocksize: 1.00 B
bcachefs (nvme0n1p2): starting version 1.20: directory_size
bcachefs (nvme0n1p2): initializing new filesystem
Btree allocated bitmap: 0000000000000000000000000000000000000000000000000000000000000000
Durability: 1
Discard: 1
Freespace initialized: 0
+ device=/dev/disk/by-partlabel/disk-main-nixosSwap
+ discardPolicy=
+ extraArgs=()
+ declare -a extraArgs
+ mountOptions=('defaults')
+ declare -a mountOptions
+ priority=
+ randomEncryption=
+ resumeDevice=
+ type=swap
+ blkid /dev/disk/by-partlabel/disk-main-nixosSwap -o export
+ grep -q '^TYPE='
+ mkswap /dev/disk/by-partlabel/disk-main-nixosSwap
Setting up swapspace version 1, size = 24 GiB (25769799680 bytes)
no label, UUID=d5c89a48-3edc-4b97-950b-40c483e03206
+ set -efux
+ destroy=1
+ device=/dev/disk/by-path/pci-0000:02:00.0-nvme-1
+ imageName=main
+ imageSize=2G
+ name=main
+ type=disk
bcachefs (nvme0n1p2): going read-write
+ device=/dev/disk/by-path/pci-0000:02:00.0-nvme-1
+ efiGptPartitionFirst=1
+ type=gpt
+ destroy=1
+ device=/dev/disk/by-path/pci-0000:02:00.0-nvme-1
+ imageName=main
+ imageSize=2G
+ name=main
+ type=disk
+ device=/dev/disk/by-path/pci-0000:02:00.0-nvme-1
+ efiGptPartitionFirst=1
+ type=gpt
+ device=/dev/disk/by-partlabel/disk-main-nixosRoot
+ extraArgs=()
+ declare -a extraArgs
+ format=bcachefs
+ mountOptions=('defaults')
+ declare -a mountOptions
+ mountpoint=/
+ type=filesystem
+ findmnt /dev/disk/by-partlabel/disk-main-nixosRoot /mnt/disko-install-root/
+ mount /dev/disk/by-partlabel/disk-main-nixosRoot /mnt/disko-install-root/ -t bcachefs -o defaults -o X-mount.mkdir
bcachefs (nvme0n1p2): initializing freespace
+ destroy=1
+ device=/dev/disk/by-path/pci-0000:02:00.0-nvme-1
+ imageName=main
+ imageSize=2G
+ name=main
+ type=disk
+ device=/dev/disk/by-path/pci-0000:02:00.0-nvme-1
+ efiGptPartitionFirst=1
+ type=gpt
+ device=/dev/disk/by-partlabel/disk-main-efiSystemPartiton
+ extraArgs=()
+ declare -a extraArgs
+ format=vfat
+ mountOptions=('umask=0077')
+ declare -a mountOptions
+ mountpoint=/boot
+ type=filesystem
+ findmnt /dev/disk/by-partlabel/disk-main-efiSystemPartiton /mnt/disko-install-root/boot
+ mount /dev/disk/by-partlabel/disk-main-efiSystemPartiton /mnt/disko-install-root/boot -t vfat -o umask=0077 -o X-mount.mkdir
+ destroy=1
+ device=/dev/disk/by-path/pci-0000:02:00.0-nvme-1
+ imageName=main
+ imageSize=2G
+ name=main
+ type=disk
+ device=/dev/disk/by-path/pci-0000:02:00.0-nvme-1
+ efiGptPartitionFirst=1
+ type=gpt
+ device=/dev/disk/by-partlabel/disk-main-nixosSwap
+ discardPolicy=
+ extraArgs=()
+ declare -a extraArgs
+ mountOptions=('defaults')
+ declare -a mountOptions
+ priority=
+ randomEncryption=
+ resumeDevice=
+ type=swap
+ test 1 '!=' 1
+ rm -rf /tmp/tmp.JcpTlbs8vt
Is there anything that I can do in order to force it to use the DKMS bcachefs module instead of the in-tree bcachefs module?
I'm not a NixOS expert, but I've heard that with recent packaging changes it picks the module from tools automatically. You can ask for help in the IRC channel if needed, the NixOS bcachefs maintainers are generally there.
The new unattended installation used Linux version 6.14.11 and bcachefs-tools version 1.25.1.
Hooray for deterministic configuration: this clarifies the origin of this issue. The format code picking bad bucket sizes was fixed in bcachefs-tools 1.25.2 (released in April), so unfortunately you missed that fix when installing.
In case you want to change the bucket size on your FS to a proper one, you can either reformat the filesystem with recent tools while copying the data manually, or add a temporary device, evacuate+remove the old one, wipe the old device, add it back (it will pick a proper bucket size at that point), and finally evacuate+remove the temporary device (a sketch of this sequence follows).
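A hedged sketch of that second option (untested; device paths are placeholders, and exact subcommand syntax may differ between bcachefs-tools versions):

  # Add a temporary device to the mounted filesystem:
  bcachefs device add <mountpoint> /dev/<temporary-device>
  # Move all data off the old device, then drop it from the fs:
  bcachefs device evacuate /dev/<old-device>
  bcachefs device remove /dev/<old-device>
  # Wipe the old superblock and re-add the device; the re-add
  # should pick a proper bucket size:
  wipefs --all /dev/<old-device>
  bcachefs device add <mountpoint> /dev/<old-device>
  # Finally migrate off the temporary device and remove it:
  bcachefs device evacuate /dev/<temporary-device>
  bcachefs device remove /dev/<temporary-device>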
That is a lot of btree fragmentation
Would you be able to get me a metadata dump?
Would you be able to get me a metadata dump?
How do I do that?
@koverstreet Why do you think btree fragmentation is too high?
The FS is using 441 KiB buckets, so with 256 KiB btree node size and 39733 buckets used we get (441-256)*39733/1024/1024 ~= 7 GiB of btree fragmentation from unusable bucket tails, which approximately matches the number from fs usage.
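Spelled out as a quick shell check of that arithmetic (numbers taken from the fs usage output above):

  # (bucket size - btree node size) * btree buckets, converted to GiB;
  # each 441 KiB bucket holds one 256 KiB btree node, wasting the
  # 185 KiB tail of the bucket.
  echo "$(( (441 - 256) * 39733 / 1024 / 1024 )) GiB"   # prints "7 GiB"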
Good catch - we should be aligning the bucket size to the btree node size and aiming for powers of two; I'm curious if this fs was created when bcachefs-tools was buggy and picking misaligned sizes.
I'm curious if this fs was created when bcachefs-tools was buggy and picking misaligned sizes
Yes, as I've said above:
Hooray for deterministic configuration: this clarifies the origin of this issue. The format code picking bad bucket sizes was fixed in bcachefs-tools 1.25.2 (released in April), so unfortunately you missed that fix when installing.
The FS was installed with bcachefs-tools version 1.25.1.
The actual bug here is that the filesystem fails to return ENOSPC (and gets stuck in the allocator) due to misaccounting of free space with unaligned bucket sizes. Usually copygc would be able to pack things more tightly and free up this space, so we do not account it as "used". But due to the bad bucket size the "bucket tails" are actually unusable, and copygc cannot do anything about them.
Ok, sorry for not reading closely enough - this is an old known issue that normally only affects devices with pathologically mismatched sizes (two-device filesystem, mismatched device sizes, replicas=2).
@Jayman2000 - you might want to recreate your filesystem if you can. I will bump up the priority on this one, but it'll still be a bit before I get to it.