Constant disk writes when idle

Open obj-obj opened this issue 1 year ago • 20 comments

I've been getting constant disk writes of about 30kB/s, even when nothing is writing to the disk. I've had this issue ever since kernel 6.8, and downgrading to 6.7 will solve it (so it seems to be a regression of some kind). Sometimes the issue will completely disappear upon waking up from suspend, only to occur again after another suspend/resume cycle.

Checking /sys/fs/bcachefs/.../dev-0/io_done, I can see the number for btree constantly going up.
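You can watch io_done by eye, but diffing two snapshots shows more clearly which write counters are moving. Below is a minimal Python sketch. The layout it parses is an assumption about the sysfs file and is not shown in this thread: "read:"/"write:" header lines, each followed by "<data type> : <counter>" lines. Adjust the regexes if your kernel prints it differently. The function names are hypothetical.

```python
import re

# Assumed io_done layout (not confirmed here): a "read:" or "write:" header,
# followed by "<data type> : <counter>" lines for that direction.
_HEADER = re.compile(r"^(read|write):\s*$")
_ENTRY = re.compile(r"^\s*(\w+)\s*:\s*(\d+)\s*$")


def parse_io_done(text: str) -> dict[tuple[str, str], int]:
    """Map (direction, data_type) -> counter value."""
    counters: dict[tuple[str, str], int] = {}
    direction = None
    for line in text.splitlines():
        header = _HEADER.match(line)
        if header:
            direction = header.group(1)
            continue
        entry = _ENTRY.match(line)
        if entry and direction is not None:
            counters[(direction, entry.group(1))] = int(entry.group(2))
    return counters


def write_deltas(before: str, after: str) -> dict[str, int]:
    """Growth of each write counter between two snapshots (unchanged ones omitted)."""
    old, new = parse_io_done(before), parse_io_done(after)
    return {
        dtype: value - old.get((d, dtype), 0)
        for (d, dtype), value in new.items()
        if d == "write" and value != old.get((d, dtype), 0)
    }
```

To use it, read the file twice a few seconds apart (the path stays elided as in the post: /sys/fs/bcachefs/.../dev-0/io_done) and pass both strings to write_deltas(). On an affected system you would expect btree, and probably journal, to show up in the result.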

Also, sorry I've waited this long to make a bug report. I was busy with schoolwork, so I didn't have time to make one earlier.

obj-obj, May 15 '24 03:05

Have you checked what processes are writing to disk?

Valmar33, May 15 '24 12:05

There are no processes writing to disk:

[Screenshot: Screenshot_20240515_134249]

obj-obj, May 15 '24 20:05

I'm having the same issue: it keeps writing to journal and btree constantly after I copy large files to the drive. Creating folders or copying small files doesn't seem to cause it, and neither does mounting and only reading. I'm not sure how large the file needs to be to trigger this.

[Screenshot: gqMPwgL]

Arch Linux kernel 6.10.0-rc5-1-mainline and 6.9.7

bargu2, Jun 29 '24 16:06

A bit more info: the drive does go back to idle after a while. After I copied a 1 GB file of random data, it took a little over 7 minutes to return to idle once the transfer had finished. With a 1 GB file of zeros it took only about a minute. This was an external USB spinning drive. I haven't tested on an SSD to see whether the behavior is the same. I'm assuming some checksum calculation is going on? But why does it take so long? Is this expected behavior?
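One way to put a number on "time to go back to idle" is to poll the summed write counters and find the point after which they stop changing. The Python sketch below assumes the polling is done elsewhere and only analyses the samples. Sampling interval, quiet window and function name are all hypothetical choices, not what bargu2 did.

```python
from typing import Iterable, Optional


def quiet_since(samples: Iterable[tuple[float, int]],
                quiet_for: float) -> Optional[float]:
    """Return the timestamp of the last change in a write counter, once the
    counter has then stayed unchanged for at least `quiet_for` seconds.

    `samples` are (timestamp_in_seconds, summed_write_counter) pairs in time
    order, e.g. from polling io_done. Returns None if the counter never
    stays still for that long within the given samples.
    """
    last_change_t: Optional[float] = None
    last_value: Optional[int] = None
    for t, value in samples:
        if last_value is None or value != last_value:
            # Counter moved (or first sample): restart the quiet window here.
            last_change_t, last_value = t, value
        elif t - last_change_t >= quiet_for:
            return last_change_t
    return None
```

Subtracting the time the copy finished from the returned timestamp gives the time to idle. The writes can pause briefly and then resume, so a quiet window of a few minutes is probably safer than a few seconds.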

Drive info:

smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.10.0-rc5-1-mainline] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Samsung SpinPoint M8U (USB)
Device Model:     ST1000LM025 HN-M101ABB
Serial Number:    E1443G14AA4T2U
LU WWN Device Id: 0 000000 000000000
Firmware Version: 2AR10001
User Capacity:    1,000,204,886,016 bytes [1.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Form Factor:      2.5 inches
Device is:        In smartctl database 7.3/5528
ATA Version is:   ATA8-ACS T13/1699-D revision 6
SATA Version is:  SATA 3.0, 3.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Sat Jun 29 22:36:50 2024 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART Status not supported: Incomplete response, ATA output registers missing
SMART overall-health self-assessment test result: PASSED
Warning: This result is based on an Attribute check.

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (  25) The self-test routine was aborted by
                                        the host.
Total time to complete Offline 
data collection:                (12900) seconds.
Offline data collection
capabilities:                    (0x5b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine 
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 215) minutes.
SCT capabilities:              (0x003f) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   100   100   051    Pre-fail  Always       -       124
  2 Throughput_Performance  0x0026   252   252   000    Old_age   Always       -       0
  3 Spin_Up_Time            0x0023   086   085   025    Pre-fail  Always       -       4468
  4 Start_Stop_Count        0x0032   097   097   000    Old_age   Always       -       3361
  5 Reallocated_Sector_Ct   0x0033   252   252   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   252   252   051    Old_age   Always       -       0
  8 Seek_Time_Performance   0x0024   252   252   015    Old_age   Offline      -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       427
 10 Spin_Retry_Count        0x0032   252   252   051    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       4
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       479
191 G-Sense_Error_Rate      0x0022   100   100   000    Old_age   Always       -       6
192 Power-Off_Retract_Count 0x0022   252   252   000    Old_age   Always       -       0
194 Temperature_Celsius     0x0002   064   057   000    Old_age   Always       -       35 (Min/Max 20/48)
195 Hardware_ECC_Recovered  0x003a   100   100   000    Old_age   Always       -       0
196 Reallocated_Event_Count 0x0032   252   252   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   252   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   252   252   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0036   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x002a   100   100   000    Old_age   Always       -       9
223 Load_Retry_Count        0x0032   100   100   000    Old_age   Always       -       4
225 Load_Cycle_Count        0x0032   099   099   000    Old_age   Always       -       15399

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Aborted by host               90%       392         -
# 2  Extended offline    Aborted by host               90%       392         -
# 3  Short offline       Completed: read failure       90%       392         632428768
# 4  Short offline       Completed: read failure       90%       392         632428768
# 5  Short offline       Completed: read failure       90%       392         632428768

SMART Selective self-test log data structure revision number 0
Note: revision number not 1 implies that no selective self-test has ever been run
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Aborted_by_host [90% left] (0-65535)
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

Bcachefs info:

Device:                                     M3 Portable     
External UUID:                             3de50eac-2ca5-4a3a-b6cc-2300c9675a6d
Internal UUID:                             823a098c-61e4-4d46-98c6-6a0dc2dc5104
Magic number:                              c68573f6-66ce-90a9-d96a-60cf803df7ef
Device index:                              0
Label:                                     
Version:                                   1.7: mi_btree_bitmap
Version upgrade complete:                  1.7: mi_btree_bitmap
Oldest version on disk:                    1.7: mi_btree_bitmap
Created:                                   Sat Jun 29 13:15:56 2024
Sequence number:                           118
Time of last write:                        Sat Jun 29 22:19:04 2024
Superblock size:                           4.63 KiB/1.00 MiB
Clean:                                     0
Devices:                                   1
Sections:                                  members_v1,replicas_v0,disk_groups,clean,journal_v2,counters,members_v2,errors,ext,downgrade
Features:                                  zstd,new_siphash,inline_data,new_extent_overwrite,btree_ptr_v2,extents_above_btree_updates,btree_updates_journalled,new_varint,journal_no_flush,alloc_v2,extents_across_btree_nodes
Compat features:                           alloc_info,alloc_metadata,extents_above_btree_updates_done,bformat_overflow_done

Options:
  block_size:                              4.00 KiB
  btree_node_size:                         256 KiB
  errors:                                  continue [fix_safe] panic ro 
  metadata_replicas:                       1
  data_replicas:                           1
  metadata_replicas_required:              1
  data_replicas_required:                  1
  encoded_extent_max:                      64.0 KiB
  metadata_checksum:                       none [crc32c] crc64 xxhash 
  data_checksum:                           none [crc32c] crc64 xxhash 
  compression:                             zstd
  background_compression:                  zstd
  str_hash:                                crc32c crc64 [siphash] 
  metadata_target:                         none
  foreground_target:                       none
  background_target:                       none
  promote_target:                          none
  erasure_code:                            0
  inodes_32bit:                            1
  shard_inode_numbers:                     1
  inodes_use_key_cache:                    1
  gc_reserve_percent:                      8
  gc_reserve_bytes:                        0 B
  root_reserve_percent:                    0
  wide_macs:                               0
  acl:                                     1
  usrquota:                                0
  grpquota:                                0
  prjquota:                                0
  journal_flush_delay:                     1000
  journal_flush_disabled:                  0
  journal_reclaim_delay:                   100
  journal_transaction_names:               1
  version_upgrade:                         [compatible] incompatible none 
  nocow:                                   0

members_v2 (size 160):
Device:                                    0
  Label:                                   bcachefs (0)
  UUID:                                    76f70d62-f088-4a22-9b92-78f4f74af9a5
  Size:                                    932 GiB
  read errors:                             0
  write errors:                            0
  checksum errors:                         0
  seqread iops:                            0
  seqwrite iops:                           0
  randread iops:                           0
  randwrite iops:                          0
  Bucket size:                             512 KiB
  First bucket:                            0
  Buckets:                                 1907719
  Last mount:                              Sat Jun 29 22:19:04 2024
  Last superblock write:                   118
  State:                                   rw
  Data allowed:                            journal,btree,user
  Has data:                                journal,btree,user
  Btree allocated bitmap blocksize:        4.00 MiB
  Btree allocated bitmap:                  0000000000000000000000000000000100000000000000011111111111111100
  Durability:                              1
  Discard:                                 0
  Freespace initialized:                   1

errors (size 8):

bargu2, Jun 29 '24 20:06

I have exactly the same issue as @obj-obj after upgrading to kernel 6.10 (mainline) on Arch. If I downgrade back to kernel 6.9.10 (stable), the disk writes end after ~4-8 minutes.

No process is writing data (checked with htop etc.), but I can see the "journal" and "btree" counters rising (as in the screenshot above).

cat /sys/fs/bcachefs/.../io_done

The write rate is between 30 and 100 kB/s.

I use a single drive with bcachefs as root (compression=lz4:1, discard=1).

What does the fs do in the background?


[Update 14.08.2024] A small update on the disk-write behavior: after a long time (~45 min to ~2 h), the writes stop, but only for a very short time. Strangely, after a minute or more they start again, as if a new round begins. ;) I think the same thing happens if you create a new folder or file.

Furthermore, I can feel that the external SSD gets very warm, which is expected given the constant writes. If I boot the same Arch Linux with a downgraded kernel (e.g. 6.9.10), I see the same idle disk writes, but for a shorter time.

LunatixDev, Jul 21 '24 08:07

Is there anything more I can do to help debug this issue? I'd rather not have my drive LED blinking all the time lol

obj-obj, Aug 13 '24 11:08

Setting journal_flush_disabled to 1 and both data_checksum and metadata_checksum to none in the options seems to fix the constant writes for me. @bargu2, can you try this and see if it works for you as well? It took a while (many hours) for the constant writes to stop for me after those options were set, but they did eventually.
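obj-obj doesn't say how the options were changed. One possible route is the filesystem's sysfs directory. The Python sketch below assumes a per-option file under /sys/fs/bcachefs/<uuid>/options/, where the `options/` layout is an assumption and <uuid> is a placeholder. It also assumes the option can be changed at runtime, which may not hold for every option. The helper name is hypothetical.

```python
from pathlib import Path


def set_fs_option(fs_dir: Path, name: str, value: str) -> str:
    """Write `value` to <fs_dir>/options/<name> and return what reads back.

    The options/ layout is an assumption about bcachefs sysfs; this refuses
    to create the file if it doesn't already exist.
    """
    opt = fs_dir / "options" / name
    if not opt.exists():
        raise FileNotFoundError(f"no such option file: {opt}")
    opt.write_text(value + "\n")
    # Multiple-choice options may read back as a list with the active choice
    # in brackets, as in the show-super output above.
    return opt.read_text().strip()
```

On the assumed layout, the three settings from the comment would be applied by calling it (as root) with ("journal_flush_disabled", "1"), ("metadata_checksum", "none") and ("data_checksum", "none").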

obj-obj, Aug 18 '24 06:08

Never mind: I updated to 6.11-rc5, rebooted, and it's back.

obj-obj, Aug 26 '24 02:08

I'm still hitting this bug on kernel 6.12, with the same behavior and cause as described above (I copied around 430 GB onto a fresh single-disk filesystem). I'm using lz4:9 foreground compression (none for background). Unmounting and remounting the fs seems to stop the writes for a short while, until it randomly starts again. It would be nice to know what it's doing, as the numbers in fs usage don't seem to change nearly as much as the ongoing writes in io_done.

nagalun, Mar 27 '25 13:03

6.12 is old, I won't be able to support you there, unfortunately - you'll want to be on at least 6.14.

Which write counter is going up in io_done?

koverstreet, Mar 27 '25 13:03

Hi Kent, alright, I'll consider upgrading the kernel. I usually stay on LTS because I thought fixes get backported, but maybe not for bcachefs since it's experimental. In the write section, btree increases by around 600k units every 10 seconds, and journal by around 45k. The rate varies a bit.
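nagalun's counter growth can be converted into a write rate. The thread doesn't say what unit the io_done counters use. The Python sketch assumes bytes and also computes the 512-byte-sector reading for comparison. Read as bytes, the combined ~64.5 kB/s sits in the tens-of-kB/s range reported elsewhere in this thread. Read as sectors, the rate would be hundreds of times higher. That points toward bytes, but it is an inference, not something the thread confirms.

```python
def counter_rate(delta: int, interval_s: float, unit_bytes: int = 1) -> float:
    """Bytes per second implied by a counter that grew by `delta` in `interval_s`.

    `unit_bytes` is the size of one counter unit: 1 if the counter is in
    bytes (assumed), 512 if it were counting 512-byte sectors.
    """
    return delta * unit_bytes / interval_s


# nagalun's observation: ~600k btree and ~45k journal units per 10 seconds.
btree = counter_rate(600_000, 10)    # 60000.0 B/s
journal = counter_rate(45_000, 10)   # 4500.0 B/s
print(f"bytes: {(btree + journal) / 1e3:.1f} kB/s")  # bytes: 64.5 kB/s

# The same btree delta read as 512-byte sectors would mean ~30.7 MB/s.
print(f"sectors: {counter_rate(600_000, 10, 512) / 1e6:.1f} MB/s")  # sectors: 30.7 MB/s
```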

nagalun, Mar 27 '25 13:03

I'm seeing this kind of behavior on Armbian's 6.14.0-rc7.

After a fresh boot:

skandalfo@ranas ~> uname -a
Linux ranas 6.14.0-rc7-edge-rockchip64 #1 SMP PREEMPT Sun Mar 16 22:55:17 UTC 2025 aarch64 aarch64 aarch64 GNU/Linux
skandalfo@ranas ~> ls -l /media/bcachefs_data/skandalfo/
total 11380948
-rw-rw-r-- 1 skandalfo skandalfo 2458187776 oct 24 16:05 Fedora-Workstation-Live-x86_64-41-1.4.iso
-rw-rw-r-- 1 skandalfo skandalfo 5665497088 oct  9 15:32 ubuntu-24.10-desktop-amd64.iso
-rw-rw-r-- 1 skandalfo skandalfo 3530403840 oct  9 12:58 ubuntu-24.10-desktop-arm64.iso
skandalfo@ranas ~> cd /media/bcachefs_data/skandalfo/
skandalfo@ranas /m/b/skandalfo> sha1sum *.iso
d0c16e64b43937efda33d30091e21eb5e90cd5a8  Fedora-Workstation-Live-x86_64-41-1.4.iso
16b4d3a82b89b4f1accb1d3fb24af0d81bdb521d  ubuntu-24.10-desktop-amd64.iso
37bd75e444669cc0db514e647317143df90525f5  ubuntu-24.10-desktop-arm64.iso

Once I run the sha1sum command, I get this from glances -1 (with the W/s figure varying from about 23K to 95K):

DISK I/O                  R/s    W/s 
mmcblk0                     0      - 
mmcblk0boot0                0      -
mmcblk0boot1                0      - 
mtdblock0                   0      - 
nvme0n1                     0      0
nvme0n1p1                   0      0
sda                         0      0 
sdb                         0      0
sdc                         0    94K
sdd                         0    94K
sde                         0      0
sdf                         0      0
zram0                       0      0

Note sdc and sdd (the two SSDs in the filesystem) were quiescent before the read operation.

For more details:

skandalfo@ranas /m/b/skandalfo [1]> sudo bcachefs show-super /dev/sdc
Device:                                     KingFast        
External UUID:                             47d6295f-2089-4839-913d-5f31ad5c5363
Internal UUID:                             b68508e0-52d9-4290-88e1-973c41314aaf
Magic number:                              c68573f6-66ce-90a9-d96a-60cf803df7ef
Device index:                              1
Label:                                     bcachefs25
Version:                                   1.20: directory_size
Incompatible features allowed:             0.0: (unknown version)
Incompatible features in use:              0.0: (unknown version)
Version upgrade complete:                  1.20: directory_size
Oldest version on disk:                    1.20: directory_size
Created:                                   Sat Mar 15 20:18:46 2025
Sequence number:                           63
Time of last write:                        Wed Mar 26 08:49:39 2025
Superblock size:                           5.34 KiB/1.00 MiB
Clean:                                     0
Devices:                                   4
Sections:                                  members_v1,replicas_v0,disk_groups,clean,journal_v2,counters,members_v2,errors,ext,downgrade
Features:                                  new_siphash,new_extent_overwrite,btree_ptr_v2,extents_above_btree_updates,btree_updates_journalled,new_varint,journal_no_flush,alloc_v2,extents_across_btree_nodes
Compat features:                           alloc_info,alloc_metadata,extents_above_btree_updates_done,bformat_overflow_done

Options:
  block_size:                              4.00 KiB
  btree_node_size:                         256 KiB
  errors:                                  continue [fix_safe] panic ro 
  metadata_replicas:                       2
  data_replicas:                           2
  metadata_replicas_required:              1
  data_replicas_required:                  1
  encoded_extent_max:                      64.0 KiB
  metadata_checksum:                       none [crc32c] crc64 xxhash 
  data_checksum:                           none [crc32c] crc64 xxhash 
  compression:                             none
  background_compression:                  none
  str_hash:                                crc32c crc64 [siphash] 
  metadata_target:                         none
  foreground_target:                       ssd
  background_target:                       hdd
  promote_target:                          ssd
  erasure_code:                            0
  inodes_32bit:                            1
  shard_inode_numbers_bits:                3
  inodes_use_key_cache:                    1
  gc_reserve_percent:                      8
  gc_reserve_bytes:                        0 B
  root_reserve_percent:                    0
  wide_macs:                               0
  promote_whole_extents:                   1
  acl:                                     1
  usrquota:                                0
  grpquota:                                0
  prjquota:                                0
  journal_flush_delay:                     1000
  journal_flush_disabled:                  0
  journal_reclaim_delay:                   100
  journal_transaction_names:               1
  allocator_stuck_timeout:                 30
  version_upgrade:                         [compatible] incompatible none 
  nocow:                                   0

members_v2 (size 592):
Device:                                    0
  Label:                                   ssd1 (1)
  UUID:                                    7ea08514-0d0a-463b-bf05-4b0a5a6f6cf2
  Size:                                    224 GiB
  read errors:                             0
  write errors:                            0
  checksum errors:                         0
  seqread iops:                            0
  seqwrite iops:                           0
  randread iops:                           0
  randwrite iops:                          0
  Bucket size:                             256 KiB
  First bucket:                            0
  Buckets:                                 915746
  Last mount:                              Wed Mar 26 08:49:39 2025
  Last superblock write:                   63
  State:                                   rw
  Data allowed:                            journal,btree,user
  Has data:                                journal,btree,user,cached
  Btree allocated bitmap blocksize:        256 KiB
  Btree allocated bitmap:                  0000000000000010000000000000000000000000000100000000000010000010
  Durability:                              1
  Discard:                                 0
  Freespace initialized:                   1
Device:                                    1
  Label:                                   ssd2 (2)
  UUID:                                    e8c9cedc-3e02-4162-9747-0b9bd59af6d4
  Size:                                    224 GiB
  read errors:                             0
  write errors:                            0
  checksum errors:                         0
  seqread iops:                            0
  seqwrite iops:                           0
  randread iops:                           0
  randwrite iops:                          0
  Bucket size:                             256 KiB
  First bucket:                            0
  Buckets:                                 915746
  Last mount:                              Wed Mar 26 08:49:39 2025
  Last superblock write:                   63
  State:                                   rw
  Data allowed:                            journal,btree,user
  Has data:                                journal,btree,user,cached
  Btree allocated bitmap blocksize:        128 KiB
  Btree allocated bitmap:                  0000000000000000000000010000000000000000000000001000000000001000
  Durability:                              1
  Discard:                                 0
  Freespace initialized:                   1
Device:                                    2
  Label:                                   hdd1 (4)
  UUID:                                    803ec63c-4246-4f02-860d-0f6f47a2004d
  Size:                                    1.82 TiB
  read errors:                             0
  write errors:                            0
  checksum errors:                         0
  seqread iops:                            0
  seqwrite iops:                           0
  randread iops:                           0
  randwrite iops:                          0
  Bucket size:                             256 KiB
  First bucket:                            0
  Buckets:                                 7630916
  Last mount:                              Wed Mar 26 08:49:39 2025
  Last superblock write:                   63
  State:                                   rw
  Data allowed:                            journal,btree,user
  Has data:                                user
  Btree allocated bitmap blocksize:        1.00 B
  Btree allocated bitmap:                  0000000000000000000000000000000000000000000000000000000000000000
  Durability:                              1
  Discard:                                 0
  Freespace initialized:                   1
Device:                                    3
  Label:                                   hdd2 (5)
  UUID:                                    bbb8ffed-b216-4248-bf4f-778be0d2e92c
  Size:                                    2.73 TiB
  read errors:                             0
  write errors:                            0
  checksum errors:                         0
  seqread iops:                            0
  seqwrite iops:                           0
  randread iops:                           0
  randwrite iops:                          0
  Bucket size:                             256 KiB
  First bucket:                            0
  Buckets:                                 11446353
  Last mount:                              Wed Mar 26 08:49:39 2025
  Last superblock write:                   63
  State:                                   rw
  Data allowed:                            journal,btree,user
  Has data:                                user
  Btree allocated bitmap blocksize:        1.00 B
  Btree allocated bitmap:                  0000000000000000000000000000000000000000000000000000000000000000
  Durability:                              1
  Discard:                                 0
  Freespace initialized:                   1

errors (size 8):

skandalfo@ranas /m/b/skandalfo> sudo bcachefs fs usage -h /media/bcachefs_data/
Filesystem: 47d6295f-2089-4839-913d-5f31ad5c5363
Size:                       4.59 TiB
Used:                       21.9 GiB
Online reserved:                 0 B

Data type       Required/total  Durability    Devices
btree:          1/2             2             [sdd sdc]            147 MiB
user:           1/2             2             [sde sdf]           21.7 GiB
cached:         1/1             1             [sdd]               5.10 GiB
cached:         1/1             1             [sdc]               5.75 GiB

Btree usage:
extents:            36.5 MiB
inodes:              512 KiB
dirents:             512 KiB
alloc:              42.0 MiB
subvolumes:          512 KiB
snapshots:           512 KiB
lru:                4.50 MiB
freespace:           512 KiB
need_discard:       1.00 MiB
backpointers:       56.5 MiB
bucket_gens:         512 KiB
snapshot_trees:      512 KiB
deleted_inodes:      512 KiB
logged_ops:          512 KiB
rebalance_work:     1.00 MiB
subvolume_children:  512 KiB
accounting:          512 KiB

hdd.hdd1 (device 2):             sde              rw
                                data         buckets    fragmented
  free:                     1.81 TiB         7585677
  sb:                       3.00 MiB              13       252 KiB
  journal:                   192 MiB             767
  btree:                         0 B               0
  user:                     10.9 GiB           44459       556 KiB
  cached:                        0 B               0
  parity:                        0 B               0
  stripe:                        0 B               0
  need_gc_gens:                  0 B               0
  need_discard:                  0 B               0
  unstriped:                     0 B               0
  capacity:                 1.82 TiB         7630916

hdd.hdd2 (device 3):             sdf              rw
                                data         buckets    fragmented
  free:                     2.72 TiB        11401114
  sb:                       3.00 MiB              13       252 KiB
  journal:                   192 MiB             767
  btree:                         0 B               0
  user:                     10.9 GiB           44459       556 KiB
  cached:                        0 B               0
  parity:                        0 B               0
  stripe:                        0 B               0
  need_gc_gens:                  0 B               0
  need_discard:                  0 B               0
  unstriped:                     0 B               0
  capacity:                 2.73 TiB        11446353

ssd.ssd1 (device 0):             sdd              rw
                                data         buckets    fragmented
  free:                      218 GiB          893777
  sb:                       3.00 MiB              13       252 KiB
  journal:                   192 MiB             767
  btree:                    73.5 MiB             294
  user:                          0 B               0
  cached:                   5.10 GiB           20895       192 KiB
  parity:                        0 B               0
  stripe:                        0 B               0
  need_gc_gens:                  0 B               0
  need_discard:                  0 B               0
  unstriped:                     0 B               0
  capacity:                  224 GiB          915746

ssd.ssd2 (device 1):             sdc              rw
                                data         buckets    fragmented
  free:                      218 GiB          891108
  sb:                       3.00 MiB              13       252 KiB
  journal:                   192 MiB             767
  btree:                    73.5 MiB             294
  user:                          0 B               0
  cached:                   5.75 GiB           23564       364 KiB
  parity:                        0 B               0
  stripe:                        0 B               0
  need_gc_gens:                  0 B               0
  need_discard:                  0 B               0
  unstriped:                     0 B               0
  capacity:                  224 GiB          915746

The numbers going up in

skandalfo@ranas /m/b/skandalfo> watch cat /sys/fs/bcachefs/47d6295f-2089-4839-913d-5f31ad5c5363/*/io_done

are the write counters for journal and btree.

I can also reach this state by reformatting the set and creating the big files, either by downloading them (as I did here) or by running dd from /dev/zero or /dev/urandom to create ~4 GiB files (bs=4M count=1k).
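With dd, bs=4M count=1k means 1024 blocks of 4 MiB, i.e. 4 GiB. If you want to script the write and the observation that follows it together, the Python sketch below mirrors that dd run. It is a swapped-in equivalent of dd, not what skandalfo ran. The function name and its defaults are hypothetical.

```python
import os


def write_test_file(path: str, block_size: int = 4 * 1024 * 1024,
                    count: int = 1024, random: bool = False) -> int:
    """Rough equivalent of `dd if=/dev/zero (or /dev/urandom) of=path bs=4M count=1k`.

    Returns the number of bytes written; the defaults give 4 GiB.
    """
    zeros = bytes(block_size)
    written = 0
    with open(path, "wb") as f:
        for _ in range(count):
            block = os.urandom(block_size) if random else zeros
            f.write(block)
            written += len(block)
        f.flush()
        # Push the file's data to disk before returning, so later writes seen
        # in io_done are not just this file's own writeback.
        os.fsync(f.fileno())
    return written
```

The `random` flag covers both variants mentioned in the comment (zeros vs. random data). Reading the files afterwards, as described in the next comment, is the step that restarts the writes.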

It only goes quiescent after a reboot. Automounting the filesystem at boot is fine on its own, but reading the files (no writes required) restarts the behavior.

skandalfo, Mar 27 '25 17:03

In the meantime the Armbian rolling release has moved on to 6.14.0 proper:

skandalfo@ranas /m/b/skandalfo> uname -a
Linux ranas 6.14.0-edge-rockchip64 #1 SMP PREEMPT Mon Mar 24 17:39:21 UTC 2025 aarch64 aarch64 aarch64 GNU/Linux

If I'm not mistaken, that shouldn't differ significantly from -rc7, but the fact is that I can reproduce the same behavior as above with the sha1sum command on 6.14.0 too.

skandalfo, Mar 27 '25 17:03

Yes, I can confirm: I updated the kernel to 6.14 and see the same behavior on a brand-new fs (NOT the upgraded one I had already written to on 6.12). According to py1hon on IRC, the filesystem is flushing dirty metadata, and a change is pending to make it flush faster when the disk is idle. Still, it has now been around 2 hours since it started writing constantly at 30 kB/s, and it still hasn't finished. That seems odd to me, since it seems to go away sooner if you remount or reboot. While trying to reproduce this, I hit other, unrelated issues (with lz4hc, and a bch-reclaim thread deadlock), which were also reported and dealt with surprisingly quickly.
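For scale, take nagalun's figure as 30 kB/s (decimal kilobytes; the post writes "kb/s"). Two hours at that rate add up to roughly 216 MB. This is a back-of-the-envelope number, but it gives a sense of scale when judging whether the activity is a one-off flush of dirty metadata. For reference, skandalfo's filesystem above reports 147 MiB of btree in total, though that is a different filesystem. The helper name is hypothetical.

```python
def idle_write_total(rate_bytes_per_s: float, hours: float) -> float:
    """Total bytes written at a constant rate over the given number of hours."""
    return rate_bytes_per_s * hours * 3600


total = idle_write_total(30_000, 2)
print(f"{total / 1e6:.0f} MB ({total / 2**20:.0f} MiB)")  # 216 MB (206 MiB)
```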

nagalun avatar Mar 27 '25 19:03 nagalun

I noticed the same behavior. The host runs a 6.14 kernel with patches from the bcachefs master branch (commit 4594600).

# iostat -tmxd sda sdb 3
Linux 6.14.0-dist-hardened 	28.03.2025 	_x86_64_	(8 CPU)

28.03.2025 07:23:04
Device            r/s     rMB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wMB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dMB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
sda             87,99      4,10     4,44   4,81    1,97    47,72    5,09      0,28     0,16   3,14    0,81    56,73    0,67      0,75     0,00   0,00    0,62  1140,46    0,85    1,23    0,18   0,93
sdb             73,24      3,30     8,22  10,09    3,68    46,10    9,20      0,30     0,06   0,60    4,54    33,62    0,00      0,00     0,00   0,00    0,00     0,00    0,85    7,93    0,32   2,61


28.03.2025 07:23:07
Device            r/s     rMB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wMB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dMB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
sda              0,00      0,00     0,00   0,00    0,00     0,00    6,33      0,07     0,00   0,00    0,16    10,95    0,00      0,00     0,00   0,00    0,00     0,00    1,33    0,50    0,00   0,13
sdb              0,00      0,00     0,00   0,00    0,00     0,00    6,33      0,07     0,00   0,00    3,68    10,95    0,00      0,00     0,00   0,00    0,00     0,00    1,33   12,00    0,04   1,57


28.03.2025 07:23:10
Device            r/s     rMB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wMB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dMB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
sda              0,00      0,00     0,00   0,00    0,00     0,00    7,00      0,02     0,00   0,00    0,19     3,43    0,00      0,00     0,00   0,00    0,00     0,00    2,00    0,33    0,00   0,00
sdb              0,00      0,00     0,00   0,00    0,00     0,00    7,00      0,02     0,00   0,00    3,00     3,43    0,00      0,00     0,00   0,00    0,00     0,00    2,00    8,00    0,04   1,43


28.03.2025 07:23:13
Device            r/s     rMB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wMB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dMB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
sda              0,00      0,00     0,00   0,00    0,00     0,00    6,67      0,02     0,00   0,00    0,35     3,40    0,00      0,00     0,00   0,00    0,00     0,00    2,00    0,83    0,00   0,10
sdb              0,00      0,00     0,00   0,00    0,00     0,00    6,67      0,02     0,00   0,00    2,15     3,40    0,00      0,00     0,00   0,00    0,00     0,00    2,00    6,00    0,03   1,10


28.03.2025 07:23:16
Device            r/s     rMB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wMB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dMB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
sda              0,00      0,00     0,00   0,00    0,00     0,00    5,67      0,02     0,00   0,00    0,24     3,29    0,00      0,00     0,00   0,00    0,00     0,00    2,00    0,50    0,00   0,10
sdb              0,00      0,00     0,00   0,00    0,00     0,00    5,67      0,02     0,00   0,00    2,53     3,29    0,00      0,00     0,00   0,00    0,00     0,00    2,00    5,83    0,03   1,13


28.03.2025 07:23:19
Device            r/s     rMB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wMB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dMB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
sda              0,00      0,00     0,00   0,00    0,00     0,00    6,33      0,02     0,00   0,00    0,21     3,37    0,00      0,00     0,00   0,00    0,00     0,00    2,00    0,33    0,00   0,23
sdb              0,00      0,00     0,00   0,00    0,00     0,00    6,33      0,02     0,00   0,00    1,37     3,37    0,00      0,00     0,00   0,00    0,00     0,00    2,00    3,67    0,02   0,53


28.03.2025 07:23:22
Device            r/s     rMB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wMB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dMB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
sda              0,00      0,00     0,00   0,00    0,00     0,00    5,67      0,02     0,00   0,00    0,24     3,53    0,00      0,00     0,00   0,00    0,00     0,00    2,00    0,50    0,00   0,00
sdb              0,00      0,00     0,00   0,00    0,00     0,00    5,67      0,02     0,00   0,00    3,00     3,53    0,00      0,00     0,00   0,00    0,00     0,00    2,00    7,50    0,03   1,47
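In sysstat's iostat, `wareq-sz` is the average write request size in kB, so multiplying it by `w/s` gives the write throughput in kB/s. For sda at 07:23:10, 7,00 × 3,43 ≈ 24 kB/s, consistent with the 0,02 shown under `wMB/s`. The hypothetical helper below does this for each device line of `iostat -x` output. It locates the columns by header name and accepts comma decimal separators like the ones in this locale.

```shell
# Hypothetical helper: read `iostat -x` output on stdin and print, per device line,
# w/s * wareq-sz (wareq-sz is in kB, so the product is write throughput in kB/s).
iostat_write_kbs() {
    LC_ALL=C awk '
        $1 == "Device" {                               # header: find columns by name
            for (i = 1; i <= NF; i++) {
                if ($i == "w/s") ws = i
                if ($i == "wareq-sz") wsz = i
            }
            next
        }
        ws && wsz && NF >= wsz {                       # device rows have every column
            w = $ws; s = $wsz
            gsub(/,/, ".", w); gsub(/,/, ".", s)       # accept comma decimal separators
            printf "%s %.1f\n", $1, w * s
        }
    '
}

# Usage: iostat -tmxd sda sdb 3 5 | iostat_write_kbs
```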


# bcachefs fs top|grep -v ' 0$'
btree_node_write                                 5
journal_reclaim_finish                           18
journal_reclaim_start                            18
journal_write                                    2
transaction_commit                               10

btree_node_write                                 5
journal_reclaim_finish                           20
journal_reclaim_start                            20
journal_write                                    2

transaction_commit                               10
btree_node_write                                 5
journal_reclaim_finish                           20
journal_reclaim_start                            20
journal_write                                    2

transaction_commit                               10
btree_cache_cannibalize_lock                     1
btree_cache_cannibalize_unlock                   1
btree_node_write                                 6
btree_node_compact                               1
btree_node_alloc                                 1
btree_node_free                                  2
journal_reclaim_finish                           18
journal_reclaim_start                            18

transaction_commit                               11
btree_node_write                                 5
journal_reclaim_finish                           20
journal_reclaim_start                            20
journal_write                                    2

transaction_commit                               10
btree_node_write                                 5
journal_reclaim_finish                           20
journal_reclaim_start                            20
journal_write                                    2

transaction_commit                               10
io_write                                         8
btree_node_write                                 5
journal_reclaim_finish                           18
journal_reclaim_start                            18
journal_write                                    2

transaction_commit                               11
btree_node_write                                 5
journal_reclaim_finish                           20
journal_reclaim_start                            20
journal_write                                    2

transaction_commit                               10
btree_node_write                                 5
journal_reclaim_finish                           20
journal_reclaim_start                            20
journal_write                                    2

transaction_commit                               10
btree_node_write                                 4
journal_reclaim_finish                           18
journal_reclaim_start                            18
journal_write                                    2

transaction_commit                               9
bucket_alloc                                     1
btree_cache_cannibalize_lock                     1
btree_cache_cannibalize_unlock                   1
btree_node_write                                 7
btree_node_compact                               1
btree_node_alloc                                 1
btree_node_free                                  2
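To condense the repeated `bcachefs fs top` samples above, a hypothetical helper can average each counter over the samples it appears in. It assumes the captured text consists of `name value` lines, as produced by the `grep -v ' 0$'` filter above. It knows nothing about the refresh interval, so the means are per sample, not per second. Over the samples above, for example, `journal_reclaim_start` averages about 19 per sample, and `journal_write` is 2 whenever it appears.

```shell
# Hypothetical helper: read "counter value" lines (e.g. captured
# `bcachefs fs top | grep -v ' 0$'` output) on stdin and print, per counter,
# the number of samples it appeared in and its mean value per sample.
fs_top_summary() {
    LC_ALL=C awk '
        NF == 2 && $2 ~ /^[0-9]+$/ { sum[$1] += $2; n[$1]++ }   # skip blank/other lines
        END { for (k in sum) printf "%s %d %.1f\n", k, n[k], sum[k] / n[k] }
    ' | LC_ALL=C sort
}
```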

alexminder avatar Mar 28 '25 04:03 alexminder

I can confirm this still happens for me on 6.14.0. Strangely, the random writes completely stopped for a day, but they came back the next time I rebooted.

obj-obj avatar Apr 06 '25 20:04 obj-obj

how constant are they?

koverstreet avatar Apr 06 '25 22:04 koverstreet

how constant are they?

The disk activity LED on my case is always blinking, and iotop reports ~5-10 kB/s being written to disk even though no process is writing anything.

obj-obj avatar Apr 06 '25 23:04 obj-obj

Yeah, that's just journal reclaim. It's not a cause for concern, just somewhat suboptimal behaviour. I have a design doc on how to fix it, if you want to write code :)

koverstreet avatar Apr 07 '25 02:04 koverstreet

I have a design doc on how to fix it, if you want to write code :)

I'd be interested in reading that doc.

I don't have a lot of bandwidth these days, and I'd need to figure out how to compile and boot my own kernel on Armbian now, but I might be able to work on it eventually.

skandalfo avatar Apr 07 '25 09:04 skandalfo