
Cache-unfriendly filesystem usage, memory fragmentation and ARC

Open runderwo opened this issue 11 months ago • 7 comments

System information

Type Version/Name
Distribution Name Ubuntu/Debian
Distribution Version LTS/latest
Kernel Version 6.8-6.11
Architecture x86_64
OpenZFS Version 2.2.2-2.2.6

Describe the problem you're observing

After moderate uptime of a few weeks, when a program tries to read or index the whole filesystem (or a large chunk of it), the system seizes up and becomes unresponsive to input/network for 15-20 minutes. Eventually it recovers to a sluggish but usable state (with the offending process still running, consuming CPU time and disk I/O) where a tool like atop can be used to observe lingering heavy free page scan activity, despite up to 10 GiB of free/available memory! (The Linux page cache has been zeroed by this time.)

ARC is maxed out at 97% (almost 50% of system RAM according to the default settings).

Examining /proc/buddyinfo shows no free pages >= 1 MiB in the steady state, and it can be even worse right after the "seizure", with no free pages >= 128 KiB.

I suspect the partial recovery is thanks to kcompactd activity. I am thinking that ZFS should drop cached file blocks from ARC not only when the kernel low watermark is reached, but also when higher-order free pages become exhausted.
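For reference, a rough way to watch higher-order free pages next to the ARC size (a sketch; it assumes 4 KiB base pages and the standard /proc/spl/kstat/zfs/arcstats interface):

# every 30 s: free blocks of order >= 8 (>= 1 MiB) in the Normal zone, plus current ARC size
while sleep 30; do
    date
    awk '/Normal/ { print "  order>=8 free blocks:", $(NF-2) + $(NF-1) + $NF }' /proc/buddyinfo
    awk '$1 == "size" { printf "  ARC size: %.1f GiB\n", $3 / 2^30 }' /proc/spl/kstat/zfs/arcstats
done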

Describe how to reproduce the problem

Simulate normal memory fragmentation on a host, including multiple hibernate/resume cycles, then run duplicity, tracker3-miner, or similar programs which ingest the whole filesystem in a cache-unfriendly and ZFS-unfriendly way while monitoring the situation with atop.
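A crude stand-in for those indexers is simply streaming every file once, so nothing is reused from cache (a sketch; adjust the starting path as needed):

# cache-unfriendly whole-filesystem scan: read every file once and discard the data
# (watch atop and /proc/buddyinfo in another terminal while it runs)
find / -xdev -type f -print0 | xargs -0 cat > /dev/null 2>&1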

Include any warning/errors/backtraces from the system logs

Dec 22 16:09:02 desktop kernel: zfs: module license 'CDDL' taints kernel.
Dec 22 16:09:02 desktop kernel: Disabling lock debugging due to kernel taint
Dec 22 16:09:02 desktop kernel: zfs: module license taints kernel.
Dec 22 16:09:02 desktop kernel: calling  openzfs_init+0x0/0xce0 [zfs] @ 428
Dec 22 16:09:02 desktop kernel: ZFS: Loaded module v2.2.2-0ubuntu9.1, ZFS pool version 5000, ZFS filesystem version 5
[..]
Jan 22 14:05:06 desktop systemd-journald[935]: Under memory pressure, flushing caches.
[..]
Jan 22 14:16:31 desktop kernel: INFO: task chrome:3547537 blocked for more than 122 seconds.
Jan 22 14:16:31 desktop kernel:       Tainted: P           OE      6.8.0-51-generic #52-Ubuntu
Jan 22 14:16:31 desktop kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.```
[etc, etc]

runderwo avatar Jan 22 '25 21:01 runderwo

Have you tried to set zfs_arc_shrinker_limit=0 after updating to the latest 2.2.x release? It was made default in 2.3.0: https://github.com/openzfs/zfs/pull/16909 .
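If it helps, the parameter can be flipped at runtime and made persistent with the usual module-parameter mechanisms:

# change at runtime
echo 0 | sudo tee /sys/module/zfs/parameters/zfs_arc_shrinker_limit

# make it persistent across reboots
echo "options zfs zfs_arc_shrinker_limit=0" | sudo tee -a /etc/modprobe.d/zfs.conf
# (on Ubuntu/Debian the zfs module is in the initramfs, so also run: sudo update-initramfs -u)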

amotin avatar Jan 23 '25 00:01 amotin

I have not, but will try that out and report back after it's had some time to bake on a host with extended uptime. Thanks for the pointer!

runderwo avatar Jan 23 '25 00:01 runderwo

Hi,

I have been suffering from the exact same issue ever since upgrading to zfs 2.2.0. I'd like to add a few things:

  • When indexing starts, the system seems normal at first, but then memory consumption spikes rapidly. My server has about 64 GB of RAM, and watching htop while the operation runs, memory usage spikes suddenly; it takes about 3 seconds before the system is entirely OOM and unresponsive.
  • Terminating the indexing process before the system becomes unresponsive helps, but the used memory does not seem to get freed anymore; a reboot is required to reclaim it.
  • There is a similar issue with filesystem watches - except it takes days to a week or two to trigger instead of seconds. Might be unrelated but I thought I'd mention it.

I just reproduced it and had a watch on both arc_summary (notice the nice "Available memory size") and /proc/slabinfo running in parallel.
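In case it helps anyone else collecting data, logging both to files instead of only watching them keeps the information even when the console freezes (a rough sketch):

# append ARC and slab snapshots every 5 seconds so the data survives a freeze
while sleep 5; do
    { date; arc_summary; } >> /var/tmp/arc_summary.log
    { date; cat /proc/slabinfo; } >> /var/tmp/slabinfo.log
done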

Note that I copied the frozen terminal output into a text file since the system was no longer responsive. I hope it contains the relevant information you might be looking for. Otherwise I can try to extract some information on a partially bricked system where indexing has been killed before the system becomes unresponsive.

arc_summary.txt slabinfo.txt slabinfo after reboot

zfs_arc_shrinker_limit is already 0 (I double checked) on my setup as I am running zfs 2.3.0 right now, Kernel 6.12.8. zfs_arc_max does not seem to really matter here. I have tried values ranging from 16GB to 60GB (set via modprobe on boot, not during runtime) and it happens no matter what.

~Another tidbit: I had htop running and while the green bar (used memory) was not at 100% (not even a third), the yellow one (cache) basically filled up the empty space. This would explain runderwo's observation of there being "free memory" if memory used for caches was not taken into account.~ Edit: It was mentioned, I evidently just can't read, sorry

The issue is reproducible moments after the system has rebooted for me, so no "memory fragmentation" needs to be induced at all. If I can provide or try anything else, lmk.

XyFreak avatar Jan 24 '25 22:01 XyFreak

@XyFreak In the arc_summary.txt I see that your ARC has freed everything else it could at that point, but there appear to be 36 GB that it cannot free for some reason (presumably referenced by something):

        MRU data size:                                 99.5 %   36.5 GiB
        MRU evictable data size:                        0.0 %    0 Bytes

"since upgrading to zfs 2.2.0" is a long time of about a year. Since it is not a widely noticed issue, there must be something specific in your case that triggers it. It could help to find out what exactly is your "indexing", how does it access the files and how to reproduce it in minimal environment.

amotin avatar Jan 26 '25 15:01 amotin

I am still seeing the same behavior with zfs_arc_shrinker_limit=0. Extreme memory fragmentation grinds the system to a halt with free page scans despite multiple gigabytes of "free" memory.

runderwo avatar Apr 09 '25 16:04 runderwo

Is it possible that failure to hibernate after a significant uptime is related? This is on a laptop with 48GB RAM and 128GB hibernation/swapfile.

Mar 24 14:54:37 achpee2 kernel: Filesystems sync: 4.641 seconds
Mar 24 14:54:48 achpee2 kernel: Freezing user space processes
Mar 24 14:54:48 achpee2 kernel: Freezing user space processes completed (elapsed 0.002 seconds)
Mar 24 14:54:48 achpee2 kernel: OOM killer disabled.
Mar 24 14:54:48 achpee2 kernel: PM: hibernation: Marking nosave pages: [mem 0x00000000-0x00000fff]
Mar 24 14:54:48 achpee2 kernel: PM: hibernation: Marking nosave pages: [mem 0x00058000-0x00058fff]
Mar 24 14:54:48 achpee2 kernel: PM: hibernation: Marking nosave pages: [mem 0x00086000-0x000fffff]
Mar 24 14:54:48 achpee2 kernel: PM: hibernation: Marking nosave pages: [mem 0x55747000-0x55747fff]
Mar 24 14:54:48 achpee2 kernel: PM: hibernation: Marking nosave pages: [mem 0x55757000-0x55758fff]
Mar 24 14:54:48 achpee2 kernel: PM: hibernation: Marking nosave pages: [mem 0x55771000-0x55771fff]
Mar 24 14:54:48 achpee2 kernel: PM: hibernation: Marking nosave pages: [mem 0x590ba000-0x590e4fff]
Mar 24 14:54:48 achpee2 kernel: PM: hibernation: Marking nosave pages: [mem 0x60180000-0x601a2fff]
Mar 24 14:54:48 achpee2 kernel: PM: hibernation: Marking nosave pages: [mem 0x6f38e000-0x6fffdfff]
Mar 24 14:54:48 achpee2 kernel: PM: hibernation: Marking nosave pages: [mem 0x6ffff000-0xffffffff]
Mar 24 14:54:48 achpee2 kernel: PM: hibernation: Basic memory bitmaps created
Mar 24 14:54:48 achpee2 kernel: PM: hibernation: Preallocating image memory
Mar 24 14:54:48 achpee2 kernel: PM: hibernation: Image allocation is 849661 pages short
Mar 24 14:54:48 achpee2 kernel: PM: hibernation: Basic memory bitmaps freed
Mar 24 14:54:48 achpee2 kernel: OOM killer enabled.
Mar 24 14:54:48 achpee2 kernel: Restarting tasks ... done.
Mar 24 14:54:48 achpee2 kernel: PM: hibernation: hibernation exit
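(With 4 KiB pages, 849661 pages is roughly 3.2 GiB of image memory that could not be preallocated.)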

runderwo avatar Apr 26 '25 09:04 runderwo

Saw the same "spinning in free page scan" meltdown again today. Afterwards significant fragmentation can be observed:

$ cat /proc/buddyinfo 
Node 0, zone      DMA      1      1      1      1      1      1      1      1      0      1      2 
Node 0, zone    DMA32   1610    842    863    674    377    234    162    103     57     29    108 
Node 0, zone   Normal   2057  15383  18048   5022  30170   3737   5160   1160    203      0      0 
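Each column is the count of free blocks of order 0 through 10 (4 KiB up to 4 MiB with 4 KiB pages), so the tail of the Normal row above shows nothing left at 2 MiB and larger. A rough one-liner to label the columns:

awk '/Normal/ { for (i = 0; i <= 10; i++) printf "order %2d (%5d KiB blocks): %d free\n", i, 4 * 2^i, $(i + 5) }' /proc/buddyinfo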

runderwo avatar May 30 '25 15:05 runderwo

Happened again today. I have an atop log capturing the situation at 30 second intervals. Mem fragmentation is also terrible again:

Node 0, zone      DMA      1      1      1      1      1      1      1      1      0      1      2 
Node 0, zone    DMA32   1244    965    893    878    748   1616    136      5      0      0      0 
Node 0, zone   Normal  15091  68878  68007  72113  34303   4715    161      0      0      0      0 

I have upgraded to zfs 2.3.2 to see if the situation changes.

runderwo avatar Jun 17 '25 22:06 runderwo

@runderwo ZFS allocates ARC in chunks as big as the kernel allows. The more fragmented memory is, the smaller new allocations become, and potentially the higher fragmentation is in the future. ARC memory is not movable, but it is evictable. That means the kernel cannot defragment it by moving pages around, but it can always request that ZFS evict something, and ZFS should obey those requests. If there is evidence that ZFS ignored some eviction requests, then we need to see it. Otherwise it is just a fact of life.
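A sketch of how that evidence could be collected, using counters from /proc/spl/kstat/zfs/arcstats (memory_direct_count / memory_indirect_count are what arc_summary reports as "Memory direct/indirect reclaims"):

# log ARC size/target and the kernel-driven reclaim counters every 10 s; if the
# reclaim counters climb while "size" stays pinned near c_max, attach that log here
while sleep 10; do
    date
    awk '$1 ~ /^(size|c|c_max|memory_throttle_count|memory_direct_count|memory_indirect_count)$/ { print "  " $1, $3 }' /proc/spl/kstat/zfs/arcstats
done >> /var/tmp/arc_reclaim.log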

amotin avatar Jun 17 '25 23:06 amotin

@amotin I am precisely trying to help discover "evidence that ZFS ignored some eviction requests" for ARC memory. The system being unusable while it grinds through page scans for up to an hour at a time, plus the lingering fragmentation on a system under completely normal use with 10 GB of free memory, is simply unacceptable; it's difficult to imagine any userspace pattern causing it, so it must be a problem between ZFS (the heaviest slab consumer by far) and the kernel until proven otherwise. Do you have any suggestions for further evidence gathering?

runderwo avatar Jun 18 '25 00:06 runderwo

I'm seeing something similar, and I can somewhat reliably trigger it: usually when a program scans many files across the filesystem (spoiler: it's Plex/Jellyfin), I experience heavy memory contention and system lockups. I've also seen it while doing things like backups (with Borg). I've already looked at this issue, #17424, #14686, #17052, #16325, and #16322; they all seem to have similar symptoms, but maybe different root causes.

Sadly, it's hard to get a grasp of what my ARC stats look like during a lockup, because the system is, well, locked up: memory contention followed by heavy CPU thrashing. Eventually, after 20 minutes to an hour, things recover, but a full reboot is usually needed because the system is typically left in a wonky state.

Here's an arc_summary -a right after I import my pool.

------------------------------------------------------------------------
ZFS Subsystem Report                            Thu Oct 30 18:18:53 2025
Linux 6.16.10-arch1-1                                            2.3.4-1
Machine: 302DerikCourt (x86_64)                                  2.3.4-1

ARC status:
        Total memory size:                                     125.5 GiB
        Min target size:                                3.1 %    3.9 GiB
        Max target size:                               12.7 %   16.0 GiB
        Target size (adaptive):                         0.3 %    3.9 GiB
        Current size:                                   0.3 %   53.7 MiB
        Free memory size:                                      123.8 GiB
        Available memory size:                                 119.5 GiB

ARC structural breakdown (current size):                        53.7 MiB
        Compressed size:                               14.4 %    7.7 MiB
        Overhead size:                                 59.0 %   31.6 MiB
        Bonus size:                                     4.8 %    2.6 MiB
        Dnode size:                                    14.5 %    7.8 MiB
        Dbuf size:                                      6.5 %    3.5 MiB
        Header size:                                    0.8 %  421.5 KiB
        L2 header size:                                 0.0 %    0 Bytes
        ABD chunk waste size:                           0.1 %   28.5 KiB

ARC types breakdown (compressed + overhead):                    39.4 MiB
        Data size:                                      0.0 %    0 Bytes
        Metadata size:                                100.0 %   39.4 MiB

ARC states breakdown (compressed + overhead):                   39.4 MiB
        Anonymous data size:                            0.0 %    0 Bytes
        Anonymous metadata size:                        0.0 %    0 Bytes
        MFU data target:                               37.5 %   14.8 MiB
        MFU data size:                                  0.0 %    0 Bytes
        MFU evictable data size:                        0.0 %    0 Bytes
        MFU ghost data size:                                     0 Bytes
        MFU metadata target:                           12.5 %    4.9 MiB
        MFU metadata size:                             49.8 %   19.6 MiB
        MFU evictable metadata size:                    0.2 %   76.0 KiB
        MFU ghost metadata size:                                 0 Bytes
        MRU data target:                               37.5 %   14.8 MiB
        MRU data size:                                  0.0 %    0 Bytes
        MRU evictable data size:                        0.0 %    0 Bytes
        MRU ghost data size:                                     0 Bytes
        MRU metadata target:                           12.5 %    4.9 MiB
        MRU metadata size:                             50.2 %   19.7 MiB
        MRU evictable metadata size:                    1.4 %  572.0 KiB
        MRU ghost metadata size:                                 0 Bytes
        Uncached data size:                             0.0 %    0 Bytes
        Uncached metadata size:                         0.0 %    0 Bytes

ARC hash breakdown:
        Elements:                                                   1.6k
        Collisions:                                                    0
        Chain max:                                                     0
        Chains:                                                        0

ARC misc:
        Uncompressed size:                            444.2 %   34.3 MiB
        Memory throttles:                                              0
        Memory direct reclaims:                                        0
        Memory indirect reclaims:                                      0
        Deleted:                                                     145
        Mutex misses:                                                  0
        Eviction skips:                                                3
        Eviction skips due to L2 writes:                               0
        L2 cached evictions:                                     0 Bytes
        L2 eligible evictions:                                 384.0 KiB
        L2 eligible MFU evictions:                     29.2 %  112.0 KiB
        L2 eligible MRU evictions:                     70.8 %  272.0 KiB
        L2 ineligible evictions:                                 4.0 KiB

ARC total accesses:                                               132.5k
        Total hits:                                    98.7 %     130.7k
        Total I/O hits:                               < 0.1 %         55
        Total misses:                                   1.3 %       1.7k

ARC demand data accesses:                               0.0 %          0
        Demand data hits:                                 n/a          0
        Demand data I/O hits:                             n/a          0
        Demand data misses:                               n/a          0

ARC demand metadata accesses:                          99.7 %     132.1k
        Demand metadata hits:                          98.9 %     130.7k
        Demand metadata I/O hits:                     < 0.1 %         51
        Demand metadata misses:                         1.1 %       1.4k

ARC prefetch data accesses:                             0.0 %          0
        Prefetch data hits:                               n/a          0
        Prefetch data I/O hits:                           n/a          0
        Prefetch data misses:                             n/a          0

ARC prefetch metadata accesses:                         0.3 %        359
        Prefetch metadata hits:                        10.6 %         38
        Prefetch metadata I/O hits:                     1.1 %          4
        Prefetch metadata misses:                      88.3 %        317

ARC predictive prefetches:                             86.9 %        312
        Demand hits after predictive:                  85.3 %        266
        Demand I/O hits after predictive:              10.3 %         32
        Never demanded after predictive:                4.5 %         14

ARC prescient prefetches:                              13.1 %         47
        Demand hits after prescient:                   83.0 %         39
        Demand I/O hits after prescient:               17.0 %          8
        Never demanded after prescient:                 0.0 %          0

ARC states hits of all accesses:
        Most frequently used (MFU):                    64.8 %      85.8k
        Most recently used (MRU):                      33.9 %      44.9k
        Most frequently used (MFU) ghost:               0.0 %          0
        Most recently used (MRU) ghost:                 0.0 %          0
        Uncached:                                       0.0 %          0

DMU predictive prefetcher calls:                                   16.9k
        Stream hits:                                   97.7 %      16.6k
        Hits ahead of stream:                           0.1 %         21
        Hits behind stream:                             0.5 %         91
        Stream misses:                                  1.6 %        278
        Streams limit reached:                          0.0 %          0
        Stream strides:                                                0
        Prefetches issued                                            258

L2ARC not detected, skipping section

Solaris Porting Layer (SPL):
        spl_hostid=0
        spl_hostid_path=/etc/hostid
        spl_kmem_alloc_max=1048576
        spl_kmem_alloc_warn=65536
        spl_kmem_cache_kmem_threads=4
        spl_kmem_cache_magazine_size=0
        spl_kmem_cache_max_size=32
        spl_kmem_cache_obj_per_slab=8
        spl_kmem_cache_slab_limit=16384
        spl_panic_halt=0
        spl_schedule_hrtimeout_slack_us=0
        spl_taskq_kick=0
        spl_taskq_thread_bind=0
        spl_taskq_thread_dynamic=1
        spl_taskq_thread_priority=1
        spl_taskq_thread_sequential=4
        spl_taskq_thread_timeout_ms=5000

Tunables:
        brt_zap_default_bs=12
        brt_zap_default_ibs=12
        brt_zap_prefetch=1
        dbuf_cache_hiwater_pct=10
        dbuf_cache_lowater_pct=10
        dbuf_cache_max_bytes=18446744073709551615
        dbuf_cache_shift=5
        dbuf_metadata_cache_max_bytes=18446744073709551615
        dbuf_metadata_cache_shift=6
        dbuf_mutex_cache_shift=0
        ddt_zap_default_bs=15
        ddt_zap_default_ibs=15
        dmu_ddt_copies=0
        dmu_object_alloc_chunk_shift=7
        dmu_prefetch_max=134217728
        icp_aes_impl=cycle [fastest] generic x86_64 aesni
        icp_gcm_avx_chunk_size=32736
        icp_gcm_impl=cycle [fastest] avx generic pclmulqdq
        l2arc_exclude_special=0
        l2arc_feed_again=1
        l2arc_feed_min_ms=200
        l2arc_feed_secs=1
        l2arc_headroom=8
        l2arc_headroom_boost=200
        l2arc_meta_percent=33
        l2arc_mfuonly=0
        l2arc_noprefetch=1
        l2arc_norw=0
        l2arc_rebuild_blocks_min_l2size=1073741824
        l2arc_rebuild_enabled=1
        l2arc_trim_ahead=0
        l2arc_write_boost=33554432
        l2arc_write_max=33554432
        metaslab_aliquot=1048576
        metaslab_bias_enabled=1
        metaslab_debug_load=0
        metaslab_debug_unload=0
        metaslab_df_max_search=16777216
        metaslab_df_use_largest_segment=0
        metaslab_force_ganging=16777217
        metaslab_force_ganging_pct=3
        metaslab_fragmentation_factor_enabled=1
        metaslab_lba_weighting_enabled=1
        metaslab_preload_enabled=1
        metaslab_preload_limit=10
        metaslab_preload_pct=50
        metaslab_unload_delay=32
        metaslab_unload_delay_ms=600000
        raidz_expand_max_copy_bytes=167772160
        raidz_expand_max_reflow_bytes=0
        raidz_io_aggregate_rows=4
        send_holes_without_birth_time=1
        spa_asize_inflation=24
        spa_config_path=/etc/zfs/zpool.cache
        spa_cpus_per_allocator=4
        spa_load_print_vdev_tree=0
        spa_load_verify_data=1
        spa_load_verify_metadata=1
        spa_load_verify_shift=4
        spa_num_allocators=4
        spa_slop_shift=5
        spa_upgrade_errlog_limit=0
        vdev_file_logical_ashift=9
        vdev_file_physical_ashift=9
        vdev_removal_max_span=32768
        vdev_validate_skip=0
        zap_iterate_prefetch=1
        zap_micro_max_size=131072
        zap_shrink_enabled=1
        zfetch_hole_shift=2
        zfetch_max_distance=67108864
        zfetch_max_idistance=134217728
        zfetch_max_reorder=16777216
        zfetch_max_sec_reap=2
        zfetch_max_streams=8
        zfetch_min_distance=4194304
        zfetch_min_sec_reap=1
        zfs_abd_scatter_enabled=1
        zfs_abd_scatter_max_order=9
        zfs_abd_scatter_min_size=1536
        zfs_active_allocator=dynamic
        zfs_admin_snapshot=0
        zfs_allow_redacted_dataset_mount=0
        zfs_arc_average_blocksize=8192
        zfs_arc_dnode_limit=0
        zfs_arc_dnode_limit_percent=10
        zfs_arc_dnode_reduce_percent=10
        zfs_arc_evict_batch_limit=10
        zfs_arc_evict_threads=4
        zfs_arc_eviction_pct=200
        zfs_arc_grow_retry=0
        zfs_arc_lotsfree_percent=10
        zfs_arc_max=17179869184
        zfs_arc_meta_balance=500
        zfs_arc_min=0
        zfs_arc_min_prefetch_ms=0
        zfs_arc_min_prescient_prefetch_ms=0
        zfs_arc_pc_percent=0
        zfs_arc_prune_task_threads=1
        zfs_arc_shrink_shift=0
        zfs_arc_shrinker_limit=0
        zfs_arc_shrinker_seeks=2
        zfs_arc_sys_free=0
        zfs_async_block_max_blocks=18446744073709551615
        zfs_autoimport_disable=1
        zfs_bclone_enabled=1
        zfs_bclone_wait_dirty=1
        zfs_blake3_impl=cycle [fastest] generic sse2 sse41 avx2
        zfs_btree_verify_intensity=0
        zfs_checksum_events_per_second=20
        zfs_commit_timeout_pct=10
        zfs_compressed_arc_enabled=1
        zfs_condense_indirect_commit_entry_delay_ms=0
        zfs_condense_indirect_obsolete_pct=25
        zfs_condense_indirect_vdevs_enable=1
        zfs_condense_max_obsolete_bytes=1073741824
        zfs_condense_min_mapping_bytes=131072
        zfs_dbgmsg_enable=1
        zfs_dbgmsg_maxsize=4194304
        zfs_dbuf_state_index=0
        zfs_ddt_data_is_special=1
        zfs_deadman_checktime_ms=60000
        zfs_deadman_enabled=1
        zfs_deadman_events_per_second=1
        zfs_deadman_failmode=wait
        zfs_deadman_synctime_ms=600000
        zfs_deadman_ziotime_ms=300000
        zfs_dedup_log_cap=4294967295
        zfs_dedup_log_flush_entries_max=4294967295
        zfs_dedup_log_flush_entries_min=200
        zfs_dedup_log_flush_flow_rate_txgs=10
        zfs_dedup_log_flush_min_time_ms=1000
        zfs_dedup_log_flush_txgs=100
        zfs_dedup_log_hard_cap=0
        zfs_dedup_log_mem_max=1347660922
        zfs_dedup_log_mem_max_percent=1
        zfs_dedup_log_txg_max=8
        zfs_dedup_prefetch=0
        zfs_default_bs=9
        zfs_default_ibs=17
        zfs_delay_min_dirty_percent=60
        zfs_delay_scale=500000
        zfs_delete_blocks=20480
        zfs_dio_enabled=1
        zfs_dio_write_verify_events_per_second=20
        zfs_dirty_data_max=4294967296
        zfs_dirty_data_max_max=4294967296
        zfs_dirty_data_max_max_percent=25
        zfs_dirty_data_max_percent=10
        zfs_dirty_data_sync_percent=20
        zfs_disable_ivset_guid_check=0
        zfs_dmu_offset_next_sync=1
        zfs_embedded_slog_min_ms=64
        zfs_expire_snapshot=300
        zfs_fallocate_reserve_percent=110
        zfs_flags=0
        zfs_fletcher_4_impl=[fastest] scalar superscalar superscalar4 sse2 ssse3 avx2
        zfs_free_bpobj_enabled=1
        zfs_free_leak_on_eio=0
        zfs_free_min_time_ms=1000
        zfs_history_output_max=1048576
        zfs_immediate_write_sz=32768
        zfs_initialize_chunk_size=1048576
        zfs_initialize_value=16045690984833335022
        zfs_keep_log_spacemaps_at_export=0
        zfs_key_max_salt_uses=400000000
        zfs_livelist_condense_new_alloc=0
        zfs_livelist_condense_sync_cancel=0
        zfs_livelist_condense_sync_pause=0
        zfs_livelist_condense_zthr_cancel=0
        zfs_livelist_condense_zthr_pause=0
        zfs_livelist_max_entries=500000
        zfs_livelist_min_percent_shared=75
        zfs_lua_max_instrlimit=100000000
        zfs_lua_max_memlimit=104857600
        zfs_max_async_dedup_frees=100000
        zfs_max_dataset_nesting=50
        zfs_max_log_walking=5
        zfs_max_logsm_summary_length=10
        zfs_max_missing_tvds=0
        zfs_max_nvlist_src_size=0
        zfs_max_recordsize=16777216
        zfs_metaslab_find_max_tries=100
        zfs_metaslab_fragmentation_threshold=77
        zfs_metaslab_max_size_cache_sec=3600
        zfs_metaslab_mem_limit=25
        zfs_metaslab_segment_weight_enabled=1
        zfs_metaslab_switch_threshold=2
        zfs_metaslab_try_hard_before_gang=0
        zfs_mg_fragmentation_threshold=95
        zfs_mg_noalloc_threshold=0
        zfs_min_metaslabs_to_flush=1
        zfs_multihost_fail_intervals=10
        zfs_multihost_history=0
        zfs_multihost_import_intervals=20
        zfs_multihost_interval=1000
        zfs_multilist_num_sublists=0
        zfs_no_scrub_io=0
        zfs_no_scrub_prefetch=0
        zfs_nocacheflush=0
        zfs_nopwrite_enabled=1
        zfs_object_mutex_size=64
        zfs_obsolete_min_time_ms=500
        zfs_override_estimate_recordsize=0
        zfs_pd_bytes_max=52428800
        zfs_per_txg_dirty_frees_percent=30
        zfs_prefetch_disable=0
        zfs_read_history=0
        zfs_read_history_hits=0
        zfs_rebuild_max_segment=1048576
        zfs_rebuild_scrub_enabled=1
        zfs_rebuild_vdev_limit=67108864
        zfs_reconstruct_indirect_combinations_max=4096
        zfs_recover=0
        zfs_recv_best_effort_corrective=0
        zfs_recv_queue_ff=20
        zfs_recv_queue_length=16777216
        zfs_recv_write_batch_size=1048576
        zfs_removal_ignore_errors=0
        zfs_removal_suspend_progress=0
        zfs_remove_max_segment=16777216
        zfs_resilver_defer_percent=10
        zfs_resilver_disable_defer=0
        zfs_resilver_min_time_ms=3000
        zfs_scan_blkstats=0
        zfs_scan_checkpoint_intval=7200
        zfs_scan_fill_weight=3
        zfs_scan_ignore_errors=0
        zfs_scan_issue_strategy=0
        zfs_scan_legacy=0
        zfs_scan_max_ext_gap=2097152
        zfs_scan_mem_lim_fact=20
        zfs_scan_mem_lim_soft_fact=20
        zfs_scan_report_txgs=0
        zfs_scan_strict_mem_lim=0
        zfs_scan_suspend_progress=0
        zfs_scan_vdev_limit=16777216
        zfs_scrub_after_expand=1
        zfs_scrub_error_blocks_per_txg=4096
        zfs_scrub_min_time_ms=1000
        zfs_send_corrupt_data=0
        zfs_send_no_prefetch_queue_ff=20
        zfs_send_no_prefetch_queue_length=1048576
        zfs_send_queue_ff=20
        zfs_send_queue_length=16777216
        zfs_send_unmodified_spill_blocks=1
        zfs_sha256_impl=cycle [fastest] generic x64 ssse3 avx avx2 shani
        zfs_sha512_impl=cycle [fastest] generic x64 avx avx2
        zfs_slow_io_events_per_second=20
        zfs_snapshot_history_enabled=1
        zfs_snapshot_no_setuid=0
        zfs_spa_discard_memory_limit=16777216
        zfs_special_class_metadata_reserve_pct=25
        zfs_sync_pass_deferred_free=2
        zfs_sync_pass_dont_compress=8
        zfs_sync_pass_rewrite=2
        zfs_traverse_indirect_prefetch_limit=32
        zfs_trim_extent_bytes_max=134217728
        zfs_trim_extent_bytes_min=32768
        zfs_trim_metaslab_skip=0
        zfs_trim_queue_limit=10
        zfs_trim_txg_batch=32
        zfs_txg_history=100
        zfs_txg_timeout=5
        zfs_unflushed_log_block_max=131072
        zfs_unflushed_log_block_min=1000
        zfs_unflushed_log_block_pct=400
        zfs_unflushed_log_txg_max=1000
        zfs_unflushed_max_mem_amt=1073741824
        zfs_unflushed_max_mem_ppm=1000
        zfs_unlink_suspend_progress=0
        zfs_user_indirect_is_special=1
        zfs_vdev_aggregation_limit=1048576
        zfs_vdev_aggregation_limit_non_rotating=131072
        zfs_vdev_async_read_max_active=3
        zfs_vdev_async_read_min_active=1
        zfs_vdev_async_write_active_max_dirty_percent=60
        zfs_vdev_async_write_active_min_dirty_percent=30
        zfs_vdev_async_write_max_active=10
        zfs_vdev_async_write_min_active=2
        zfs_vdev_def_queue_depth=32
        zfs_vdev_default_ms_count=200
        zfs_vdev_default_ms_shift=29
        zfs_vdev_direct_write_verify=1
        zfs_vdev_disk_classic=0
        zfs_vdev_disk_max_segs=0
        zfs_vdev_failfast_mask=1
        zfs_vdev_initializing_max_active=1
        zfs_vdev_initializing_min_active=1
        zfs_vdev_max_active=1000
        zfs_vdev_max_auto_ashift=14
        zfs_vdev_max_ms_shift=34
        zfs_vdev_min_auto_ashift=9
        zfs_vdev_min_ms_count=16
        zfs_vdev_mirror_non_rotating_inc=0
        zfs_vdev_mirror_non_rotating_seek_inc=1
        zfs_vdev_mirror_rotating_inc=0
        zfs_vdev_mirror_rotating_seek_inc=5
        zfs_vdev_mirror_rotating_seek_offset=1048576
        zfs_vdev_ms_count_limit=131072
        zfs_vdev_nia_credit=5
        zfs_vdev_nia_delay=5
        zfs_vdev_open_timeout_ms=1000
        zfs_vdev_queue_depth_pct=1000
        zfs_vdev_raidz_impl=cycle [fastest] original scalar sse2 ssse3 avx2
        zfs_vdev_read_gap_limit=32768
        zfs_vdev_rebuild_max_active=3
        zfs_vdev_rebuild_min_active=1
        zfs_vdev_removal_max_active=2
        zfs_vdev_removal_min_active=1
        zfs_vdev_scheduler=unused
        zfs_vdev_scrub_max_active=3
        zfs_vdev_scrub_min_active=1
        zfs_vdev_sync_read_max_active=10
        zfs_vdev_sync_read_min_active=10
        zfs_vdev_sync_write_max_active=10
        zfs_vdev_sync_write_min_active=10
        zfs_vdev_trim_max_active=2
        zfs_vdev_trim_min_active=1
        zfs_vdev_write_gap_limit=4096
        zfs_vnops_read_chunk_size=33554432
        zfs_wrlog_data_max=8589934592
        zfs_xattr_compat=0
        zfs_zevent_len_max=512
        zfs_zevent_retain_expire_secs=900
        zfs_zevent_retain_max=2000
        zfs_zil_clean_taskq_maxalloc=1048576
        zfs_zil_clean_taskq_minalloc=1024
        zfs_zil_clean_taskq_nthr_pct=100
        zfs_zil_saxattr=1
        zil_maxblocksize=131072
        zil_maxcopied=7680
        zil_nocacheflush=0
        zil_replay_disable=0
        zil_slog_bulk=67108864
        zio_deadman_log_all=0
        zio_dva_throttle_enabled=1
        zio_requeue_io_start_cut_in_line=1
        zio_slow_io_ms=30000
        zio_taskq_batch_pct=80
        zio_taskq_batch_tpq=0
        zio_taskq_read=fixed,1,8 null scale null
        zio_taskq_write=sync null scale null
        zio_taskq_write_tpq=16
        zstd_abort_size=131072
        zstd_earlyabort_pass=1
        zvol_blk_mq_blocks_per_thread=8
        zvol_blk_mq_queue_depth=128
        zvol_enforce_quotas=1
        zvol_inhibit_dev=0
        zvol_major=230
        zvol_max_discard_blocks=16384
        zvol_num_taskqs=0
        zvol_open_timeout_ms=1000
        zvol_prefetch_bytes=131072
        zvol_request_sync=0
        zvol_threads=0
        zvol_use_blk_mq=0
        zvol_volmode=1

ZIL committed transactions:                                            0
        Commit requests:                                               0
        Flushes to stable storage:                                     0
        Transactions to SLOG storage pool:            0 Bytes          0
        Transactions to non-SLOG storage pool:        0 Bytes          0

JCBird1012 avatar Oct 30 '25 22:10 JCBird1012

Small update - after setting primarycache=metadata, the problem seems not to have occurred again. My working theory is that, for some reason, my particular workloads load the ARC with a lot of unevictable file data, which causes it to balloon and crash my system. I'm not sure why all that data is considered unevictable, but I remember running arc_summary a while ago right before a crash and seeing a large amount of it. ZFS does a good job of managing my ARC with just metadata.
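For anyone who wants to try the same workaround, it is a per-dataset property (the dataset name below is just an example):

# cache only metadata in ARC for the affected dataset
zfs set primarycache=metadata tank/media
zfs get primarycache tank/media   # verify; the default is "all"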

JCBird1012 avatar Nov 01 '25 13:11 JCBird1012