panic: VERIFY3(rc->rc_count == number) failed (FreeBSD 15-CURRENT, ZFS 2.1.14)
System information
Type | Version/Name
---|---
Distribution Name | FreeBSD
Distribution Version | 15-CURRENT
Kernel Version | 15
Architecture | arm64, amd64
OpenZFS Version | 2.1.14
Describe the problem you're observing
Sometimes the system panics during shutdown with an assertion failure. Four examples from different systems:
panic: VERIFY3(rc->rc_count == number) failed (262144 == 0)
panic: VERIFY3(rc->rc_count == number) failed (59392 == 0)
panic: VERIFY3(rc->rc_count == number) failed (232652800 == 0)
panic: VERIFY3(rc->rc_count == number) failed (872153088 == 0)
Describe how to reproduce the problem
Unload ZFS and get unlucky. It doesn't always happen, but it happens to more people than just me. See https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=276341
Include any warning/errors/backtraces from the system logs
panic: VERIFY3(rc->rc_count == number) failed (262144 == 0)
cpuid = 0
time = 1705275564
KDB: stack backtrace:
db_trace_self() at db_trace_self
db_trace_self_wrapper() at db_trace_self_wrapper+0x38
vpanic() at vpanic+0x1a4
spl_panic() at spl_panic+0x44
zfs_refcount_destroy_many() at zfs_refcount_destroy_many+0xdc
arc_fini() at arc_fini+0x274
dmu_fini() at dmu_fini+0xc
spa_fini() at spa_fini+0x34
zfs_kmod_fini() at zfs_kmod_fini+0x78
zfs_shutdown() at zfs_shutdown+0x40
kern_reboot() at kern_reboot+0x574
sys_reboot() at sys_reboot+0x350
do_el0_sync() at do_el0_sync+0x58c
handle_el0_sync() at handle_el0_sync+0x48
--- exception, esr 0x56000000
[...]
#9 0xffff00000003daf8 in spl_panic (file=<optimized out>,
func=<optimized out>, line=13227925,
fmt=0x12 <error: Cannot access memory at address 0x12>)
at /usr/src/sys/contrib/openzfs/module/os/freebsd/spl/spl_misc.c:100
ap = {__stack = 0xffff0000f39d5630, __gr_top = 0xffff0000f39d55e0,
__vr_top = 0xffffa000010a92a0, __gr_offs = -32, __vr_offs = 0}
#10 0xffff0000001633e0 in zfs_refcount_destroy_many (rc=<optimized out>,
number=18446462598740100184)
at /usr/src/sys/contrib/openzfs/module/zfs/refcount.c:98
_verify3_right = 18446462598740100184
_verify3_left = 115
cookie = 0x0
ref = <optimized out>
#11 0xffff0000001633fc in zfs_refcount_destroy (rc=0x12,
rc@entry=0xffff0000010e7a80 <ARC_anon+128>)
at /usr/src/sys/contrib/openzfs/module/zfs/refcount.c:112
No locals.
#12 0xffff0000000c0508 in arc_state_fini ()
at /usr/src/sys/contrib/openzfs/module/zfs/arc.c:7421
No locals.
#13 arc_fini () at /usr/src/sys/contrib/openzfs/module/zfs/arc.c:7753
p = <optimized out>
#14 0xffff0000000ee4d0 in dmu_fini ()
at /usr/src/sys/contrib/openzfs/module/zfs/dmu.c:2549
No locals.
#15 0xffff000000189dd8 in spa_fini ()
at /usr/src/sys/contrib/openzfs/module/zfs/spa_misc.c:2549
No locals.
#16 0xffff0000001e67ac in zfs_kmod_fini ()
at /usr/src/sys/contrib/openzfs/module/zfs/zfs_ioctl.c:7933
zsnext = 0x0
zs = 0x0
#17 0xffff00000006e654 in zfs__fini ()
at /usr/src/sys/contrib/openzfs/module/os/freebsd/zfs/kmod_core.c:284
No locals.
#18 zfs_shutdown (arg=<optimized out>, howto=<optimized out>)
at /usr/src/sys/contrib/openzfs/module/os/freebsd/zfs/kmod_core.c:297
No locals.
#19 0xffff000000693158 in kern_reboot (howto=0)
at /usr/src/sys/kern/kern_shutdown.c:517
_ep = 0xffffa00002f63c40
_t = 0xffffa00002f63c40
_el = 0xffffa00001709c80
once = 1
#20 0xffff000000692b74 in sys_reboot (td=0xffff000121c2a640,
uap=0xffff000121c2aa40) at /usr/src/sys/kern/kern_shutdown.c:312
error = <optimized out>
#21 0xffff000000a60060 in syscallenter (td=0xffff000121c2a640)
at /usr/src/sys/arm64/arm64/../../kern/subr_syscall.c:186
se = 0xffff000000fa1378 <sysent+1760>
sa = 0xffff000121c2aa30
p = <optimized out>
error = <optimized out>
sy_thr_static = true
traced = <optimized out>
_audit_entered = <optimized out>
The call that failed is
zfs_refcount_destroy(&arc_anon->arcs_size[ARC_BUFC_DATA]);
I printed out the object being destroyed, and nothing seemed wrong with it except that the count, which should have been zero, was not.
(kgdb) p ARC_anon.arcs_size[0]
$3 = {rc_count = 262144, rc_mtx = {lock_object = {lo_name = 0xffff000000c8d8f6 "rc->rc_mtx",
lo_flags = 577830912, lo_data = 0, lo_witness = 0x0}, sx_lock = 1}, rc_tree = {avl_root = 0x0,
avl_compar = 0xffff0000001630f0 <zfs_refcount_compare>, avl_offset = 0, avl_numnodes = 0},
rc_removed = {list_size = 48, list_offset = 0, list_head = {
list_next = 0xffff0000010e6ad8 <ARC_anon+216>, list_prev = 0xffff0000010e6ad8 <ARC_anon+216>}},
rc_removed_count = 0, rc_tracked = 0}
Note that sx_lock=1 is the correct value for a lock of this type when it is destroyed. The links in the list point to the list object, as I expect for an empty list.
On a different FreeBSD system running 2.2.3:
# kldunload zfs
Freed UMA keg (arc_buf_hdr_t_full) was not empty (2 items). Lost 2 pages of memory.
Freed UMA keg (arc_buf_t) was not empty (2 items). Lost 2 pages of memory.
Freed UMA keg (zio_buf_comb_131072) was not empty (2 items). Lost 64 pages of memory.
This may be the same bug. ZFS did not crash because assertions were not enabled. The OS noticed memory had not been freed. (The message came from the kernel, not kldunload. I was typing the command on the system console.)
Old ticket, but I came across something similar at https://github.com/openzfsonwindows/openzfs/issues/364#issuecomment-2101862948
I wonder if failmode=continue is in use here as well, and if there was a zio_wait restarting hung I/O for the pool in this issue.
I set vfs.zfs.reference_tracking_enable=1 and reproduced the problem. While shutting down, I hit a panic in the dbuf code which may or may not be related:
panic: No such hold 0xffffa7ffccf11000 on refcount ffff0000017a7d80
...
#7 kdb_enter (why=<optimized out>, msg=<optimized out>) at /usr/home/markj/freebsd/sys/kern/subr_kdb.c:556
#8 0xffff0000004a8d50 in vpanic (fmt=<optimized out>, ap=...) at /usr/home/markj/freebsd/sys/kern/kern_shutdown.c:967
#9 0xffff0000004a8b64 in panic (fmt=0x12 <error: Cannot access memory at address 0x12>) at /usr/home/markj/freebsd/sys/kern/kern_shutdown.c:892
#10 0xffff00000163abb8 in zfs_refcount_remove_many (rc=<optimized out>, number=4800, holder=<optimized out>) at /usr/home/markj/freebsd/sys/contrib/openzfs/module/zfs/refcount.c:187
#11 0xffff0000015ac24c in dbuf_evict_user (db=db@entry=0xffffa7ffccf11000) at /usr/home/markj/freebsd/sys/contrib/openzfs/module/zfs/dbuf.c:580
#12 0xffff0000015a2338 in dbuf_clear_data (db=db@entry=0xffffa7ffccf11000) at /usr/home/markj/freebsd/sys/contrib/openzfs/module/zfs/dbuf.c:1232
#13 0xffff0000015a4844 in dbuf_destroy (db=db@entry=0xffffa7ffccf11000) at /usr/home/markj/freebsd/sys/contrib/openzfs/module/zfs/dbuf.c:3040
#14 0xffff0000015e2dbc in dnode_evict_dbufs (dn=0xffffa7ffcdd5c780) at /usr/home/markj/freebsd/sys/contrib/openzfs/module/zfs/dnode_sync.c:509
#15 0xffff0000015c03d4 in dmu_objset_evict_dbufs (os=<optimized out>) at /usr/home/markj/freebsd/sys/contrib/openzfs/module/zfs/dmu_objset.c:981
#16 0xffff000001550c2c in zfsvfs_teardown (zfsvfs=zfsvfs@entry=0xffffa7ffccf1c000, unmounting=<optimized out>, unmounting@entry=1) at /usr/home/markj/freebsd/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_vfsops.c:1656
#17 0xffff00000154f24c in zfs_umount (vfsp=0xffff00050f76b280, fflag=524288) at /usr/home/markj/freebsd/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_vfsops.c:1710
#18 0xffff000000595bd4 in dounmount (mp=0xffff00050f76b280, flags=<optimized out>, flags@entry=524288, td=<optimized out>) at /usr/home/markj/freebsd/sys/kern/vfs_mount.c:2340
#19 0xffff0000005a2520 in unmount_or_warn (mp=0xffff00050f76b280) at /usr/home/markj/freebsd/sys/kern/vfs_subr.c:4973
#20 vfs_unmountall () at /usr/home/markj/freebsd/sys/kern/vfs_subr.c:5007
#21 0xffff00000057343c in bufshutdown (show_busybufs=<optimized out>) at /usr/home/markj/freebsd/sys/kern/vfs_bio.c:1509
#22 0xffff0000004a853c in kern_reboot (howto=0) at /usr/home/markj/freebsd/sys/kern/kern_shutdown.c:508
#23 0xffff0000004a8228 in sys_reboot (td=0xffff000202b48640, uap=0xffff000202b48a40) at /usr/home/markj/freebsd/sys/kern/kern_shutdown.c:313
#24 0xffff000000864658 in syscallenter (td=0xffff000202b48640) at /usr/home/markj/freebsd/sys/arm64/arm64/../../kern/subr_syscall.c:189
#25 svc_handler (td=0xffff000202b48640, frame=<optimized out>) at /usr/home/markj/freebsd/sys/arm64/arm64/trap.c:198
#26 do_el0_sync (td=0xffff000202b48640, frame=<optimized out>) at /usr/home/markj/freebsd/sys/arm64/arm64/trap.c:648
#27 <signal handler called>
This is happening in a block of code in dbuf_evict_user() which was added recently, in commit 92dc4ad83d95b. I can't comment intelligently on this code, but maybe @robn can see what's going on here? I'm happy to provide more debugging info if needed.
Looking at the remaining references in ARC_anon.arcs_size (it's always anonymous buffers that are leaked), I can see it's two 128KB buffers. (Unsurprisingly, I could see 256KB of "Anon" ARC memory lingering on the idle system in top before I shut down.) The refcount tag is an arc_buf_t; both reference an ARC buffer header which looks like this:
(kgdb) p *$23->b_hdr
$27 = {b_dva = {dva_word = {0, 0}}, b_birth = 0, b_type = ARC_BUFC_DATA, b_complevel = 0 '\000', b_reserved1 = 0 '\000', b_reserved2 = 0, b_hash_next = 0x0, b_flags = (ARC_FLAG_HAS_L1HDR | ARC_FLAG_COMPRESSED_ARC | ARC_FLAG_COMPRESS_1), b_psize = 256, b_lsize = 256, b_spa = 2989832975500785283, b_l2hdr = {
b_dev = 0x0, b_daddr = 0, b_hits = 0, b_arcs_state = ARC_STATE_ANON, b_l2node = {list_next = 0x0, list_prev = 0x0}}, b_l1hdr = {b_state = 0xffff000001795900 <ARC_anon>, b_arc_node = {list_next = 0x0, list_prev = 0x0}, b_arc_access = 0, b_mru_hits = 0, b_mru_ghost_hits = 0, b_mfu_hits = 0, b_mfu_ghost_hits = 0,
b_byteswap = 10 '\n', b_buf = 0xffffa815ebe62840, b_refcnt = {rc_count = 1, rc_mtx = {lock_object = {lo_name = 0xffff00000147e117 "rc->rc_mtx", lo_flags = 577830912, lo_data = 0, lo_witness = 0x0}, sx_lock = 1}, rc_tree = {avl_root = 0xffffa82c9c51d0c0, avl_compar = 0xffff00000163a5b8 <zfs_refcount_compare>,
avl_offset = 0, avl_numnodes = 1}, rc_removed = {list_offset = 0, list_head = {list_next = 0xffffa82c017f86a8, list_prev = 0xffffa82c017f86a8}}, rc_removed_count = 0, rc_tracked = 1}, b_acb = 0x0, b_pabd = 0x0, b_freeze_cksum = 0x0, b_freeze_lock = {lock_object = {
lo_name = 0xffff00000143d604 "hdr->b_l1hdr.b_freeze_lock", lo_flags = 577830912, lo_data = 0, lo_witness = 0x0}, sx_lock = 1}}, b_crypt_hdr = {b_rabd = 0x0, b_dsobj = 0, b_ot = DMU_OT_NONE, b_salt = "\000\000\000\000\000\000\000", b_iv = '\000' <repeats 11 times>, b_mac = '\000' <repeats 15 times>}}
I'm out of time to look at this for the moment, but perhaps this is enough for someone smarter than me to spot the problem?
@markjdb thanks for this. Nothing is jumping out at me, but when I did a code read a couple of days ago I didn't go through that codepath, so it gives me something else to go on. I'll start looking later today.
Is there a semi-reliable repro here? From the descriptions it looks like inducing a bunch of ARC churn and then unloading the module is probably enough.
I messaged you out-of-band with some info. If anyone else has steps to reproduce this reliably, please feel free to add them.
Right, I think I understand the dbuf refcount bug, and I'll send a patch through for that soon (tonight, I hope). I don't think it's related to the original issue; it's a straight refcounting bug (actually, we're not even miscounting, just holding the refcount API a little bit wrong). It might be related at a distance, but I think not, because the memory leak reported is real, and it is from the ARC, not the dbuf cache.
#16191 should fix the dbuf refcounting thing. I doubt it'll do much for the original issue, but it should at least let reference_tracking_enable=1 get a read on it.
I tried modifying the refcount code to save a stack trace in the tracker structure at the point where the reference is acquired. If I dump the stacks, I see:
(kgdb) set $_i = 0
(kgdb) while ($_i < 10)
>x/x $7.ref_st.pcs[$_i]
>set $_i = $_i + 1
>end
0xffff000001639a30 <zfs_refcount_add_many+152>: 0x42
0xffff00000158f76c <arc_get_data_impl+464>: 0x8b
0xffff00000157c258 <arc_buf_alloc_impl+980>: 0xd1
0xffff00000157adf0 <arc_alloc_buf+108>: 0x25
0xffff0000015a6954 <dbuf_noread+188>: 0x0c
0xffff0000015a6db8 <dmu_buf_will_fill+284>: 0xb8
0xffff0000015b9040 <dmu_write_uio_dnode+452>: 0x17
0xffff0000015b91e4 <dmu_write_uio_dbuf+84>: 0x26
0xffff0000016d744c <zfs_write+1328>: 0x51
0xffff000001558ce4 <zfs_freebsd_write+60>: 0x8e
Both 128KB buffers have the same stack.
Reading the relevant code, I see that commit 9b1677fb5a082 introduced some new cases, and it was committed around the time that this bug was first noticed. Maybe @amotin has some idea?
I don't have any quick ideas, but at some point Brian noticed FreeBSD CI failures with supposedly some kernel panic after tests completion, possibly on module unload. With some hope it may be this assertion. If we could reproduce it on proper system with a console and find what it is and what test triggers it -- it could help.
I have access to such a system. To reproduce it, I run make -j80 buildworld buildkernel. It consistently leaks two 128KB arc_anon buffers each time, somewhere late in buildworld. If there is some other useful info I can collect, please let me know what to try.
I hoped for something more specific/synthetic, like a CI test. I'll think about it.
@amotin Suppose we're in dmu_write_uio_dnode() -> dmu_buf_will_fill() -> dbuf_noread(), and the dbuf's initial state is UNCACHED. Then we allocate a buffer and change the state to FILL. Suppose the fill fails (e.g., because copying from userspace triggered a page fault and FreeBSD's VFS needs to return EFAULT to avoid a deadlock), so dmu_write_uio_dnode() calls dmu_buf_fill_done() to roll back, and the dbuf state is rolled back to UNCACHED (the else if (failed) case). How does the aforementioned ARC buffer get freed?
> I hoped for something more specific/synthetic, like a CI test.
Here's a reproducer program. Give it a path to a non-existent file on a ZFS filesystem. If run in a loop, I see the number of anon ARC pages grow quickly. Something like while true; do ./test /tmp/foo; done.
#include <sys/mman.h>
#include <err.h>
#include <fcntl.h>
#include <unistd.h>
int
main(int argc, char **argv)
{
void *addr;
ssize_t n;
long ps;
int fd;
ps = 128 * 1024; /* has to be the record size? */
(void)unlink(argv[1]);
fd = open(argv[1], O_RDWR | O_CREAT, 0644);
if (fd < 0)
err(1, "open");
if (ftruncate(fd, ps) < 0)
err(1, "ftruncate");
addr = mmap(NULL, ps, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANON, -1, 0);
if (addr == MAP_FAILED)
err(1, "mmap");
/*
* Writing a freshly mapped anon page triggers a fault since the kernel
* hasn't added it to the page tables yet.
*/
n = write(fd, addr, ps);
if (n < 0)
err(1, "write");
}
@markjdb Thank you for the reproduction, it works. Adding arc_buf_destroy(db->db_buf, db); to dmu_buf_fill_done() seems to fix the assertion, but I'll look deeper tomorrow to be sure.
@behlendorf Worth pointing out that the PR did fix the leak problem, but it does not fix the title of the ticket: panic: VERIFY3(rc->rc_count == number) failed can still be triggered reliably.
Thanks, I overlooked that. Reopening.
> but does not fix the title of the ticket. panic: VERIFY3(rc->rc_count == number) failed can still be triggered reliably.
@lundman Are you talking about failmode=continue, or you have some other reproduction? If that is it, then I would call it a different issue.
For me, I can trigger the assert panic: VERIFY3(rc->rc_count == number) failed more easily if I set failmode=continue and lower the timeout.
But it can be triggered with the default ZFS timeout and zstd19. My guess is that I/O just takes "too long" - for whatever reason, zstd19, a failing device, etc. - and the code that is supposed to restart I/O does so incorrectly. Just a guess.
This ticket does say FreeBSD, but it is the same alert - I do not know if the FreeBSD bug had failmode=continue set when they created this ticket.
The memory leak issue was guessed to be related to this ticket, but probably should have been a separate ticket.
This assertion IS about memory leaks. The fact that you see it in other scenarios just means there are other leaks, probably unrelated to the one I fixed.
> This ticket does say FreeBSD, but it is the same alert - I do not know if the FreeBSD bug had failmode=continue when they created this ticket.
Probably not, given the number of people upstream who have reported the problem.
> The memory leak issue was guessed to be related to this ticket, but probably should have been a separate ticket.
It wasn't guessed. I hit the problem quite a few times and finally debugged it. Based on the timing of the upstream bug report and the commit which introduced the leak (within a couple of weeks of each other), the fact that the folks reporting the problem track the FreeBSD development branch (where the bug would have first appeared), and the fact that this apparently wasn't observed on Linux (which avoids the code path in question because it prefaults data to be written), I'm quite sure that the original report is about the problem that is now fixed.
OK that sounds good. Let's go with that and close it.
> OK that sounds good. Let's go with that and close it.
An issue originally reported for v2.1.14 got fixed in some version after 2.2.3? :) Better late than never, but probably not very helpful for the OP and others with reasons to remain on 2.1 - unless the fix is simple enough to backport?
> OK that sounds good. Let's go with that and close it.
> An issue originally reported for v2.1.14 got fixed in some version after 2.2.3? :) Better late than never, but probably not very helpful for the OP and others with reasons to remain on 2.1 - unless the fix is simple enough to backport?
I don't believe 2.1.14 is actually affected, looking at the code. In particular, dmu_buf_fill_done() is missing the third parameter, failed. The bug came in with commit 9b1677fb5a.
On Monday July 22 2024 06:45:06 Mark Johnston wrote:
> I don't believe 2.1.14 is actually affected, looking at the code. In particular, dmu_buf_fill_done() is missing the third parameter, failed. The bug came in with commit 9b1677fb5a.
I did notice that the function in question is a lot simpler...