panic: VERIFY3(rc->rc_count == number) failed (FreeBSD 15-CURRENT, ZFS 2.1.14)
System information
Type | Version/Name
---|---
Distribution Name | FreeBSD
Distribution Version | 15-CURRENT
Kernel Version | 15
Architecture | arm64, amd64
OpenZFS Version | 2.1.14
Describe the problem you're observing
Sometimes the system panics during shutdown with an assertion failure. Four examples from different systems:
panic: VERIFY3(rc->rc_count == number) failed (262144 == 0)
panic: VERIFY3(rc->rc_count == number) failed (59392 == 0)
panic: VERIFY3(rc->rc_count == number) failed (232652800 == 0)
panic: VERIFY3(rc->rc_count == number) failed (872153088 == 0)
Describe how to reproduce the problem
Unload ZFS and get unlucky. It doesn't always happen, but it happens to more people than just me. See https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=276341
Include any warning/errors/backtraces from the system logs
panic: VERIFY3(rc->rc_count == number) failed (262144 == 0)
cpuid = 0
time = 1705275564
KDB: stack backtrace:
db_trace_self() at db_trace_self
db_trace_self_wrapper() at db_trace_self_wrapper+0x38
vpanic() at vpanic+0x1a4
spl_panic() at spl_panic+0x44
zfs_refcount_destroy_many() at zfs_refcount_destroy_many+0xdc
arc_fini() at arc_fini+0x274
dmu_fini() at dmu_fini+0xc
spa_fini() at spa_fini+0x34
zfs_kmod_fini() at zfs_kmod_fini+0x78
zfs_shutdown() at zfs_shutdown+0x40
kern_reboot() at kern_reboot+0x574
sys_reboot() at sys_reboot+0x350
do_el0_sync() at do_el0_sync+0x58c
handle_el0_sync() at handle_el0_sync+0x48
--- exception, esr 0x56000000
[...]
#9 0xffff00000003daf8 in spl_panic (file=<optimized out>,
func=<optimized out>, line=13227925,
fmt=0x12 <error: Cannot access memory at address 0x12>)
at /usr/src/sys/contrib/openzfs/module/os/freebsd/spl/spl_misc.c:100
ap = {__stack = 0xffff0000f39d5630, __gr_top = 0xffff0000f39d55e0,
__vr_top = 0xffffa000010a92a0, __gr_offs = -32, __vr_offs = 0}
#10 0xffff0000001633e0 in zfs_refcount_destroy_many (rc=<optimized out>,
number=18446462598740100184)
at /usr/src/sys/contrib/openzfs/module/zfs/refcount.c:98
_verify3_right = 18446462598740100184
_verify3_left = 115
cookie = 0x0
ref = <optimized out>
#11 0xffff0000001633fc in zfs_refcount_destroy (rc=0x12,
rc@entry=0xffff0000010e7a80 <ARC_anon+128>)
at /usr/src/sys/contrib/openzfs/module/zfs/refcount.c:112
No locals.
#12 0xffff0000000c0508 in arc_state_fini ()
at /usr/src/sys/contrib/openzfs/module/zfs/arc.c:7421
No locals.
#13 arc_fini () at /usr/src/sys/contrib/openzfs/module/zfs/arc.c:7753
p = <optimized out>
#14 0xffff0000000ee4d0 in dmu_fini ()
at /usr/src/sys/contrib/openzfs/module/zfs/dmu.c:2549
No locals.
#15 0xffff000000189dd8 in spa_fini ()
at /usr/src/sys/contrib/openzfs/module/zfs/spa_misc.c:2549
No locals.
#16 0xffff0000001e67ac in zfs_kmod_fini ()
at /usr/src/sys/contrib/openzfs/module/zfs/zfs_ioctl.c:7933
zsnext = 0x0
zs = 0x0
#17 0xffff00000006e654 in zfs__fini ()
at /usr/src/sys/contrib/openzfs/module/os/freebsd/zfs/kmod_core.c:284
No locals.
#18 zfs_shutdown (arg=<optimized out>, howto=<optimized out>)
at /usr/src/sys/contrib/openzfs/module/os/freebsd/zfs/kmod_core.c:297
No locals.
#19 0xffff000000693158 in kern_reboot (howto=0)
at /usr/src/sys/kern/kern_shutdown.c:517
_ep = 0xffffa00002f63c40
_t = 0xffffa00002f63c40
_el = 0xffffa00001709c80
once = 1
#20 0xffff000000692b74 in sys_reboot (td=0xffff000121c2a640,
uap=0xffff000121c2aa40) at /usr/src/sys/kern/kern_shutdown.c:312
error = <optimized out>
#21 0xffff000000a60060 in syscallenter (td=0xffff000121c2a640)
at /usr/src/sys/arm64/arm64/../../kern/subr_syscall.c:186
se = 0xffff000000fa1378 <sysent+1760>
sa = 0xffff000121c2aa30
p = <optimized out>
error = <optimized out>
sy_thr_static = true
traced = <optimized out>
_audit_entered = <optimized out>
The call that failed is
zfs_refcount_destroy(&arc_anon->arcs_size[ARC_BUFC_DATA]);
I printed out the object being destroyed, and nothing seemed wrong with it except that the count, which should have been zero, was not.
(kgdb) p ARC_anon.arcs_size[0]
$3 = {rc_count = 262144, rc_mtx = {lock_object = {lo_name = 0xffff000000c8d8f6 "rc->rc_mtx",
lo_flags = 577830912, lo_data = 0, lo_witness = 0x0}, sx_lock = 1}, rc_tree = {avl_root = 0x0,
avl_compar = 0xffff0000001630f0 <zfs_refcount_compare>, avl_offset = 0, avl_numnodes = 0},
rc_removed = {list_size = 48, list_offset = 0, list_head = {
list_next = 0xffff0000010e6ad8 <ARC_anon+216>, list_prev = 0xffff0000010e6ad8 <ARC_anon+216>}},
rc_removed_count = 0, rc_tracked = 0}
Note that sx_lock=1 is the correct value for a lock of this type when it is destroyed. The links in the list point to the list object, as I expect for an empty list.
On a different FreeBSD system running 2.2.3:
# kldunload zfs
Freed UMA keg (arc_buf_hdr_t_full) was not empty (2 items). Lost 2 pages of memory.
Freed UMA keg (arc_buf_t) was not empty (2 items). Lost 2 pages of memory.
Freed UMA keg (zio_buf_comb_131072) was not empty (2 items). Lost 64 pages of memory.
This may be the same bug. ZFS did not crash because assertions were not enabled. The OS noticed memory had not been freed. (The message came from the kernel, not kldunload. I was typing the command on the system console.)
Old ticket, but I came across something similar at https://github.com/openzfsonwindows/openzfs/issues/364#issuecomment-2101862948
I wonder if failmode=continue is in use here as well, and if there was a zio_wait restarting hung I/O for the pool in this issue.
I set vfs.zfs.reference_tracking_enable=1 and reproduced the problem. While shutting down, I hit a panic in the dbuf code which may or may not be related:
panic: No such hold 0xffffa7ffccf11000 on refcount ffff0000017a7d80
...
#7 kdb_enter (why=<optimized out>, msg=<optimized out>) at /usr/home/markj/freebsd/sys/kern/subr_kdb.c:556
#8 0xffff0000004a8d50 in vpanic (fmt=<optimized out>, ap=...) at /usr/home/markj/freebsd/sys/kern/kern_shutdown.c:967
#9 0xffff0000004a8b64 in panic (fmt=0x12 <error: Cannot access memory at address 0x12>) at /usr/home/markj/freebsd/sys/kern/kern_shutdown.c:892
#10 0xffff00000163abb8 in zfs_refcount_remove_many (rc=<optimized out>, number=4800, holder=<optimized out>) at /usr/home/markj/freebsd/sys/contrib/openzfs/module/zfs/refcount.c:187
#11 0xffff0000015ac24c in dbuf_evict_user (db=db@entry=0xffffa7ffccf11000) at /usr/home/markj/freebsd/sys/contrib/openzfs/module/zfs/dbuf.c:580
#12 0xffff0000015a2338 in dbuf_clear_data (db=db@entry=0xffffa7ffccf11000) at /usr/home/markj/freebsd/sys/contrib/openzfs/module/zfs/dbuf.c:1232
#13 0xffff0000015a4844 in dbuf_destroy (db=db@entry=0xffffa7ffccf11000) at /usr/home/markj/freebsd/sys/contrib/openzfs/module/zfs/dbuf.c:3040
#14 0xffff0000015e2dbc in dnode_evict_dbufs (dn=0xffffa7ffcdd5c780) at /usr/home/markj/freebsd/sys/contrib/openzfs/module/zfs/dnode_sync.c:509
#15 0xffff0000015c03d4 in dmu_objset_evict_dbufs (os=<optimized out>) at /usr/home/markj/freebsd/sys/contrib/openzfs/module/zfs/dmu_objset.c:981
#16 0xffff000001550c2c in zfsvfs_teardown (zfsvfs=zfsvfs@entry=0xffffa7ffccf1c000, unmounting=<optimized out>, unmounting@entry=1) at /usr/home/markj/freebsd/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_vfsops.c:1656
#17 0xffff00000154f24c in zfs_umount (vfsp=0xffff00050f76b280, fflag=524288) at /usr/home/markj/freebsd/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_vfsops.c:1710
#18 0xffff000000595bd4 in dounmount (mp=0xffff00050f76b280, flags=<optimized out>, flags@entry=524288, td=<optimized out>) at /usr/home/markj/freebsd/sys/kern/vfs_mount.c:2340
#19 0xffff0000005a2520 in unmount_or_warn (mp=0xffff00050f76b280) at /usr/home/markj/freebsd/sys/kern/vfs_subr.c:4973
#20 vfs_unmountall () at /usr/home/markj/freebsd/sys/kern/vfs_subr.c:5007
#21 0xffff00000057343c in bufshutdown (show_busybufs=<optimized out>) at /usr/home/markj/freebsd/sys/kern/vfs_bio.c:1509
#22 0xffff0000004a853c in kern_reboot (howto=0) at /usr/home/markj/freebsd/sys/kern/kern_shutdown.c:508
#23 0xffff0000004a8228 in sys_reboot (td=0xffff000202b48640, uap=0xffff000202b48a40) at /usr/home/markj/freebsd/sys/kern/kern_shutdown.c:313
#24 0xffff000000864658 in syscallenter (td=0xffff000202b48640) at /usr/home/markj/freebsd/sys/arm64/arm64/../../kern/subr_syscall.c:189
#25 svc_handler (td=0xffff000202b48640, frame=<optimized out>) at /usr/home/markj/freebsd/sys/arm64/arm64/trap.c:198
#26 do_el0_sync (td=0xffff000202b48640, frame=<optimized out>) at /usr/home/markj/freebsd/sys/arm64/arm64/trap.c:648
#27 <signal handler called>
This is happening in a block of code in dbuf_evict_user() which was added recently, in commit 92dc4ad83d95b. I can't comment intelligently on this code, but maybe @robn can see what's going on here? I'm happy to provide more debugging info if needed.
Looking at the remaining references in ARC_anon.arcs_size (it's always anonymous buffers that are leaked), I can see it's two 128KB buffers. (Unsurprisingly, I could see 256KB of "Anon" ARC memory lingering on the idle system in top before I shut down.) The refcount tag is an arc_buf_t; both reference an ARC buffer header which looks like this:
(kgdb) p *$23->b_hdr
$27 = {b_dva = {dva_word = {0, 0}}, b_birth = 0, b_type = ARC_BUFC_DATA, b_complevel = 0 '\000', b_reserved1 = 0 '\000', b_reserved2 = 0, b_hash_next = 0x0, b_flags = (ARC_FLAG_HAS_L1HDR | ARC_FLAG_COMPRESSED_ARC | ARC_FLAG_COMPRESS_1), b_psize = 256, b_lsize = 256, b_spa = 2989832975500785283, b_l2hdr = {
b_dev = 0x0, b_daddr = 0, b_hits = 0, b_arcs_state = ARC_STATE_ANON, b_l2node = {list_next = 0x0, list_prev = 0x0}}, b_l1hdr = {b_state = 0xffff000001795900 <ARC_anon>, b_arc_node = {list_next = 0x0, list_prev = 0x0}, b_arc_access = 0, b_mru_hits = 0, b_mru_ghost_hits = 0, b_mfu_hits = 0, b_mfu_ghost_hits = 0,
b_byteswap = 10 '\n', b_buf = 0xffffa815ebe62840, b_refcnt = {rc_count = 1, rc_mtx = {lock_object = {lo_name = 0xffff00000147e117 "rc->rc_mtx", lo_flags = 577830912, lo_data = 0, lo_witness = 0x0}, sx_lock = 1}, rc_tree = {avl_root = 0xffffa82c9c51d0c0, avl_compar = 0xffff00000163a5b8 <zfs_refcount_compare>,
avl_offset = 0, avl_numnodes = 1}, rc_removed = {list_offset = 0, list_head = {list_next = 0xffffa82c017f86a8, list_prev = 0xffffa82c017f86a8}}, rc_removed_count = 0, rc_tracked = 1}, b_acb = 0x0, b_pabd = 0x0, b_freeze_cksum = 0x0, b_freeze_lock = {lock_object = {
lo_name = 0xffff00000143d604 "hdr->b_l1hdr.b_freeze_lock", lo_flags = 577830912, lo_data = 0, lo_witness = 0x0}, sx_lock = 1}}, b_crypt_hdr = {b_rabd = 0x0, b_dsobj = 0, b_ot = DMU_OT_NONE, b_salt = "\000\000\000\000\000\000\000", b_iv = '\000' <repeats 11 times>, b_mac = '\000' <repeats 15 times>}}
I'm out of time to look at this for the moment, but perhaps this is enough for someone smarter than me to spot the problem?
@markjdb thanks for this. Nothing is jumping out at me, but when I did a code read a couple of days ago I didn't go through that codepath, so it gives me something else to go on. I'll start looking later today.
Is there a semi-reliable repro here? From the descriptions it looks like inducing a bunch of ARC churn and then unloading the module is probably enough.
I messaged you out-of-band with some info. If anyone else has steps to reproduce this reliably, please feel free to add them.
Right, I think I understand the dbuf refcount bug, and I'll send a patch through for that soon (tonight, I hope). I don't think it's related to the original issue; it's a straight refcounting bug (actually, we're not even miscounting, just holding the refcount API a little bit wrong). It might be related at a distance, but I think not, because the memory leak reported is real, and it is from the ARC, not the dbuf cache.
#16191 should fix the dbuf refcounting thing. I doubt it'll do much for the original issue, but it should at least let reference_tracking_enable=1 get a read on it.
I tried modifying the refcount code to save a stack trace in the tracker structure at the point where the reference is acquired. If I dump the stacks, I see:
(kgdb) set $_i = 0
(kgdb) while ($_i < 10)
>x/x $7.ref_st.pcs[$_i]
>set $_i = $_i + 1
>end
0xffff000001639a30 <zfs_refcount_add_many+152>: 0x42
0xffff00000158f76c <arc_get_data_impl+464>: 0x8b
0xffff00000157c258 <arc_buf_alloc_impl+980>: 0xd1
0xffff00000157adf0 <arc_alloc_buf+108>: 0x25
0xffff0000015a6954 <dbuf_noread+188>: 0x0c
0xffff0000015a6db8 <dmu_buf_will_fill+284>: 0xb8
0xffff0000015b9040 <dmu_write_uio_dnode+452>: 0x17
0xffff0000015b91e4 <dmu_write_uio_dbuf+84>: 0x26
0xffff0000016d744c <zfs_write+1328>: 0x51
0xffff000001558ce4 <zfs_freebsd_write+60>: 0x8e
Both 128KB buffers have the same stack.
Reading the relevant code, I see that commit 9b1677fb5a082 introduced some new cases, and it was committed around the time that this bug was first noticed. Maybe @amotin has some idea?
I don't have any quick ideas, but at some point Brian noticed FreeBSD CI failures with supposedly some kernel panic after tests completion, possibly on module unload. With some hope it may be this assertion. If we could reproduce it on proper system with a console and find what it is and what test triggers it -- it could help.
I have access to such a system. To reproduce it, I run make -j80 buildworld buildkernel. It consistently leaks two 128KB arc_anon buffers each time, somewhere late in buildworld. If there is some other useful info I can collect, please let me know what to try.
I hoped for something more specific/synthetic, like a CI test. I'll think about it.
@amotin Suppose we're in dmu_write_uio_dnode() -> dmu_buf_will_fill() -> dbuf_noread(), and the dbuf's initial state is UNCACHED. Then we allocate a buffer and change the state to FILL. Suppose the fill fails (e.g., because copying from userspace triggered a page fault and FreeBSD's VFS needs to return EFAULT to avoid a deadlock), so dmu_write_uio_dnode() calls dmu_buf_fill_done() to roll back, and the dbuf state is rolled back to UNCACHED (the else if (failed) case). How does the aforementioned ARC buffer get freed?
> I hoped for something more specific/synthetic, like a CI test.
Here's a reproducer program. Give it a path to a non-existent file on a ZFS filesystem. If run in a loop, I see the number of anon ARC pages grow quickly. Something like while true; do ./test /tmp/foo; done.
#include <sys/mman.h>
#include <err.h>
#include <fcntl.h>
#include <unistd.h>
int
main(int argc, char **argv)
{
void *addr;
ssize_t n;
long ps;
int fd;
ps = 128 * 1024; /* has to be the record size? */
(void)unlink(argv[1]);
fd = open(argv[1], O_RDWR | O_CREAT, 0644);
if (fd < 0)
err(1, "open");
if (ftruncate(fd, ps) < 0)
err(1, "ftruncate");
addr = mmap(NULL, ps, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANON, -1, 0);
if (addr == MAP_FAILED)
err(1, "mmap");
/*
* Writing a freshly mapped anon page triggers a fault since the kernel
* hasn't added it to the page tables yet.
*/
n = write(fd, addr, ps);
if (n < 0)
err(1, "write");
}
@markjdb Thank you for the reproduction, it works. Adding arc_buf_destroy(db->db_buf, db); to dmu_buf_fill_done() seems to fix the assertion, but I'll look deeper tomorrow to be sure.
@behlendorf Worth pointing out that the PR did fix the leak problem, but it does not fix the title of the ticket: panic: VERIFY3(rc->rc_count == number) failed can still be triggered reliably.
Thanks, I overlooked that. Reopening.
> but does not fix the title of the ticket. panic: VERIFY3(rc->rc_count == number) failed can still be triggered reliably.
@lundman Are you talking about failmode=continue, or you have some other reproduction? If that is it, then I would call it a different issue.
For me, I can trigger the assert panic: VERIFY3(rc->rc_count == number) failed more easily if I set failmode=continue and lower the timeout.
But it can be triggered with the default ZFS timeout and zstd19. My guess is that I/O just takes "too long" - for whatever reason, zstd19, a failing device, etc. - and the code that is supposed to restart I/O does so incorrectly. Just a guess.
This ticket does say FreeBSD, but it is the same alert - I do not know if the FreeBSD bug had failmode=continue set when they created this ticket.
The memory leak issue was guessed to be related to this ticket, but probably should have been a separate ticket.
This assertion IS about memory leaks. The fact that you see it in other scenarios just means there are other leaks, probably unrelated to the one I fixed.
> This ticket does say FreeBSD, but it is the same alert - I do not know if the FreeBSD bug had failmode=continue when they created this ticket.
Probably not, given the number of people upstream who have reported the problem.
> The memory leak issue was guessed to be related to this ticket, but probably should have been a separate ticket.
It wasn't guessed. I hit the problem quite a few times and finally debugged it. Based on the timing of the upstream bug report and the commit which introduced the leak (within a couple of weeks of each other), the fact that the folks reporting the problem track the FreeBSD development branch (where the bug would have first appeared), and the fact that this apparently wasn't observed on Linux (which avoids the code path in question because it prefaults data to be written), I'm quite sure that the original report is about the problem that is now fixed.
OK that sounds good. Let's go with that and close it.
> OK that sounds good. Let's go with that and close it.
An issue originally reported for v2.1.14 got fixed in some version after 2.2.3? :) Better late than never, but probably not very helpful for the OP and others with reasons to remain on 2.1 - unless the fix is simple enough to backport?
> OK that sounds good. Let's go with that and close it.
> An issue originally reported for v2.1.14 got fixed in some version after 2.2.3? :) Better late than never, but probably not very helpful for the OP and others with reasons to remain on 2.1 - unless the fix is simple enough to backport?
I don't believe 2.1.14 is actually affected, looking at the code. In particular, dmu_buf_fill_done() is missing the third parameter, failed. The bug came in with commit 9b1677fb5a.
On Monday July 22 2024 06:45:06 Mark Johnston wrote:
> I don't believe 2.1.14 is actually affected, looking at the code. In particular, dmu_buf_fill_done() is missing the third parameter, failed. The bug came in with commit 9b1677fb5a.
I did notice that the function in question is a lot simpler...