zfs icon indicating copy to clipboard operation
zfs copied to clipboard

Possible circular locking dependency in dmu_buf_will_dirty_impl vs free_children

Open snajpa opened this issue 5 years ago • 3 comments

System information

Type Version/Name
Distribution Name NixOS
Distribution Version nixpkgs/master
Linux Kernel 5.7.9
Architecture x86_64
ZFS Version master@https://github.com/zfsonlinux/zfs/commit/23c871671
SPL Version

ZFS and kernel have been compiled with lockdep and kernel patched to consider CDDL legit; on regular VFS usage, there is a report of possible circular lock dependency worth looking into.

[   43.641140] ======================================================
[   43.641825] WARNING: possible circular locking dependency detected
[   43.642486] 5.7.9 #1-NixOS Tainted: G           OE    
[   43.643038] ------------------------------------------------------
[   43.643700] dp_sync_taskq/828 is trying to acquire lock:
[   43.644266] ffff98731b49a610 (&db->db_mtx){+.+.}-{3:3}, at: dmu_buf_will_dirty_impl+0x45/0x2b0 [zfs]
[   43.645322] 
[   43.645322] but task is already holding lock:
[   43.645947] ffff987324d6c438 (&db->db_rwlock){++++}-{3:3}, at: free_children+0x473/0x660 [zfs]
[   43.646905] 
[   43.646905] which lock already depends on the new lock.
[   43.646905] 
[   43.647799] 
[   43.647799] the existing dependency chain (in reverse order) is:
[   43.648624] 
[   43.648624] -> #1 (&db->db_rwlock){++++}-{3:3}:
[   43.649273]        down_read+0x33/0x1f0
[   43.649752]        dmu_buf_lock_parent+0x47/0x110 [zfs]
[   43.650344]        dbuf_read+0xee/0x720 [zfs]
[   43.650872]        dnode_hold_impl+0x189/0xeb0 [zfs]
[   43.651481]        dmu_buf_hold_noread+0x38/0x190 [zfs]
[   43.652096]        dmu_buf_hold+0x3c/0x90 [zfs]
[   43.652632]        zap_lockdir+0x4e/0xb0 [zfs]
[   43.653165]        zap_lookup_norm+0x59/0xd0 [zfs]
[   43.653741]        zap_lookup+0x12/0x20 [zfs]
[   43.654259]        spa_dir_prop+0x37/0x80 [zfs]
[   43.654801]        spa_ld_trusted_config+0x4a/0x740 [zfs]
[   43.655438]        spa_ld_mos_with_trusted_config+0x2d/0x80 [zfs]
[   43.656155]        spa_load+0x82/0x13c0 [zfs]
[   43.656693]        spa_tryimport+0x145/0x5e0 [zfs]
[   43.657287]        zfs_ioc_pool_tryimport+0x63/0xc0 [zfs]
[   43.657923]        zfsdev_ioctl_common+0x6a0/0x830 [zfs]
[   43.658538]        zfsdev_ioctl+0x53/0xe0 [zfs]
[   43.659038]        ksys_ioctl+0x87/0xc0
[   43.659452]        __x64_sys_ioctl+0x16/0x20
[   43.659923]        do_syscall_64+0x56/0x200
[   43.660388]        entry_SYSCALL_64_after_hwframe+0x49/0xb3
[   43.660992] 
[   43.660992] -> #0 (&db->db_mtx){+.+.}-{3:3}:
[   43.661628]        __lock_acquire+0x12c3/0x24f0
[   43.662124]        lock_acquire+0xad/0x360
[   43.662569]        __mutex_lock+0x8b/0x920
[   43.663049]        dmu_buf_will_dirty_impl+0x45/0x2b0 [zfs]
[   43.663701]        dsl_dataset_block_kill+0xce/0x780 [zfs]
[   43.664329]        free_blocks+0xeb/0x1d0 [zfs]
[   43.664870]        free_children+0x4a1/0x660 [zfs]
[   43.665425]        dnode_sync_free_range+0x1da/0x420 [zfs]
[   43.666066]        range_tree_vacate+0x159/0x260 [zfs]
[   43.666702]        dnode_sync+0x398/0x1280 [zfs]
[   43.667240]        sync_dnodes_task+0x5d/0x1e0 [zfs]
[   43.667791]        taskq_thread+0x2f8/0x620 [spl]
[   43.668297]        kthread+0xfe/0x140
[   43.668702]        ret_from_fork+0x3a/0x50
[   43.669145] 
[   43.669145] other info that might help us debug this:
[   43.669145] 
[   43.670004]  Possible unsafe locking scenario:
[   43.670004] 
[   43.670645]        CPU0                    CPU1
[   43.671135]        ----                    ----
[   43.671622]   lock(&db->db_rwlock);
[   43.672013]                                lock(&db->db_mtx);
[   43.672651]                                lock(&db->db_rwlock);
[   43.673300]   lock(&db->db_mtx);
[   43.673654] 
[   43.673654]  *** DEADLOCK ***
[   43.673654] 
[   43.674294] 1 lock held by dp_sync_taskq/828:
[   43.674772]  #0: ffff987324d6c438 (&db->db_rwlock){++++}-{3:3}, at: free_children+0x473/0x660 [zfs]
[   43.675788] 
[   43.675788] stack backtrace:
[   43.676259] CPU: 0 PID: 828 Comm: dp_sync_taskq Tainted: G           OE     5.7.9 #1-NixOS
[   43.677160] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[   43.678392] Call Trace:
[   43.678671]  dump_stack+0x8f/0xd0
[   43.679034]  check_noncircular+0x169/0x180
[   43.679503]  __lock_acquire+0x12c3/0x24f0
[   43.679946]  lock_acquire+0xad/0x360
[   43.680367]  ? dmu_buf_will_dirty_impl+0x45/0x2b0 [zfs]
[   43.680953]  __mutex_lock+0x8b/0x920
[   43.681375]  ? dmu_buf_will_dirty_impl+0x45/0x2b0 [zfs]
[   43.681994]  ? dmu_buf_will_dirty_impl+0x45/0x2b0 [zfs]
[   43.682643]  ? __lock_acquire+0x3f2/0x24f0
[   43.683127]  ? dmu_buf_will_dirty_impl+0x45/0x2b0 [zfs]
[   43.683747]  ? bp_get_dsize_sync+0xb6/0x110 [zfs]
[   43.684285]  dmu_buf_will_dirty_impl+0x45/0x2b0 [zfs]
[   43.684882]  dsl_dataset_block_kill+0xce/0x780 [zfs]
[   43.685473]  free_blocks+0xeb/0x1d0 [zfs]
[   43.685954]  free_children+0x4a1/0x660 [zfs]
[   43.686458]  dnode_sync_free_range+0x1da/0x420 [zfs]
[   43.687040]  ? free_children+0x660/0x660 [zfs]
[   43.687565]  range_tree_vacate+0x159/0x260 [zfs]
[   43.688108]  dnode_sync+0x398/0x1280 [zfs]
[   43.688559]  ? kvm_sched_clock_read+0xd/0x20
[   43.689031]  ? sched_clock+0x5/0x10
[   43.689420]  ? sched_clock_cpu+0xc/0xa0
[   43.689877]  sync_dnodes_task+0x5d/0x1e0 [zfs]
[   43.690372]  taskq_thread+0x2f8/0x620 [spl]
[   43.690834]  ? wake_up_q+0xa0/0xa0
[   43.691211]  kthread+0xfe/0x140
[   43.691554]  ? taskq_thread_spawn+0x60/0x60 [spl]
[   43.692067]  ? kthread_park+0x90/0x90
[   43.692491]  ret_from_fork+0x3a/0x50

snajpa avatar Jul 16 '20 20:07 snajpa

I hit something similar on FreeBSD 12.0:

Fatal trap 12: page fault while in kernel mode
cpuid = 18; apic id = 24
fault virtual address   = 0x12
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff81ff9470
stack pointer           = 0x28:0xfffffe2024495790
frame pointer           = 0x28:0xfffffe20244957b0
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 0 (dp_sync_taskq_23)
trap number             = 12
panic: page fault
cpuid = 18
time = 1623410852
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe2024495390
vpanic() at vpanic+0x194/frame 0xfffffe20244953f0
panic() at panic+0x43/frame 0xfffffe2024495450
trap_fatal() at trap_fatal+0x357/frame 0xfffffe20244954a0
trap_pfault() at trap_pfault+0x49/frame 0xfffffe2024495500
trap() at trap+0x2ab/frame 0xfffffe20244956c0
calltrap() at calltrap+0x8/frame 0xfffffe20244956c0
--- trap 0xc, rip = 0xffffffff81ff9470, rsp = 0xfffffe2024495790, rbp = 0xfffffe20244957b0 ---
dmu_buf_will_dirty() at dmu_buf_will_dirty+0x40/frame 0xfffffe20244957b0
zap_lockdir_impl() at zap_lockdir_impl+0x4b1/frame 0xfffffe2024495850
zap_update() at zap_update+0x69/frame 0xfffffe20244958b0
zap_increment_int() at zap_increment_int+0xa2/frame 0xfffffe2024495920
userquota_updates_task() at userquota_updates_task+0x350/frame 0xfffffe20244959c0
taskq_run() at taskq_run+0x10/frame 0xfffffe20244959e0
taskqueue_run_locked() at taskqueue_run_locked+0x147/frame 0xfffffe2024495a40
taskqueue_thread_loop() at taskqueue_thread_loop+0xb8/frame 0xfffffe2024495a70
fork_exit() at fork_exit+0x86/frame 0xfffffe2024495ab0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe2024495ab0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
Uptime: 28d9h51m20s
Dumping 10986 out of 130924 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%

__curthread () at ./machine/pcpu.h:232
232     ./machine/pcpu.h: No such file or directory.
(kgdb) #0  __curthread () at ./machine/pcpu.h:232
#1  doadump (textdump=1) at /usr/src/sys/kern/kern_shutdown.c:318
#2  0xffffffff80aa31c3 in kern_reboot (howto=260)
    at /usr/src/sys/kern/kern_shutdown.c:386
#3  0xffffffff80aa36ae in vpanic (fmt=<optimized out>, ap=0xfffffe2024495430)
    at /usr/src/sys/kern/kern_shutdown.c:779
#4  0xffffffff80aa34d3 in panic (fmt=<unavailable>)
    at /usr/src/sys/kern/kern_shutdown.c:710
#5  0xffffffff80f30507 in trap_fatal (frame=0xfffffe20244956d0, eva=18)
    at /usr/src/sys/amd64/amd64/trap.c:799
#6  0xffffffff80f30569 in trap_pfault (frame=0xfffffe20244956d0, usermode=0)
    at /usr/src/sys/amd64/amd64/trap.c:653
#7  0xffffffff80f2fdbb in trap (frame=0xfffffe20244956d0)
    at /usr/src/sys/amd64/amd64/trap.c:420
#8  <signal handler called>
#9  0xffffffff81ff9470 in dmu_buf_will_dirty (db_fake=0xfffff814a26311d0, 
    tx=0xfffff80905c28e00)
    at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dbuf.c:1845
#10 0xffffffff82068121 in zap_lockdir_impl (db=<optimized out>, 
    tag=0xffffffff82110bc6, tx=0xfffff80905c28e00, lti=(unknown: 2), 
    fatreader=1, adding=1, zapp=0xfffffe2024495880)
    at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zap_micro.c:552
#11 0xffffffff82069bf9 in zap_lockdir (os=<optimized out>, 
    obj=<optimized out>, tx=0xfffff80905c28e00, lti=RW_WRITER, fatreader=1, 
    adding=1, tag=<optimized out>, zapp=<optimized out>)
    at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zap_micro.c:607
#12 zap_update (os=<optimized out>, zapobj=<optimized out>, 
    name=0xfffffe20244958d0 "0", integer_size=8, num_integers=1, 
    val=0xfffffe20244958c8, tx=0xfffff80905c28e00)
    at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zap_micro.c:1241
#13 0xffffffff82064652 in zap_increment (os=0xfffff801e21fac00, 
    obj=18446744073709551614, 
    name=0x30 <error: Cannot access memory at address 0x30>, delta=4096, 
    tx=0xfffff80905c28e00)
    at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zap.c:1165
#14 zap_increment_int (os=0xfffff801e21fac00, obj=18446744073709551614, 
    key=<optimized out>, delta=4096, tx=0xfffff80905c28e00)
    at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zap.c:1176
#15 0xffffffff8200d6d0 in do_userquota_cacheflush (os=<optimized out>, 
    cache=0x0, tx=0xfffff80905c28e00)
    at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_objset.c:1354
#16 userquota_updates_task (arg=0xfffff803778af9c0)
    at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_objset.c:1455
#17 0xffffffff81fd3750 in taskq_run (arg=0xfffff801ead48270, pending=57682867)
    at /usr/src/sys/cddl/compat/opensolaris/kern/opensolaris_taskq.c:109
#18 0xffffffff80afc0e7 in taskqueue_run_locked (queue=0xfffff801c257dc00)
    at /usr/src/sys/kern/subr_taskqueue.c:463
#19 0xffffffff80afd2c8 in taskqueue_thread_loop (arg=<optimized out>)
    at /usr/src/sys/kern/subr_taskqueue.c:755
#20 0xffffffff80a65796 in fork_exit (
    callout=0xffffffff80afd210 <taskqueue_thread_loop>, 
    arg=0xfffff801b51a0d50, frame=0xfffffe2024495ac0)

bu7cher avatar Jun 11 '21 12:06 bu7cher

This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Jun 12 '22 09:06 stale[bot]

I also see this in my testing with Lustre

jasimmons1973 avatar Oct 18 '23 20:10 jasimmons1973