ndctl
concurrent invocations of ndctl can cause a Linux panic
Raising a discussion from the linux-nvdimm alias to be tracked as a GitHub issue.
https://lists.01.org/pipermail/linux-nvdimm/2019-May/021385.html
The problem still exists in 5.2 RC2.
The problem is fairly easy to reproduce in as little as 10 minutes by running the following in parallel in separate terminals. Example:
In terminals #1, #3, #5, type:
while /bin/true; do ndctl create-namespace -m devdax -s 48G; done
In terminals #2, #4, #6, type:
while /bin/true; do ndctl destroy-namespace all -f; done
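For convenience, here is a sketch of a single script that launches those same loops as background jobs; the loop counts, namespace size, and mode come from the example above, while the ten-minute runtime and the cleanup step are assumptions.
#!/bin/bash
# Sketch only: drive the create and destroy loops above from one script.
# Three create loops and three destroy loops, mirroring terminals #1-#6.
for i in 1 2 3
do
    ( while /bin/true; do ndctl create-namespace -m devdax -s 48G; done ) &
    ( while /bin/true; do ndctl destroy-namespace all -f; done ) &
done
# Let the loops race for roughly ten minutes, then stop the background jobs.
sleep 600
kill $(jobs -p)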
Even a simpler invocation will eventually lead to a panic, though it can take hours. Example:
In terminal #1, run the following script:
#!/bin/bash
while /bin/true
do
    ndctl destroy-namespace -f all
    date
    for R in $(ndctl list -R | jq -r ".[] | .dev")
    do
        for i in {1..10}
        do
            ndctl create-namespace -r $R -s 8g -m devdax
        done
    done
done
In terminal #2, type:
while /bin/true; do ndctl list; done
Running that same terminal #1 script in 2 separate terminals, thereby creating 2 separate threads that destroy and create namespaces, will usually result in a panic within an hour.
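As a sketch, that two-script variant can also be launched from one place, assuming the script above has been saved as repro.sh (a hypothetical filename):
#!/bin/bash
# Sketch: two copies of the destroy/create script plus a concurrent
# "ndctl list" loop, mimicking the terminals described above.
./repro.sh &
./repro.sh &
( while /bin/true; do ndctl list; done ) &
wait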
Update: 5.2 RC2 plus patches, such as the one for issue 91, also exhibits the problem. The stack trace is the same as the one posted to the linux-nvdimm alias.
[ 376.581650] CPU: 20 PID: 1950 Comm: kworker/u130:14 Not tainted 4.14.35-1923.el7uek.x86_64 #2
[ 376.591165] Hardware name: Oracle Corporation ORACLE SERVER X8-2/ASM, MB, X7-2, BIOS 51020101 05/07/2019
[ 376.601755] Workqueue: events_unbound async_run_entry_fn
[ 376.607683] task: ffff9e78fa63bd80 task.stack: ffffc2348fb74000
[ 376.614292] RIP: 0010:kernfs_find_ns+0x18/0xbf
[ 376.619250] RSP: 0018:ffffc2348fb77d20 EFLAGS: 00010246
[ 376.625081] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 00000000ffffffff
[ 376.633045] RDX: 0000000000000000 RSI: ffffffffa8eb5ac1 RDI: 0000000000000000
[ 376.641010] RBP: ffffc2348fb77d40 R08: 0000000000000000 R09: ffff9e61f9f48000
[ 376.648973] R10: 000000000000005c R11: 00000000000000a6 R12: ffffffffa8eb5ac1
[ 376.656938] R13: 0000000000000000 R14: ffffffffa8eb5ac1 R15: ffff9e7905fad208
[ 376.664902] FS: 0000000000000000(0000) GS:ffff9e791ef00000(0000) knlGS:0000000000000000
[ 376.673933] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 376.680347] CR2: 0000000000000070 CR3: 000000156b40a002 CR4: 00000000007606e0
[ 376.688311] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 376.696273] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 376.704238] PKRU: 55555554
[ 376.707255] Call Trace:
[ 376.709987]  kernfs_find_and_get_ns+0x31/0x52
[ 376.714848]  sysfs_unmerge_group+0x1d/0x57
[ 376.719422]  dpm_sysfs_remove+0x22/0x5c
[ 376.723706]  device_del+0x5a/0x325
[ 376.727502]  device_unregister+0x1a/0x58
[ 376.731886]  nd_async_device_unregister+0x22/0x30 [libnvdimm]
[ 376.738299]  async_run_entry_fn+0x3e/0x169
[ 376.742870]  process_one_work+0x169/0x3a6
[ 376.747345]  worker_thread+0x4d/0x3e5
[ 376.751434]  kthread+0x105/0x138
[ 376.755035]  ? rescuer_thread+0x380/0x375
[ 376.759510]  ? kthread_bind+0x20/0x15
[ 376.763600]  ret_from_fork+0x24/0x49
[ 376.767588] Code: 24 08 48 83 42 40 01 5b 41 5c 5d c3 0f 1f 80 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41 56 49 89 f6 41 55 49 89 d5 31 d2 41 54 53 <0f> b7 47 70 48 8b 5f 48 66 c1 e8 05 83 e0 01 4d 85 ed 0f b6 c8
[ 376.788686] RIP: kernfs_find_ns+0x18/0xbf RSP: ffffc2348fb77d20
[ 376.795293] CR2: 0000000000000070
I'm able to readily reproduce this. Concurrent ndctl invocations seem to be triggering a double free (double device-unregistration events). Still looking to narrow down all the scenarios where the double unregistration occurs.
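While the reproducers run, a simple log watcher can flag the first hit; this is just a sketch that greps for the kernfs_find_ns frame seen in the trace above.
# Sketch: follow the kernel log and stop at the first occurrence of the
# kernfs_find_ns frame from the oops above (needs privileges to read dmesg).
dmesg --follow | grep -m 1 "kernfs_find_ns"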
Proposed fixes here: https://lists.01.org/pipermail/linux-nvdimm/2019-June/021847.html
Also pushed out to libnvdimm-pending: https://git.kernel.org/pub/scm/linux/kernel/git/djbw/nvdimm.git/log/?h=libnvdimm-pending
This should be fixed in recent Linux versions (such as 5.15).
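As a quick check before assuming a given machine is no longer affected, the running kernel can be compared against the 5.15 version cited above; this is a sketch using sort -V for the version comparison.
# Sketch: warn if the running kernel predates 5.15, the version cited above.
base_ver=$(uname -r | cut -d- -f1)
if [ "$(printf '%s\n' 5.15 "$base_ver" | sort -V | head -n 1)" != "5.15" ]
then
    echo "Kernel $base_ver predates 5.15; the concurrent-ndctl race may still be present."
fi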