create/destroy namespace yields kernel warning messages
Two processes running concurrently: (1) in a loop, ndctl create; (2) in a loop, whenever a namespace created by (1) is found, ndctl destroy all.
[ 2433.123249] WARNING: CPU: 4 PID: 55873 at
drivers/nvdimm/namespace_devs.c:916 size_store+0x7c2/0x910 [libnvdimm]
[....]
[ 2433.123282] RIP: 0010:size_store+0x7c2/0x910 [libnvdimm]
[ 2433.123290] Call Trace:
[ 2433.123297] dev_attr_store+0x1b/0x25
[ 2433.123302] sysfs_kf_write+0x3f/0x46
[ 2433.123304] kernfs_fop_write+0x124/0x1a3
[ 2433.123309] __vfs_write+0x3a/0x16d
[ 2433.123312] ? __fd_install+0x31/0xce
[ 2433.123314] vfs_write+0xb2/0x1a1
[ 2433.123318] ? syscall_trace_enter+0x1ce/0x2b8
[ 2433.123320] SyS_write+0x55/0xb9
[ 2433.123321] do_syscall_64+0x79/0x1b9
[ 2433.123325] entry_SYSCALL_64_after_hwframe+0x169/0x0
The following script can reproduce the issue.
#!/bin/bash
set -x
# create/destroy namespace yields kernel warning messages
#######################################################
BASEDIR=$(dirname $(realpath $0))
TESTNAME=$(basename $0 .sh)
NDCTL=$(which ndctl)
LOGDIR=$BASEDIR/logdir
rm -rf $LOGDIR
mkdir -p $LOGDIR

NdctlCreate()
{
    local region=$1
    while true; do
        SetSize=$(($RANDOM % 5 + 1))
        cmd_create="$NDCTL create-namespace --region=$region --mode=devdax --size=${SetSize}G"
        eval $cmd_create >> $LOGDIR/${TESTNAME}.create.${region}.log 2>&1
        [ $? -ne 0 ] && echo "$cmd_create - FAIL" | tee -a $LOGDIR/${TESTNAME}.create.${region}.log
    done
}

NdctlDestroy()
{
    while true; do
        NamespaceList=$($NDCTL list -N | jq -r .[].dev | tr -s '\n' ' ')
        Date=$(date '+%H%M%S')
        if [ -n "$NamespaceList" ]; then
            echo "== $Date: destroy - $NamespaceList" | tee -a $LOGDIR/${TESTNAME}.destroy.log
            cmd_destroy="$NDCTL destroy-namespace all -f"
            eval $cmd_destroy >> $LOGDIR/${TESTNAME}.destroy.log 2>&1
        else
            echo "Empty - no namespace to destroy" | tee -a $LOGDIR/${TESTNAME}.destroy.log
        fi
    done
}

### MAIN
NdctlCreate region1 & PID1=$!
NdctlDestroy & PID2=$!
PanicDef=$(dmesg | grep -c "Call Trace")
while [ $(jobs -p | wc -l) -gt 0 ]; do
    sleep 5
    CntPanic=$(dmesg | grep -c "Call Trace")
    if [ $CntPanic -gt $PanicDef ]; then
        kill -TERM $PID1 $PID2
        dmesg | awk '/cut here/ ,/end trace/' | tee -a $LOGDIR/${TESTNAME}.panic.log
    fi
done
Hi Scott,
Is this with Dan's "libnvdimm: Fix async operations and locking" patchset applied (on libnvdimm-pending)? I ran the script on the nfit_test bus and on a qemu-based ACPI.NFIT bus with a libnvdimm-pending kernel, and couldn't reproduce it.
Ah, letting it run long enough on the qemu ACPI.NFIT bus, I did hit this:
[65958.613893] ------------[ cut here ]------------
[65958.614764] kernel BUG at mm/page_alloc.c:489!
[65958.615450] invalid opcode: 0000 [#1] SMP PTI
[65958.616382] CPU: 4 PID: 7551 Comm: ndctl Tainted: G O 5.2.0-rc4+ #17
[65958.617692] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.1-0-g0551a4be2c-prebuilt.qemu-project.org 04/01/2014
[65958.621558] RIP: 0010:set_pfnblock_flags_mask+0x8d/0xe0
[65958.625447] Code: c1 eb 37 83 e0 03 48 69 c0 40 07 00 00 48 03 04 dd a0 9d 7d 82 48 8b 58 60 48 39 da 73 0e 48 c7 c6 28 ae 33 82 e8 13 b4 fd ff <0f> 0b 48 03 58 70 48 39 da 73 e9 49 01 c9 b9 3f 00 00 00 44 29 c9
[65958.631655] RSP: 0018:ffffc9000124ba40 EFLAGS: 00010246
[65958.632450] RAX: 000000000000004a RBX: 0000000000858000 RCX: 0000000000000000
[65958.633587] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff82339a90
[65958.635005] RBP: ffff8883122d3340 R08: 000059fda5e85e1e R09: 0000000000000000
[65958.636597] R10: 0000000000000000 R11: 0000000000000000 R12: 000000000013ae00
[65958.637816] R13: 0000000000978000 R14: 007fff8000000000 R15: 0000000103e9daef
[65958.639000] FS: 00007f3df24f0600(0000) GS:ffff888318c00000(0000) knlGS:0000000000000000
[65958.640378] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[65958.641366] CR2: 00007f9d4f4f8f60 CR3: 00000001738f6003 CR4: 00000000001606e0
[65958.642758] Call Trace:
[65958.643222] memmap_init_zone_device+0x145/0x17f
[65958.643910] devm_memremap_pages+0x4e7/0x6b0
[65958.644731] dev_dax_probe+0xef/0x240 [device_dax]
[65958.645653] really_probe+0xef/0x390
[65958.646212] ? driver_allows_async_probing+0x50/0x50
[65958.646969] driver_probe_device+0xb4/0x100
[65958.647622] ? driver_allows_async_probing+0x50/0x50
[65958.648380] bus_for_each_drv+0x78/0xc0
[65958.648977] __device_attach+0xd9/0x160
[65958.649584] bus_probe_device+0x9c/0xb0
[65958.650171] device_add+0x410/0x670
[65958.650824] __devm_create_dev_dax+0x110/0x190
[65958.651443] __dax_pmem_probe+0x1ef/0x210 [dax_pmem_core]
[65958.652201] dax_pmem_probe+0xc/0x20 [dax_pmem]
[65958.652832] nvdimm_bus_probe+0x66/0x160 [libnvdimm]
[65958.653520] really_probe+0xef/0x390
[65958.654023] driver_probe_device+0xb4/0x100
[65958.654605] device_driver_attach+0x4f/0x60
[65958.655187] bind_store+0xc9/0x110
[65958.655654] kernfs_fop_write+0xf0/0x1a0
[65958.656193] vfs_write+0xba/0x1c0
[65958.656663] ksys_write+0x5f/0xe0
[65958.657114] do_syscall_64+0x60/0x240
[65958.657611] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[65958.658291] RIP: 0033:0x7f3df27fdfa8
[65958.658896] Code: 89 02 48 c7 c0 ff ff ff ff eb b3 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 8d 05 75 77 0d 00 8b 00 85 c0 75 17 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 41 54 49 89 d4 55
[65958.661400] RSP: 002b:00007ffd0e047cf8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[65958.662430] RAX: ffffffffffffffda RBX: 000000000189d040 RCX: 00007f3df27fdfa8
[65958.663400] RDX: 0000000000000007 RSI: 000000000189d040 RDI: 0000000000000004
[65958.664355] RBP: 0000000000000007 R08: 0000000000000007 R09: 0000000000000006
[65958.665300] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000004
[65958.667176] R13: 00007f3df24f0528 R14: 0000000000000001 R15: 000000000189d040
[65958.668136] Modules linked in: kmem xfs nfit_test(O) dax_pmem_compat(O) device_dax(O) nfit(O) nd_blk(O) dax_pmem(O) nd_pmem(O) dax_pmem_core(O) nd_btt(O) nd_e820(O) libnvdimm(O) nfit_test_iomap(O) [last unloaded: nfit_test_iomap]
[65958.670883] ---[ end trace ecaeee5608fb71ff ]---
And to answer your previous question: yes, this panic has been seen with the "async operations" patchset in libnvdimm-pending.
@swdavenport are you still seeing the warning with the v5.3-rc3 kernel? The "kernel BUG at mm/page_alloc.c" issue seems to be unrelated, but nonetheless concerning. Just checking if there are still 2 issues to debug, or just the one.
@djbw The "kernel BUG at mm/page_alloc.c" issue was seen when running the above script in two separate terminals simultaneously. You're correct, that was a different issue which I can confirm is fixed. With that, I'm seeing 400+ seg faults after running them for 2+hours. The backtraces are the same, here is a sample, perhaps ndctl could use some checking?
(gdb) bt
#0 0x00007f224658a730 in ndctl_pfn_get_bus () from /lib64/libndctl.so.6
#1 0x00007f224658a749 in ndctl_pfn_get_ctx () from /lib64/libndctl.so.6
#2 0x00007f224658ab87 in ndctl_pfn_set_uuid () from /lib64/libndctl.so.6
#3 0x000000000040b188 in setup_namespace ()
#4 0x000000000040bf24 in do_xaction_namespace ()
#5 0x000000000040c338 in cmd_create_namespace ()
#6 0x000000000041b10b in main_handle_internal_command ()
#7 0x0000000000408788 in main ()
(gdb) p $_siginfo.si_signo
$1 = 11
(gdb) p $_siginfo.si_code
$3 = 1
(gdb) p $_siginfo._sifields._sigfault.si_addr
$4 = (void *) 0x8
(gdb) info proc
exe = '/usr/bin/ndctl create-namespace --region=region1 --mode=devdax --size=3G'
(gdb) info r
rdx 0x16695e8 23500264
rsi 0x7fff11ed20e0 140733494141152
rdi 0x0 0
rip 0x7f224658a730 0x7f224658a730 <ndctl_pfn_get_bus>
eflags 0x10206 [ PF IF RF ]
(gdb) disass ndctl_pfn_get_bus,+0x20
Dump of assembler code from 0x7f224658a730 to 0x7f224658a750:
=> 0x00007f224658a730 <ndctl_pfn_get_bus+0>: mov 0x8(%rdi),%rax
0x00007f224658a734 <ndctl_pfn_get_bus+4>: mov 0x8(%rax),%rax
0x00007f224658a738 <ndctl_pfn_get_bus+8>: retq
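To make the fault concrete: %rdi is 0 and si_addr is 0x8, so the first load in ndctl_pfn_get_bus() ("mov 0x8(%rdi),%rax") is dereferencing a NULL struct ndctl_pfn pointer, presumably handed down from setup_namespace() after the destroy loop removed the seed it was about to configure. The C sketch below is illustrative only, not the actual libndctl source; the struct layouts and field names are assumptions chosen so the pointers sit at offset 0x8 as in the disassembly, and the second function shows the kind of NULL guard that would turn the crash into a clean failure.

#include <stddef.h>

struct ndctl_bus;

struct ndctl_region {
        void *ctx;                   /* assumed: some pointer at offset 0x0 */
        struct ndctl_bus *bus;       /* assumed: bus pointer at offset 0x8 */
};

struct ndctl_pfn {
        void *ctx;                   /* assumed: some pointer at offset 0x0 */
        struct ndctl_region *region; /* assumed: region pointer at offset 0x8 */
};

/*
 * Unchecked access, matching the faulting instructions:
 *   mov 0x8(%rdi),%rax   ; pfn->region -- faults at address 0x8 when pfn == NULL
 *   mov 0x8(%rax),%rax   ; region->bus
 */
struct ndctl_bus *pfn_get_bus_unchecked(struct ndctl_pfn *pfn)
{
        return pfn->region->bus;
}

/*
 * The kind of guard that would turn the segfault into a clean error, whether
 * placed here or in the caller that obtained a NULL pfn/dax seed while the
 * destroy loop was tearing namespaces down.
 */
struct ndctl_bus *pfn_get_bus_checked(struct ndctl_pfn *pfn)
{
        if (!pfn || !pfn->region)
                return NULL;
        return pfn->region->bus;
}

int main(void)
{
        /* With a NULL pfn the checked variant returns NULL instead of
         * faulting at address 0x8 like the unchecked one. */
        return pfn_get_bus_checked(NULL) == NULL ? 0 : 1;
}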
The size_store() issue is still there; it can be reproduced by running the script in a single terminal.
There has been some development in stabilizing the kernel with respect to pmem operations. Please test with a recent kernel.