dcache
dcache copied to clipboard
nfs hangs with mv or rm
Hey.
Since #4622 remains unfixed I wrote something that's manually creating a JSON, which is then uploaded via a cron job:
rm -f /pnfs/lrz-muenchen.de/data/atlas/meta/storage-descriptor.json.new
if [ $? -ne 0 ]; then
printf 'Error: Could not remove old SRR data.\n' >&2
exit 1
fi
srr_script /pnfs/lrz-muenchen.de/data/atlas/meta/storage-descriptor.json.new
if [ $? -ne 0 ]; then
printf 'Error: Could not generate SRR data.\n' >&2
exit 1
fi
mv -f /pnfs/lrz-muenchen.de/data/atlas/meta/storage-descriptor.json.new /pnfs/lrz-muenchen.de/data/atlas/meta/storage-descriptor.json
if [ $? -ne 0 ]; then
printf 'Error: Could not replace previous SRR data.\n' >&2
exit 1
fi
I've seen for both, the rm and the mv the process to hang, the nfs domain, when tried to be killed, also freezes and becomes a zombie process.
The kernel (5.10.92) gives:
Apr 11 15:16:48 lcg-lrz-dcache0 kernel: NFS: 2001:4ca0:0:201:a:0:2:35: Session trunking failed for 129.187.131.53
Apr 11 15:16:48 lcg-lrz-dcache0 kernel: RPC: rpc_clnt_test_xprt failed: -111 addr ::1 not added
Apr 11 15:16:48 lcg-lrz-dcache0 kernel: RPC: rpc_clnt_test_xprt failed: -111 addr 127.0.0.1 not added
Apr 11 15:21:01 lcg-lrz-dcache0 kernel: NFS: 2001:4ca0:0:201:a:0:2:35: Session trunking failed for 129.187.131.53
Apr 11 15:21:01 lcg-lrz-dcache0 kernel: RPC: rpc_clnt_test_xprt failed: -111 addr ::1 not added
Apr 11 15:21:01 lcg-lrz-dcache0 kernel: RPC: rpc_clnt_test_xprt failed: -111 addr 127.0.0.1 not added
Apr 11 15:25:19 lcg-lrz-dcache0 kernel: NFS: 2001:4ca0:0:201:a:0:2:35: Session trunking failed for 129.187.131.53
Apr 11 15:25:19 lcg-lrz-dcache0 kernel: RPC: rpc_clnt_test_xprt failed: -111 addr ::1 not added
Apr 11 15:25:19 lcg-lrz-dcache0 kernel: RPC: rpc_clnt_test_xprt failed: -111 addr 127.0.0.1 not added
Apr 11 15:25:26 lcg-lrz-dcache0 kernel: NFS: 2001:4ca0:0:201:a:0:2:35: Session trunking failed for 129.187.131.53
Apr 11 15:25:26 lcg-lrz-dcache0 kernel: RPC: rpc_clnt_test_xprt failed: -111 addr ::1 not added
Apr 11 15:25:26 lcg-lrz-dcache0 kernel: RPC: rpc_clnt_test_xprt failed: -111 addr 127.0.0.1 not added
Apr 11 15:25:42 lcg-lrz-dcache0 kernel: NFS: 2001:4ca0:0:201:a:0:2:12: Session trunking failed for 129.187.131.18
Apr 11 15:25:42 lcg-lrz-dcache0 kernel: RPC: rpc_clnt_test_xprt failed: -111 addr ::1 not added
Apr 11 15:25:42 lcg-lrz-dcache0 kernel: RPC: rpc_clnt_test_xprt failed: -111 addr 127.0.0.1 not added
Apr 11 15:29:51 lcg-lrz-dcache0 kernel: NFS: 2001:4ca0:0:201:a:0:2:12: Session trunking failed for 129.187.131.18
Apr 11 15:29:51 lcg-lrz-dcache0 kernel: RPC: rpc_clnt_test_xprt failed: -111 addr ::1 not added
Apr 11 15:29:51 lcg-lrz-dcache0 kernel: RPC: rpc_clnt_test_xprt failed: -111 addr 127.0.0.1 not added
Apr 11 15:33:04 lcg-lrz-dcache0 kernel: nfs: server localhost not responding, still trying
Apr 11 15:33:35 lcg-lrz-dcache0 kernel: INFO: task OncRpcSvc (34):32001 blocked for more than 120 seconds.
Apr 11 15:33:35 lcg-lrz-dcache0 kernel: Not tainted 5.10.0-11-amd64 #1 Debian 5.10.92-1
Apr 11 15:33:35 lcg-lrz-dcache0 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 11 15:33:35 lcg-lrz-dcache0 kernel: task:OncRpcSvc (34) state:D stack: 0 pid:32001 ppid: 1 flags:0x00004000
Apr 11 15:33:35 lcg-lrz-dcache0 kernel: Call Trace:
Apr 11 15:33:35 lcg-lrz-dcache0 kernel: __schedule+0x282/0x870
Apr 11 15:33:35 lcg-lrz-dcache0 kernel: schedule+0x46/0xb0
Apr 11 15:33:35 lcg-lrz-dcache0 kernel: rwsem_down_read_slowpath+0x18e/0x500
Apr 11 15:33:35 lcg-lrz-dcache0 kernel: ? lookup_fast+0xf2/0x150
Apr 11 15:33:35 lcg-lrz-dcache0 kernel: walk_component+0x145/0x1d0
Apr 11 15:33:35 lcg-lrz-dcache0 kernel: ? path_init+0x2bd/0x3e0
Apr 11 15:33:35 lcg-lrz-dcache0 kernel: path_lookupat+0x73/0x1c0
Apr 11 15:33:35 lcg-lrz-dcache0 kernel: filename_lookup+0xaa/0x1b0
Apr 11 15:33:35 lcg-lrz-dcache0 kernel: ? strncpy_from_user+0x4e/0x140
Apr 11 15:33:35 lcg-lrz-dcache0 kernel: ? getname_flags.part.0+0x45/0x1a0
Apr 11 15:33:35 lcg-lrz-dcache0 kernel: vfs_statx+0x74/0x130
Apr 11 15:33:35 lcg-lrz-dcache0 kernel: __do_sys_newlstat+0x39/0x70
Apr 11 15:33:35 lcg-lrz-dcache0 kernel: ? exit_to_user_mode_prepare+0x6e/0x120
Apr 11 15:33:35 lcg-lrz-dcache0 kernel: do_syscall_64+0x33/0x80
Apr 11 15:33:35 lcg-lrz-dcache0 kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
Apr 11 15:33:35 lcg-lrz-dcache0 kernel: RIP: 0033:0x7f44b221c446
Apr 11 15:33:35 lcg-lrz-dcache0 kernel: RSP: 002b:00007f42c31f0928 EFLAGS: 00000246 ORIG_RAX: 0000000000000006
Apr 11 15:33:35 lcg-lrz-dcache0 kernel: RAX: ffffffffffffffda RBX: 00007f42c31f2ab0 RCX: 00007f44b221c446
Apr 11 15:33:35 lcg-lrz-dcache0 kernel: RDX: 00007f42c31f0970 RSI: 00007f42c31f0970 RDI: 00007f42c31f1ab0
Apr 11 15:33:35 lcg-lrz-dcache0 kernel: RBP: 00007f42c31f0a40 R08: 0000000000000001 R09: 00007f4298054d46
Apr 11 15:33:35 lcg-lrz-dcache0 kernel: R10: 00000000005b8092 R11: 0000000000000246 R12: 00007f42c31f1ab0
Apr 11 15:33:35 lcg-lrz-dcache0 kernel: R13: 00007f4298054d61 R14: 00007f42c31f1af2 R15: 00007f4298054d62
Apr 11 15:35:35 lcg-lrz-dcache0 kernel: INFO: task OncRpcSvc (34):32001 blocked for more than 241 seconds.
Apr 11 15:35:35 lcg-lrz-dcache0 kernel: Not tainted 5.10.0-11-amd64 #1 Debian 5.10.92-1
Apr 11 15:35:35 lcg-lrz-dcache0 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 11 15:35:35 lcg-lrz-dcache0 kernel: task:OncRpcSvc (34) state:D stack: 0 pid:32001 ppid: 1 flags:0x00004000
Apr 11 15:35:35 lcg-lrz-dcache0 kernel: Call Trace:
Apr 11 15:35:35 lcg-lrz-dcache0 kernel: __schedule+0x282/0x870
Apr 11 15:35:35 lcg-lrz-dcache0 kernel: schedule+0x46/0xb0
Apr 11 15:35:35 lcg-lrz-dcache0 kernel: rwsem_down_read_slowpath+0x18e/0x500
Apr 11 15:35:35 lcg-lrz-dcache0 kernel: ? lookup_fast+0xf2/0x150
Apr 11 15:35:35 lcg-lrz-dcache0 kernel: walk_component+0x145/0x1d0
Apr 11 15:35:35 lcg-lrz-dcache0 kernel: ? path_init+0x2bd/0x3e0
Apr 11 15:35:35 lcg-lrz-dcache0 kernel: path_lookupat+0x73/0x1c0
Apr 11 15:35:36 lcg-lrz-dcache0 kernel: filename_lookup+0xaa/0x1b0
Apr 11 15:35:36 lcg-lrz-dcache0 kernel: ? strncpy_from_user+0x4e/0x140
Apr 11 15:35:36 lcg-lrz-dcache0 kernel: ? getname_flags.part.0+0x45/0x1a0
Apr 11 15:35:36 lcg-lrz-dcache0 kernel: vfs_statx+0x74/0x130
Apr 11 15:35:36 lcg-lrz-dcache0 kernel: __do_sys_newlstat+0x39/0x70
Apr 11 15:35:36 lcg-lrz-dcache0 kernel: ? exit_to_user_mode_prepare+0x6e/0x120
Apr 11 15:35:36 lcg-lrz-dcache0 kernel: do_syscall_64+0x33/0x80
Apr 11 15:35:36 lcg-lrz-dcache0 kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
Apr 11 15:35:36 lcg-lrz-dcache0 kernel: RIP: 0033:0x7f44b221c446
Apr 11 15:35:36 lcg-lrz-dcache0 kernel: RSP: 002b:00007f42c31f0928 EFLAGS: 00000246 ORIG_RAX: 0000000000000006
Apr 11 15:35:36 lcg-lrz-dcache0 kernel: RAX: ffffffffffffffda RBX: 00007f42c31f2ab0 RCX: 00007f44b221c446
Apr 11 15:35:36 lcg-lrz-dcache0 kernel: RDX: 00007f42c31f0970 RSI: 00007f42c31f0970 RDI: 00007f42c31f1ab0
Apr 11 15:35:36 lcg-lrz-dcache0 kernel: RBP: 00007f42c31f0a40 R08: 0000000000000001 R09: 00007f4298054d46
Apr 11 15:35:36 lcg-lrz-dcache0 kernel: R10: 00000000005b8092 R11: 0000000000000246 R12: 00007f42c31f1ab0
Apr 11 15:35:36 lcg-lrz-dcache0 kernel: R13: 00007f4298054d61 R14: 00007f42c31f1af2 R15: 00007f4298054d62
Apr 11 15:37:36 lcg-lrz-dcache0 kernel: INFO: task OncRpcSvc (34):32001 blocked for more than 362 seconds.
Apr 11 15:37:36 lcg-lrz-dcache0 kernel: Not tainted 5.10.0-11-amd64 #1 Debian 5.10.92-1
Apr 11 15:37:36 lcg-lrz-dcache0 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 11 15:37:36 lcg-lrz-dcache0 kernel: task:OncRpcSvc (34) state:D stack: 0 pid:32001 ppid: 1 flags:0x00004000
Apr 11 15:37:36 lcg-lrz-dcache0 kernel: Call Trace:
Apr 11 15:37:36 lcg-lrz-dcache0 kernel: __schedule+0x282/0x870
Apr 11 15:37:36 lcg-lrz-dcache0 kernel: schedule+0x46/0xb0
Apr 11 15:37:36 lcg-lrz-dcache0 kernel: rwsem_down_read_slowpath+0x18e/0x500
Apr 11 15:37:36 lcg-lrz-dcache0 kernel: ? lookup_fast+0xf2/0x150
Apr 11 15:37:36 lcg-lrz-dcache0 kernel: walk_component+0x145/0x1d0
Apr 11 15:37:36 lcg-lrz-dcache0 kernel: ? path_init+0x2bd/0x3e0
Apr 11 15:37:36 lcg-lrz-dcache0 kernel: path_lookupat+0x73/0x1c0
Apr 11 15:37:36 lcg-lrz-dcache0 kernel: filename_lookup+0xaa/0x1b0
Apr 11 15:37:36 lcg-lrz-dcache0 kernel: ? strncpy_from_user+0x4e/0x140
Apr 11 15:37:36 lcg-lrz-dcache0 kernel: ? getname_flags.part.0+0x45/0x1a0
Apr 11 15:37:36 lcg-lrz-dcache0 kernel: vfs_statx+0x74/0x130
Apr 11 15:37:36 lcg-lrz-dcache0 kernel: __do_sys_newlstat+0x39/0x70
Apr 11 15:37:36 lcg-lrz-dcache0 kernel: ? exit_to_user_mode_prepare+0x6e/0x120
Apr 11 15:37:36 lcg-lrz-dcache0 kernel: do_syscall_64+0x33/0x80
Apr 11 15:37:36 lcg-lrz-dcache0 kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
Apr 11 15:37:36 lcg-lrz-dcache0 kernel: RIP: 0033:0x7f44b221c446
Apr 11 15:37:36 lcg-lrz-dcache0 kernel: RSP: 002b:00007f42c31f0928 EFLAGS: 00000246 ORIG_RAX: 0000000000000006
Apr 11 15:37:36 lcg-lrz-dcache0 kernel: RAX: ffffffffffffffda RBX: 00007f42c31f2ab0 RCX: 00007f44b221c446
Apr 11 15:37:36 lcg-lrz-dcache0 kernel: RDX: 00007f42c31f0970 RSI: 00007f42c31f0970 RDI: 00007f42c31f1ab0
Apr 11 15:37:36 lcg-lrz-dcache0 kernel: RBP: 00007f42c31f0a40 R08: 0000000000000001 R09: 00007f4298054d46
Apr 11 15:37:36 lcg-lrz-dcache0 kernel: R10: 00000000005b8092 R11: 0000000000000246 R12: 00007f42c31f1ab0
Apr 11 15:37:36 lcg-lrz-dcache0 kernel: R13: 00007f4298054d61 R14: 00007f42c31f1af2 R15: 00007f4298054d62
Apr 11 15:39:37 lcg-lrz-dcache0 kernel: INFO: task OncRpcSvc (34):32001 blocked for more than 483 seconds.
Apr 11 15:39:37 lcg-lrz-dcache0 kernel: Not tainted 5.10.0-11-amd64 #1 Debian 5.10.92-1
Apr 11 15:39:37 lcg-lrz-dcache0 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 11 15:39:37 lcg-lrz-dcache0 kernel: task:OncRpcSvc (34) state:D stack: 0 pid:32001 ppid: 1 flags:0x00004000
Apr 11 15:39:37 lcg-lrz-dcache0 kernel: Call Trace:
Apr 11 15:39:37 lcg-lrz-dcache0 kernel: __schedule+0x282/0x870
Apr 11 15:39:37 lcg-lrz-dcache0 kernel: schedule+0x46/0xb0
Apr 11 15:39:37 lcg-lrz-dcache0 kernel: rwsem_down_read_slowpath+0x18e/0x500
Apr 11 15:39:37 lcg-lrz-dcache0 kernel: ? lookup_fast+0xf2/0x150
Apr 11 15:39:37 lcg-lrz-dcache0 kernel: walk_component+0x145/0x1d0
Apr 11 15:39:37 lcg-lrz-dcache0 kernel: ? path_init+0x2bd/0x3e0
Apr 11 15:39:37 lcg-lrz-dcache0 kernel: path_lookupat+0x73/0x1c0
Apr 11 15:39:37 lcg-lrz-dcache0 kernel: filename_lookup+0xaa/0x1b0
Apr 11 15:39:37 lcg-lrz-dcache0 kernel: ? strncpy_from_user+0x4e/0x140
Apr 11 15:39:37 lcg-lrz-dcache0 kernel: ? getname_flags.part.0+0x45/0x1a0
Apr 11 15:39:37 lcg-lrz-dcache0 kernel: vfs_statx+0x74/0x130
Apr 11 15:39:37 lcg-lrz-dcache0 kernel: __do_sys_newlstat+0x39/0x70
Apr 11 15:39:37 lcg-lrz-dcache0 kernel: ? exit_to_user_mode_prepare+0x6e/0x120
Apr 11 15:39:37 lcg-lrz-dcache0 kernel: do_syscall_64+0x33/0x80
Apr 11 15:39:37 lcg-lrz-dcache0 kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
Apr 11 15:39:37 lcg-lrz-dcache0 kernel: RIP: 0033:0x7f44b221c446
Apr 11 15:39:37 lcg-lrz-dcache0 kernel: RSP: 002b:00007f42c31f0928 EFLAGS: 00000246 ORIG_RAX: 0000000000000006
Apr 11 15:39:37 lcg-lrz-dcache0 kernel: RAX: ffffffffffffffda RBX: 00007f42c31f2ab0 RCX: 00007f44b221c446
Apr 11 15:39:37 lcg-lrz-dcache0 kernel: RDX: 00007f42c31f0970 RSI: 00007f42c31f0970 RDI: 00007f42c31f1ab0
Apr 11 15:39:37 lcg-lrz-dcache0 kernel: RBP: 00007f42c31f0a40 R08: 0000000000000001 R09: 00007f4298054d46
Apr 11 15:39:37 lcg-lrz-dcache0 kernel: R10: 00000000005b8092 R11: 0000000000000246 R12: 00007f42c31f1ab0
Apr 11 15:39:37 lcg-lrz-dcache0 kernel: R13: 00007f4298054d61 R14: 00007f42c31f1af2 R15: 00007f4298054d62
Apr 11 15:41:38 lcg-lrz-dcache0 kernel: INFO: task OncRpcSvc (34):32001 blocked for more than 604 seconds.
Apr 11 15:41:38 lcg-lrz-dcache0 kernel: Not tainted 5.10.0-11-amd64 #1 Debian 5.10.92-1
Apr 11 15:41:38 lcg-lrz-dcache0 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 11 15:41:38 lcg-lrz-dcache0 kernel: task:OncRpcSvc (34) state:D stack: 0 pid:32001 ppid: 1 flags:0x00004000
Apr 11 15:41:38 lcg-lrz-dcache0 kernel: Call Trace:
Apr 11 15:41:38 lcg-lrz-dcache0 kernel: __schedule+0x282/0x870
Apr 11 15:41:38 lcg-lrz-dcache0 kernel: schedule+0x46/0xb0
Apr 11 15:41:38 lcg-lrz-dcache0 kernel: rwsem_down_read_slowpath+0x18e/0x500
Apr 11 15:41:38 lcg-lrz-dcache0 kernel: ? lookup_fast+0xf2/0x150
Apr 11 15:41:38 lcg-lrz-dcache0 kernel: walk_component+0x145/0x1d0
Apr 11 15:41:38 lcg-lrz-dcache0 kernel: ? path_init+0x2bd/0x3e0
Apr 11 15:41:38 lcg-lrz-dcache0 kernel: path_lookupat+0x73/0x1c0
Apr 11 15:41:38 lcg-lrz-dcache0 kernel: filename_lookup+0xaa/0x1b0
Apr 11 15:41:38 lcg-lrz-dcache0 kernel: ? strncpy_from_user+0x4e/0x140
Apr 11 15:41:38 lcg-lrz-dcache0 kernel: ? getname_flags.part.0+0x45/0x1a0
Apr 11 15:41:38 lcg-lrz-dcache0 kernel: vfs_statx+0x74/0x130
Apr 11 15:41:38 lcg-lrz-dcache0 kernel: __do_sys_newlstat+0x39/0x70
Apr 11 15:41:38 lcg-lrz-dcache0 kernel: ? exit_to_user_mode_prepare+0x6e/0x120
Apr 11 15:41:38 lcg-lrz-dcache0 kernel: do_syscall_64+0x33/0x80
Apr 11 15:41:38 lcg-lrz-dcache0 kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
Apr 11 15:41:38 lcg-lrz-dcache0 kernel: RIP: 0033:0x7f44b221c446
Apr 11 15:41:38 lcg-lrz-dcache0 kernel: RSP: 002b:00007f42c31f0928 EFLAGS: 00000246 ORIG_RAX: 0000000000000006
Apr 11 15:41:38 lcg-lrz-dcache0 kernel: RAX: ffffffffffffffda RBX: 00007f42c31f2ab0 RCX: 00007f44b221c446
Apr 11 15:41:38 lcg-lrz-dcache0 kernel: RDX: 00007f42c31f0970 RSI: 00007f42c31f0970 RDI: 00007f42c31f1ab0
Apr 11 15:41:38 lcg-lrz-dcache0 kernel: RBP: 00007f42c31f0a40 R08: 0000000000000001 R09: 00007f4298054d46
Apr 11 15:41:38 lcg-lrz-dcache0 kernel: R10: 00000000005b8092 R11: 0000000000000246 R12: 00007f42c31f1ab0
Apr 11 15:41:38 lcg-lrz-dcache0 kernel: R13: 00007f4298054d61 R14: 00007f42c31f1af2 R15: 00007f4298054d62
Apr 11 15:43:39 lcg-lrz-dcache0 kernel: INFO: task OncRpcSvc (34):32001 blocked for more than 724 seconds.
Apr 11 15:43:39 lcg-lrz-dcache0 kernel: Not tainted 5.10.0-11-amd64 #1 Debian 5.10.92-1
Apr 11 15:43:39 lcg-lrz-dcache0 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 11 15:43:39 lcg-lrz-dcache0 kernel: task:OncRpcSvc (34) state:D stack: 0 pid:32001 ppid: 1 flags:0x00004000
Apr 11 15:43:39 lcg-lrz-dcache0 kernel: Call Trace:
Apr 11 15:43:39 lcg-lrz-dcache0 kernel: __schedule+0x282/0x870
Apr 11 15:43:39 lcg-lrz-dcache0 kernel: schedule+0x46/0xb0
Apr 11 15:43:39 lcg-lrz-dcache0 kernel: rwsem_down_read_slowpath+0x18e/0x500
Apr 11 15:43:39 lcg-lrz-dcache0 kernel: ? lookup_fast+0xf2/0x150
Apr 11 15:43:39 lcg-lrz-dcache0 kernel: walk_component+0x145/0x1d0
Apr 11 15:43:39 lcg-lrz-dcache0 kernel: ? path_init+0x2bd/0x3e0
Apr 11 15:43:39 lcg-lrz-dcache0 kernel: path_lookupat+0x73/0x1c0
Apr 11 15:43:39 lcg-lrz-dcache0 kernel: filename_lookup+0xaa/0x1b0
Apr 11 15:43:39 lcg-lrz-dcache0 kernel: ? strncpy_from_user+0x4e/0x140
Apr 11 15:43:39 lcg-lrz-dcache0 kernel: ? getname_flags.part.0+0x45/0x1a0
Apr 11 15:43:39 lcg-lrz-dcache0 kernel: vfs_statx+0x74/0x130
Apr 11 15:43:39 lcg-lrz-dcache0 kernel: __do_sys_newlstat+0x39/0x70
Apr 11 15:43:39 lcg-lrz-dcache0 kernel: ? exit_to_user_mode_prepare+0x6e/0x120
Apr 11 15:43:39 lcg-lrz-dcache0 kernel: do_syscall_64+0x33/0x80
Apr 11 15:43:39 lcg-lrz-dcache0 kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
Apr 11 15:43:39 lcg-lrz-dcache0 kernel: RIP: 0033:0x7f44b221c446
Apr 11 15:43:39 lcg-lrz-dcache0 kernel: RSP: 002b:00007f42c31f0928 EFLAGS: 00000246 ORIG_RAX: 0000000000000006
Apr 11 15:43:39 lcg-lrz-dcache0 kernel: RAX: ffffffffffffffda RBX: 00007f42c31f2ab0 RCX: 00007f44b221c446
Apr 11 15:43:39 lcg-lrz-dcache0 kernel: RDX: 00007f42c31f0970 RSI: 00007f42c31f0970 RDI: 00007f42c31f1ab0
Apr 11 15:43:39 lcg-lrz-dcache0 kernel: RBP: 00007f42c31f0a40 R08: 0000000000000001 R09: 00007f4298054d46
Apr 11 15:43:39 lcg-lrz-dcache0 kernel: R10: 00000000005b8092 R11: 0000000000000246 R12: 00007f42c31f1ab0
Apr 11 15:43:39 lcg-lrz-dcache0 kernel: R13: 00007f4298054d61 R14: 00007f42c31f1af2 R15: 00007f4298054d62
Apr 11 15:45:40 lcg-lrz-dcache0 kernel: INFO: task OncRpcSvc (34):32001 blocked for more than 845 seconds.
Apr 11 15:45:40 lcg-lrz-dcache0 kernel: Not tainted 5.10.0-11-amd64 #1 Debian 5.10.92-1
Apr 11 15:45:40 lcg-lrz-dcache0 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 11 15:45:40 lcg-lrz-dcache0 kernel: task:OncRpcSvc (34) state:D stack: 0 pid:32001 ppid: 1 flags:0x00004000
Apr 11 15:45:40 lcg-lrz-dcache0 kernel: Call Trace:
Apr 11 15:45:40 lcg-lrz-dcache0 kernel: __schedule+0x282/0x870
Apr 11 15:45:40 lcg-lrz-dcache0 kernel: schedule+0x46/0xb0
Apr 11 15:45:40 lcg-lrz-dcache0 kernel: rwsem_down_read_slowpath+0x18e/0x500
Apr 11 15:45:40 lcg-lrz-dcache0 kernel: ? lookup_fast+0xf2/0x150
Apr 11 15:45:40 lcg-lrz-dcache0 kernel: walk_component+0x145/0x1d0
Apr 11 15:45:40 lcg-lrz-dcache0 kernel: ? path_init+0x2bd/0x3e0
Apr 11 15:45:40 lcg-lrz-dcache0 kernel: path_lookupat+0x73/0x1c0
Apr 11 15:45:40 lcg-lrz-dcache0 kernel: filename_lookup+0xaa/0x1b0
Apr 11 15:45:40 lcg-lrz-dcache0 kernel: ? strncpy_from_user+0x4e/0x140
Apr 11 15:45:40 lcg-lrz-dcache0 kernel: ? getname_flags.part.0+0x45/0x1a0
Apr 11 15:45:40 lcg-lrz-dcache0 kernel: vfs_statx+0x74/0x130
Apr 11 15:45:40 lcg-lrz-dcache0 kernel: __do_sys_newlstat+0x39/0x70
Apr 11 15:45:40 lcg-lrz-dcache0 kernel: ? exit_to_user_mode_prepare+0x6e/0x120
Apr 11 15:45:40 lcg-lrz-dcache0 kernel: do_syscall_64+0x33/0x80
Apr 11 15:45:40 lcg-lrz-dcache0 kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
Apr 11 15:45:40 lcg-lrz-dcache0 kernel: RIP: 0033:0x7f44b221c446
Apr 11 15:45:40 lcg-lrz-dcache0 kernel: RSP: 002b:00007f42c31f0928 EFLAGS: 00000246 ORIG_RAX: 0000000000000006
Apr 11 15:45:40 lcg-lrz-dcache0 kernel: RAX: ffffffffffffffda RBX: 00007f42c31f2ab0 RCX: 00007f44b221c446
Apr 11 15:45:40 lcg-lrz-dcache0 kernel: RDX: 00007f42c31f0970 RSI: 00007f42c31f0970 RDI: 00007f42c31f1ab0
Apr 11 15:45:40 lcg-lrz-dcache0 kernel: RBP: 00007f42c31f0a40 R08: 0000000000000001 R09: 00007f4298054d46
Apr 11 15:45:40 lcg-lrz-dcache0 kernel: R10: 00000000005b8092 R11: 0000000000000246 R12: 00007f42c31f1ab0
Apr 11 15:45:40 lcg-lrz-dcache0 kernel: R13: 00007f4298054d61 R14: 00007f42c31f1af2 R15: 00007f4298054d62
Apr 11 15:47:40 lcg-lrz-dcache0 kernel: INFO: task OncRpcSvc (34):32001 blocked for more than 966 seconds.
Apr 11 15:47:40 lcg-lrz-dcache0 kernel: Not tainted 5.10.0-11-amd64 #1 Debian 5.10.92-1
Apr 11 15:47:40 lcg-lrz-dcache0 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 11 15:47:40 lcg-lrz-dcache0 kernel: task:OncRpcSvc (34) state:D stack: 0 pid:32001 ppid: 1 flags:0x00004000
Apr 11 15:47:40 lcg-lrz-dcache0 kernel: Call Trace:
Apr 11 15:47:40 lcg-lrz-dcache0 kernel: __schedule+0x282/0x870
Apr 11 15:47:40 lcg-lrz-dcache0 kernel: schedule+0x46/0xb0
Apr 11 15:47:40 lcg-lrz-dcache0 kernel: rwsem_down_read_slowpath+0x18e/0x500
Apr 11 15:47:40 lcg-lrz-dcache0 kernel: ? lookup_fast+0xf2/0x150
Apr 11 15:47:40 lcg-lrz-dcache0 kernel: walk_component+0x145/0x1d0
Apr 11 15:47:40 lcg-lrz-dcache0 kernel: ? path_init+0x2bd/0x3e0
Apr 11 15:47:40 lcg-lrz-dcache0 kernel: path_lookupat+0x73/0x1c0
Apr 11 15:47:40 lcg-lrz-dcache0 kernel: filename_lookup+0xaa/0x1b0
Apr 11 15:47:40 lcg-lrz-dcache0 kernel: ? strncpy_from_user+0x4e/0x140
Apr 11 15:47:41 lcg-lrz-dcache0 kernel: ? getname_flags.part.0+0x45/0x1a0
Apr 11 15:47:41 lcg-lrz-dcache0 kernel: vfs_statx+0x74/0x130
Apr 11 15:47:41 lcg-lrz-dcache0 kernel: __do_sys_newlstat+0x39/0x70
Apr 11 15:47:41 lcg-lrz-dcache0 kernel: ? exit_to_user_mode_prepare+0x6e/0x120
Apr 11 15:47:41 lcg-lrz-dcache0 kernel: do_syscall_64+0x33/0x80
Apr 11 15:47:41 lcg-lrz-dcache0 kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
Apr 11 15:47:41 lcg-lrz-dcache0 kernel: RIP: 0033:0x7f44b221c446
Apr 11 15:47:41 lcg-lrz-dcache0 kernel: RSP: 002b:00007f42c31f0928 EFLAGS: 00000246 ORIG_RAX: 0000000000000006
Apr 11 15:47:41 lcg-lrz-dcache0 kernel: RAX: ffffffffffffffda RBX: 00007f42c31f2ab0 RCX: 00007f44b221c446
Apr 11 15:47:41 lcg-lrz-dcache0 kernel: RDX: 00007f42c31f0970 RSI: 00007f42c31f0970 RDI: 00007f42c31f1ab0
Apr 11 15:47:41 lcg-lrz-dcache0 kernel: RBP: 00007f42c31f0a40 R08: 0000000000000001 R09: 00007f4298054d46
Apr 11 15:47:41 lcg-lrz-dcache0 kernel: R10: 00000000005b8092 R11: 0000000000000246 R12: 00007f42c31f1ab0
Apr 11 15:47:41 lcg-lrz-dcache0 kernel: R13: 00007f4298054d61 R14: 00007f42c31f1af2 R15: 00007f4298054d62
Apr 11 15:49:41 lcg-lrz-dcache0 kernel: INFO: task OncRpcSvc (34):32001 blocked for more than 1087 seconds.
Apr 11 15:49:41 lcg-lrz-dcache0 kernel: Not tainted 5.10.0-11-amd64 #1 Debian 5.10.92-1
Apr 11 15:49:41 lcg-lrz-dcache0 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 11 15:49:41 lcg-lrz-dcache0 kernel: task:OncRpcSvc (34) state:D stack: 0 pid:32001 ppid: 1 flags:0x00004000
Apr 11 15:49:41 lcg-lrz-dcache0 kernel: Call Trace:
Apr 11 15:49:41 lcg-lrz-dcache0 kernel: __schedule+0x282/0x870
Apr 11 15:49:41 lcg-lrz-dcache0 kernel: schedule+0x46/0xb0
Apr 11 15:49:41 lcg-lrz-dcache0 kernel: rwsem_down_read_slowpath+0x18e/0x500
Apr 11 15:49:41 lcg-lrz-dcache0 kernel: ? lookup_fast+0xf2/0x150
Apr 11 15:49:41 lcg-lrz-dcache0 kernel: walk_component+0x145/0x1d0
Apr 11 15:49:41 lcg-lrz-dcache0 kernel: ? path_init+0x2bd/0x3e0
Apr 11 15:49:41 lcg-lrz-dcache0 kernel: path_lookupat+0x73/0x1c0
Apr 11 15:49:41 lcg-lrz-dcache0 kernel: filename_lookup+0xaa/0x1b0
Apr 11 15:49:41 lcg-lrz-dcache0 kernel: ? strncpy_from_user+0x4e/0x140
Apr 11 15:49:41 lcg-lrz-dcache0 kernel: ? getname_flags.part.0+0x45/0x1a0
Apr 11 15:49:41 lcg-lrz-dcache0 kernel: vfs_statx+0x74/0x130
Apr 11 15:49:41 lcg-lrz-dcache0 kernel: __do_sys_newlstat+0x39/0x70
Apr 11 15:49:41 lcg-lrz-dcache0 kernel: ? exit_to_user_mode_prepare+0x6e/0x120
Apr 11 15:49:41 lcg-lrz-dcache0 kernel: do_syscall_64+0x33/0x80
Apr 11 15:49:41 lcg-lrz-dcache0 kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
Apr 11 15:49:41 lcg-lrz-dcache0 kernel: RIP: 0033:0x7f44b221c446
Apr 11 15:49:41 lcg-lrz-dcache0 kernel: RSP: 002b:00007f42c31f0928 EFLAGS: 00000246 ORIG_RAX: 0000000000000006
Apr 11 15:49:41 lcg-lrz-dcache0 kernel: RAX: ffffffffffffffda RBX: 00007f42c31f2ab0 RCX: 00007f44b221c446
Apr 11 15:49:41 lcg-lrz-dcache0 kernel: RDX: 00007f42c31f0970 RSI: 00007f42c31f0970 RDI: 00007f42c31f1ab0
Apr 11 15:49:41 lcg-lrz-dcache0 kernel: RBP: 00007f42c31f0a40 R08: 0000000000000001 R09: 00007f4298054d46
Apr 11 15:49:41 lcg-lrz-dcache0 kernel: R10: 00000000005b8092 R11: 0000000000000246 R12: 00007f42c31f1ab0
Apr 11 15:49:41 lcg-lrz-dcache0 kernel: R13: 00007f4298054d61 R14: 00007f42c31f1af2 R15: 00007f4298054d62
Apr 11 15:51:42 lcg-lrz-dcache0 kernel: INFO: task OncRpcSvc (34):32001 blocked for more than 1208 seconds.
Apr 11 15:51:42 lcg-lrz-dcache0 kernel: Not tainted 5.10.0-11-amd64 #1 Debian 5.10.92-1
Apr 11 15:51:42 lcg-lrz-dcache0 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 11 15:51:42 lcg-lrz-dcache0 kernel: task:OncRpcSvc (34) state:D stack: 0 pid:32001 ppid: 1 flags:0x00004004
Apr 11 15:51:42 lcg-lrz-dcache0 kernel: Call Trace:
Apr 11 15:51:42 lcg-lrz-dcache0 kernel: __schedule+0x282/0x870
Apr 11 15:51:42 lcg-lrz-dcache0 kernel: schedule+0x46/0xb0
Apr 11 15:51:42 lcg-lrz-dcache0 kernel: rwsem_down_read_slowpath+0x18e/0x500
Apr 11 15:51:42 lcg-lrz-dcache0 kernel: ? lookup_fast+0xf2/0x150
Apr 11 15:51:42 lcg-lrz-dcache0 kernel: walk_component+0x145/0x1d0
Apr 11 15:51:42 lcg-lrz-dcache0 kernel: ? path_init+0x2bd/0x3e0
Apr 11 15:51:42 lcg-lrz-dcache0 kernel: path_lookupat+0x73/0x1c0
Apr 11 15:51:42 lcg-lrz-dcache0 kernel: filename_lookup+0xaa/0x1b0
Apr 11 15:51:42 lcg-lrz-dcache0 kernel: ? strncpy_from_user+0x4e/0x140
Apr 11 15:51:42 lcg-lrz-dcache0 kernel: ? getname_flags.part.0+0x45/0x1a0
Apr 11 15:51:42 lcg-lrz-dcache0 kernel: vfs_statx+0x74/0x130
Apr 11 15:51:42 lcg-lrz-dcache0 kernel: __do_sys_newlstat+0x39/0x70
Apr 11 15:51:42 lcg-lrz-dcache0 kernel: ? exit_to_user_mode_prepare+0x6e/0x120
Apr 11 15:51:42 lcg-lrz-dcache0 kernel: do_syscall_64+0x33/0x80
Apr 11 15:51:42 lcg-lrz-dcache0 kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
Apr 11 15:51:42 lcg-lrz-dcache0 kernel: RIP: 0033:0x7f44b221c446
Apr 11 15:51:42 lcg-lrz-dcache0 kernel: RSP: 002b:00007f42c31f0928 EFLAGS: 00000246 ORIG_RAX: 0000000000000006
Apr 11 15:51:42 lcg-lrz-dcache0 kernel: RAX: ffffffffffffffda RBX: 00007f42c31f2ab0 RCX: 00007f44b221c446
Apr 11 15:51:42 lcg-lrz-dcache0 kernel: RDX: 00007f42c31f0970 RSI: 00007f42c31f0970 RDI: 00007f42c31f1ab0
Apr 11 15:51:42 lcg-lrz-dcache0 kernel: RBP: 00007f42c31f0a40 R08: 0000000000000001 R09: 00007f4298054d46
Apr 11 15:51:42 lcg-lrz-dcache0 kernel: R10: 00000000005b8092 R11: 0000000000000246 R12: 00007f42c31f1ab0
Apr 11 15:51:42 lcg-lrz-dcache0 kernel: R13: 00007f4298054d61 R14: 00007f42c31f1af2 R15: 00007f4298054d62
Apr 11 15:53:52 lcg-lrz-dcache0 kernel: nfs: server localhost not responding, timed out
Apr 11 15:53:58 lcg-lrz-dcache0 kernel: nfs: server localhost OK
...killed the rm process here,... and stared the nfs domain again...
Apr 11 15:55:53 lcg-lrz-dcache0 kernel: NFS: 2001:4ca0:0:201:a:0:2:35: Session trunking failed for 129.187.131.53
Apr 11 15:55:53 lcg-lrz-dcache0 kernel: RPC: rpc_clnt_test_xprt failed: -111 addr ::1 not added
Apr 11 15:55:53 lcg-lrz-dcache0 kernel: RPC: rpc_clnt_test_xprt failed: -111 addr 127.0.0.1 not added
Apr 11 15:56:55 lcg-lrz-dcache0 kernel: NFS: 2001:4ca0:0:201:a:0:2:35: Session trunking failed for 129.187.131.53
Apr 11 15:56:55 lcg-lrz-dcache0 kernel: RPC: rpc_clnt_test_xprt failed: -111 addr ::1 not added
Apr 11 15:56:55 lcg-lrz-dcache0 kernel: RPC: rpc_clnt_test_xprt failed: -111 addr 127.0.0.1 not added
Apr 11 16:00:01 lcg-lrz-dcache0 kernel: NFS: 2001:4ca0:0:201:a:0:2:35: Session trunking failed for 129.187.131.53
Apr 11 16:00:01 lcg-lrz-dcache0 kernel: RPC: rpc_clnt_test_xprt failed: -111 addr ::1 not added
Apr 11 16:00:01 lcg-lrz-dcache0 kernel: RPC: rpc_clnt_test_xprt failed: -111 addr 127.0.0.1 not added
this time with mv.
Interestingly, while it hangs, I cannot ls /pnfs/lrz-muenchen.de/data/atlas/meta/ but I can ls other dirs.
Cheers, Chris.
Removing the old file first, instead of overwriting it with mv, makes the problem go away.
Though at least once, I saw the issue also just on the first rm.
Oh and even if it works then (because of the above workaround) I still get:
Apr 11 16:07:54 lcg-lrz-dcache0 kernel: NFS: 2001:4ca0:0:201:a:0:2:35: Session trunking failed for 129.187.131.53
Apr 11 16:07:54 lcg-lrz-dcache0 kernel: RPC: rpc_clnt_test_xprt failed: -111 addr ::1 not added
Apr 11 16:07:54 lcg-lrz-dcache0 kernel: RPC: rpc_clnt_test_xprt failed: -111 addr 127.0.0.1 not added
on every upload.
Hi Chris,
dcache can't update/overwrite existing files. On an other hand, mv should not trigger truncate and should remove existing file. Do you observe the same bebavior when you run mv by hand?
Hey.
dcache can't update/overwrite existing files
sure but I'd have expected that posix handles mv like a atomic delete/create?
And yes, it also hangs when I do it manually.
Interestingly though:
/pnfs/lrz-muenchen.de/data/atlas/meta# cp storage-descriptor.json x
/pnfs/lrz-muenchen.de/data/atlas/meta# cp /bin/bash y
/pnfs/lrz-muenchen.de/data/atlas/meta# l
total 1,3M
drwxr-xr-x 5 root root 512 Apr 13 22:38 .
drwxr-xr-x 12 10761 1307 512 Nov 16 2020 ..
-rw-r--r-- 1 root root 6,8k Apr 13 22:30 storage-descriptor.json
-rw-r--r-- 1 root root 6,8k Apr 13 22:38 x
-rwxr-xr-x 1 root root 1,3M Apr 13 22:38 y
root@lcg-lrz-dcache0:/pnfs/lrz-muenchen.de/data/atlas/meta# mv y x
mv: overwrite 'x'? y
hangs
^C
/pnfs/lrz-muenchen.de/data/atlas/meta# l
total 1,3M
drwxr-xr-x 4 root root 512 Apr 13 22:40 .
drwxr-xr-x 12 10761 1307 512 Nov 16 2020 ..
-rw-r--r-- 1 root root 6,8k Apr 13 22:30 storage-descriptor.json
-rwxr-xr-x 1 root root 1,3M Apr 13 22:38 x
the file does seem to be overwritten:
# hd x | head
00000000 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00 |.ELF............|
00000010 03 00 3e 00 01 00 00 00 80 06 03 00 00 00 00 00 |..>.............|
00000020 40 00 00 00 00 00 00 00 88 ce 12 00 00 00 00 00 |@...............|
00000030 00 00 00 00 40 00 38 00 0b 00 40 00 1d 00 1c 00 |[email protected]...@.....|
00000040 06 00 00 00 04 00 00 00 40 00 00 00 00 00 00 00 |........@.......|
00000050 40 00 00 00 00 00 00 00 40 00 00 00 00 00 00 00 |@.......@.......|
00000060 68 02 00 00 00 00 00 00 68 02 00 00 00 00 00 00 |h.......h.......|
00000070 08 00 00 00 00 00 00 00 03 00 00 00 04 00 00 00 |................|
00000080 a8 02 00 00 00 00 00 00 a8 02 00 00 00 00 00 00 |................|
00000090 a8 02 00 00 00 00 00 00 1c 00 00 00 00 00 00 00 |................|