zfs
zfs copied to clipboard
RIP: 0010:nfsd4_process_open2+0x1201/0x14c0 [nfsd] after upgrading to zfs-dkms 2.0.5-1
System information
Type | Version/Name |
---|---|
Distribution Name | Archlinux |
Distribution Version | |
Linux Kernel | 5.10.46-1-lts |
Architecture | x86_64 |
ZFS Version | 2.0.5-1 |
SPL Version | 2.0.5-1 |
Describe the problem you're observing
juin 28 18:13:53 smicro kernel: ------------[ cut here ]------------ juin 28 18:13:53 smicro kernel: WARNING: CPU: 0 PID: 1955 at fs/nfsd/nfs4state.c:4973 nfsd4_process_open2+0x1201/0x14c0 [nfsd] juin 28 18:13:53 smicro kernel: Modules linked in: rpcrdma rdma_cm iw_cm ib_cm ib_core nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_limit nft_counter nft_ct nf_conntrack nf_defrag_ipv6 nf_defrag> juin 28 18:13:53 smicro kernel: CPU: 0 PID: 1955 Comm: nfsd Tainted: P OE 5.10.46-1-lts #1 juin 28 18:13:53 smicro kernel: Hardware name: Supermicro H8SGL/H8SGL, BIOS 3.5 11/25/2013 juin 28 18:13:53 smicro kernel: RIP: 0010:nfsd4_process_open2+0x1201/0x14c0 [nfsd] juin 28 18:13:53 smicro kernel: Code: f9 ff ff 4c 89 cf e8 7e 2e fe ff 49 89 c1 e9 39 f9 ff ff 49 83 7f 70 00 0f 84 01 01 00 00 41 83 47 78 01 31 d2 e9 8e f9 ff ff <0f> 0b e9 c5 f3 ff ff 0f b6 d4 89 d2 48 0f a3> juin 28 18:13:53 smicro kernel: RSP: 0018:ffffad50c94fbcf0 EFLAGS: 00010246 juin 28 18:13:53 smicro kernel: RAX: 0000000000000000 RBX: ffff95b5e77511e0 RCX: ffff95a7398137b0 juin 28 18:13:53 smicro kernel: RDX: 0000000000000001 RSI: ffff95a739929648 RDI: ffff95a7398da5a4 juin 28 18:13:53 smicro kernel: RBP: ffffad50c94fbdc8 R08: ffff95a7398da5a8 R09: 0000000000000000 juin 28 18:13:53 smicro kernel: R10: ffff95a739875688 R11: ffff95a739918200 R12: ffff95a739929648 juin 28 18:13:53 smicro kernel: R13: ffff95a7398da5a0 R14: ffff95a7398137b0 R15: ffff95a7398da5a0 juin 28 18:13:53 smicro kernel: FS: 0000000000000000(0000) GS:ffff95b55fc00000(0000) knlGS:0000000000000000 juin 28 18:13:53 smicro kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 juin 28 18:13:53 smicro kernel: CR2: 00007ff832ee28d4 CR3: 0000000374e10000 CR4: 00000000000006f0 juin 28 18:13:53 smicro kernel: Call Trace: juin 28 18:13:53 smicro kernel: ? nfsd_permission+0x63/0xe0 [nfsd] juin 28 18:13:53 smicro kernel: ? fh_verify+0x1d5/0x630 [nfsd] juin 28 18:13:53 smicro kernel: nfsd4_open+0x3be/0x750 [nfsd] juin 28 18:13:53 smicro kernel: nfsd4_proc_compound+0x494/0x6d0 [nfsd] juin 28 18:13:53 smicro kernel: nfsd_dispatch+0xd3/0x180 [nfsd] juin 28 18:13:53 smicro kernel: svc_process_common+0x3d7/0x6d0 [sunrpc] juin 28 18:13:53 smicro kernel: ? nfsd_svc+0x310/0x310 [nfsd] juin 28 18:13:53 smicro kernel: svc_process+0xb7/0xf0 [sunrpc] juin 28 18:13:53 smicro kernel: nfsd+0xe8/0x140 [nfsd] juin 28 18:13:53 smicro kernel: ? nfsd_destroy+0x60/0x60 [nfsd] juin 28 18:13:53 smicro kernel: kthread+0x11b/0x140 juin 28 18:13:53 smicro kernel: ? kthread_associate_blkcg+0xa0/0xa0 juin 28 18:13:53 smicro kernel: ret_from_fork+0x22/0x30 juin 28 18:13:53 smicro kernel: ---[ end trace 299b1c52e1b7ef74 ]---
Describe how to reproduce the problem
using nfs4, we have home directories and a shared Documents hierarchy for aarch64 linux clients. samba too.
Never seen this before. Linux-lts was upgraded the 26/06 and zfs-{dkms,util} today.
After reboot, I immediately came across the dump in dmesg/journalctl. It can only be related to the zfs upgrade from 2.0.4 to 2.0.5.
Include any warning/errors/backtraces from the system logs
Is there a message before or after the trace that might be informative about the nature of the problem? Usually Linux prints something about why it's upset when these come out, not solely the function it's upset in and a backtrace.
Immediate before I see:
juin 28 18:13:21 smicro systemd[1]: Finished NFS server and services.
juin 28 18:13:21 smicro kernel: NFSD: Using nfsdcld client tracking operations.
juin 28 18:13:21 smicro kernel: NFSD: starting 90-second grace period (net f0000098)
juin 28 18:13:21 smicro kernel: audit: type=1130 audit(1624896801.643:75): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=nfs-server comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? termina>
juin 28 18:13:21 smicro audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=nfs-server comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
juin 28 18:13:21 smicro smbd[1946]: [2021/06/28 18:13:21.751135, 0] ../../lib/param/loadparm.c:1047(lpcfg_service_ok)
juin 28 18:13:21 smicro smbd[1946]: WARNING: No path in service IPC$ - making it unavailable!
juin 28 18:13:21 smicro smbd[1946]: [2021/06/28 18:13:21.811804, 0] ../../lib/util/become_daemon.c:135(daemon_ready)
juin 28 18:13:21 smicro systemd[1]: Started Samba SMB Daemon.
juin 28 18:13:21 smicro smbd[1946]: daemon_ready: daemon 'smbd' finished starting up and ready to serve connections
juin 28 18:13:21 smicro audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=smb comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
juin 28 18:13:21 smicro kernel: audit: type=1130 audit(1624896801.803:76): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=smb comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res>
juin 28 18:13:21 smicro systemd[1]: Starting ZFS file system shares...
juin 28 18:13:21 smicro systemd[1]: Finished ZFS file system shares.
juin 28 18:13:21 smicro audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=zfs-share comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
juin 28 18:13:21 smicro systemd[1]: Reached target ZFS startup target.
juin 28 18:13:21 smicro systemd[1]: Reached target Multi-User System.
juin 28 18:13:21 smicro systemd[1]: Reached target Graphical Interface.
juin 28 18:13:21 smicro systemd[1]: Starting Notify NFS peers of a restart...
juin 28 18:13:21 smicro sm-notify[1963]: Version 2.5.4 starting
juin 28 18:13:21 smicro systemd[1]: Started Notify NFS peers of a restart.
juin 28 18:13:21 smicro systemd[1]: Startup finished in 3.973s (kernel) + 41.087s (userspace) = 45.060s.
juin 28 18:13:21 smicro audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=rpc-statd-notify comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
juin 28 18:13:29 smicro nmbd[1922]: [2021/06/28 18:13:29.833959, 0] ../../source3/nmbd/nmbd_become_dmb.c:112(become_domain_master_stage2)
juin 28 18:13:29 smicro nmbd[1922]: *****
juin 28 18:13:29 smicro nmbd[1922]:
juin 28 18:13:29 smicro nmbd[1922]: Samba server SMICRO is now a domain master browser for workgroup BAOU on subnet 192.168.0.1
juin 28 18:13:29 smicro nmbd[1922]:
juin 28 18:13:29 smicro nmbd[1922]: *****
juin 28 18:13:44 smicro nmbd[1922]: [2021/06/28 18:13:44.855436, 0] ../../source3/nmbd/nmbd_become_lmb.c:398(become_local_master_stage2)
juin 28 18:13:44 smicro nmbd[1922]: *****
juin 28 18:13:44 smicro nmbd[1922]:
juin 28 18:13:44 smicro nmbd[1922]: Samba name server SMICRO is now a local master browser for workgroup BAOU on subnet 192.168.0.1
juin 28 18:13:44 smicro nmbd[1922]:
juin 28 18:13:44 smicro nmbd[1922]: *****
juin 28 18:13:46 smicro clamd[1285]: Mon Jun 28 18:13:46 2021 -> Limits: Global time limit set to 120000 milliseconds.
juin 28 18:13:46 smicro clamd[1285]: Mon Jun 28 18:13:46 2021 -> Limits: Global size limit set to 104857600 bytes.
juin 28 18:13:46 smicro clamd[1285]: Mon Jun 28 18:13:46 2021 -> Limits: File size limit set to 26214400 bytes.
juin 28 18:13:46 smicro clamd[1285]: Mon Jun 28 18:13:46 2021 -> Limits: Recursion level limit set to 16.
juin 28 18:13:46 smicro clamd[1285]: Mon Jun 28 18:13:46 2021 -> Limits: Files limit set to 10000.
juin 28 18:13:46 smicro clamd[1285]: Mon Jun 28 18:13:46 2021 -> Limits: MaxEmbeddedPE limit set to 10485760 bytes.
juin 28 18:13:46 smicro clamd[1285]: Mon Jun 28 18:13:46 2021 -> Limits: MaxHTMLNormalize limit set to 10485760 bytes.
juin 28 18:13:46 smicro clamd[1285]: Mon Jun 28 18:13:46 2021 -> Limits: MaxHTMLNoTags limit set to 2097152 bytes.
juin 28 18:13:46 smicro clamd[1285]: Mon Jun 28 18:13:46 2021 -> Limits: MaxScriptNormalize limit set to 5242880 bytes.
juin 28 18:13:46 smicro clamd[1285]: Mon Jun 28 18:13:46 2021 -> Limits: MaxZipTypeRcg limit set to 1048576 bytes.
juin 28 18:13:46 smicro clamd[1285]: Mon Jun 28 18:13:46 2021 -> Limits: MaxPartitions limit set to 50.
juin 28 18:13:46 smicro clamd[1285]: Mon Jun 28 18:13:46 2021 -> Limits: MaxIconsPE limit set to 100.
juin 28 18:13:46 smicro clamd[1285]: Mon Jun 28 18:13:46 2021 -> Limits: MaxRecHWP3 limit set to 16.
juin 28 18:13:46 smicro clamd[1285]: Mon Jun 28 18:13:46 2021 -> Limits: PCREMatchLimit limit set to 100000.
juin 28 18:13:46 smicro clamd[1285]: Mon Jun 28 18:13:46 2021 -> Limits: PCRERecMatchLimit limit set to 2000.
juin 28 18:13:46 smicro clamd[1285]: Mon Jun 28 18:13:46 2021 -> Limits: PCREMaxFileSize limit set to 26214400.
juin 28 18:13:46 smicro clamd[1285]: Mon Jun 28 18:13:46 2021 -> Archive support enabled.
juin 28 18:13:46 smicro clamd[1285]: Mon Jun 28 18:13:46 2021 -> AlertExceedsMax heuristic detection disabled.
juin 28 18:13:46 smicro clamd[1285]: Mon Jun 28 18:13:46 2021 -> Heuristic alerts enabled.
juin 28 18:13:46 smicro clamd[1285]: Mon Jun 28 18:13:46 2021 -> Portable Executable support enabled.
juin 28 18:13:46 smicro clamd[1285]: Mon Jun 28 18:13:46 2021 -> ELF support enabled.
juin 28 18:13:46 smicro clamd[1285]: Mon Jun 28 18:13:46 2021 -> Mail files support enabled.
juin 28 18:13:46 smicro clamd[1285]: Mon Jun 28 18:13:46 2021 -> OLE2 support enabled.
juin 28 18:13:46 smicro clamd[1285]: Mon Jun 28 18:13:46 2021 -> PDF support enabled.
juin 28 18:13:46 smicro clamd[1285]: Mon Jun 28 18:13:46 2021 -> SWF support enabled.
juin 28 18:13:46 smicro clamd[1285]: Mon Jun 28 18:13:46 2021 -> HTML support enabled.
juin 28 18:13:46 smicro clamd[1285]: Mon Jun 28 18:13:46 2021 -> XMLDOCS support enabled.
juin 28 18:13:46 smicro clamd[1285]: Mon Jun 28 18:13:46 2021 -> HWP3 support enabled.
juin 28 18:13:46 smicro clamd[1285]: Mon Jun 28 18:13:46 2021 -> Self checking every 600 seconds.
juin 28 18:13:48 smicro systemd-resolved[1597]: Clock change detected. Flushing caches.
and the line just after the traceback:
juin 28 18:13:54 smicro kernel: NFSD: all clients done reclaiming, ending NFSv4 grace period (net f0000098)
I don't believe anything really interesting...
Nope. Darn, guess it's bisect and spelunking time.
Just for info, I came across this again .. this time on zfs 2.1.0 under 5.10.48-1-lts The traceback is quasi identical.
So, I tried this with 5.10.48 and 2.0.5 on Debian bullseye x86_64 (because it's what my testbed was running), and prodding it different ways with various x86_64 (and one sparc64) clients (creating then running various builds on a chroot under it, for instance) didn't cause it to emit any exciting noises.
Can you give any insight into the workloads that might be going on when this is happening, beyond just document storage? Are the Samba clients potentially modifying the same things at the same time as the NFS clients sometimes? (Are the Samba clients Windows, Linux, something else entirely?)
I didn't obviously see anything I would expect to provoke fire in the 2.0.4->2.0.5 changelog (which I picked since it's much shorter than the -> 2.1.0 changelog), but I am pretty far from an all-knowing observer of the code involved. Do you know what kernel you were running before this started?
Sorry, Rich, hard week... was running 5.10.45-1 prior to the linux-lts update on the 26/06/21. We share the same tree on smb and nfs4... Samba is for the m$ doze autocad workstations... linux clients are nfs (but may need to go samba if the nfs4 acls aren't fixed soon) Looking at the datetimes of the tracebacks, there was probably primarily an smb user involved. Normally simple documents are shared (libreoffice and pdfs), autocad only amongst the dozers. home directories also shared, but only used by nfs clients not smb.
just upgraded to linux-lts-5.10.51-1 with everything quiet on the smb side (nobody logged in), only me ssh'ing in to the server from my aarch64 client using nfs for home and whatnot. After the reboot logged back into the server and saw the same traceback as above... as such, can't hold anything against smb/samba.
This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.