high CPU for NFS
System information
Type | Version/Name
--- | ---
Distribution Name | Ubuntu
Distribution Version | 22.04
Kernel Version | 6.5
Architecture | x86_64
OpenZFS Version | 2.2.4
Describe the problem you're observing
We see periodic episodes of very high CPU usage, with many nfsd threads at 100%.
During these episodes, NFS operation rates are moderate, around 10K/sec. However, arcstat shows about 2M ARC reads/sec, half demand metadata and half prefetch.
A backtrace shows:
```
Jun  7 10:55:43 eternal.lcsr.rutgers.edu kernel:
 [351663.953866] spl_kmem_free+0x31/0x40 [spl]
 [351663.953879] dbuf_issue_final_prefetch_done+0x49/0x60 [zfs]
 [351663.954001] arc_read+0xdfa/0x1790 [zfs]
 [351663.954114] ? __pfx_dbuf_issue_final_prefetch_done+0x10/0x10 [zfs]
 [351663.954225] dbuf_issue_final_prefetch+0xa7/0x100 [zfs]
 [351663.954327] dbuf_prefetch_impl+0x779/0xa70 [zfs]
 [351663.954437] dbuf_prefetch+0x13/0x30 [zfs]
 [351663.954541] dmu_prefetch_dnode.part.0+0x47/0xa0 [zfs]
 [351663.954646] dmu_prefetch_dnode+0x30/0x40 [zfs]
 [351663.954753] zfs_readdir+0x369/0x560 [zfs]
 [351663.954876] zpl_iterate+0x54/0x90 [zfs]
 [351663.954983] iterate_dir+0xa9/0x180
 [351663.954988] get_name+0x15e/0x1d0
 [351663.954994] ? __pfx_filldir_one+0x10/0x10
 [351663.955000] reconnect_one+0x242/0x280
 [351663.955002] reconnect_path+0xfa/0x120
 [351663.955005] ? __pfx_nfsd_acceptable+0x10/0x10 [nfsd]
 [351663.955034] exportfs_decode_fh_raw+0x12e/0x340
 [351663.955043] nfsd_set_fh_dentry+0x2d5/0x490 [nfsd]
```
The nfsd threads are all in reconnect_one, doing dbuf work. The number of ARC reads seems unreasonable for the NFS load.
Our experience is that it will eventually calm down, but then start up again. Basically, after a few weeks of uptime, usage peaks get higher and higher, and low-usage periods less and less common, until we reboot. I've been trying different kernels, but I'm not sure whether this is related to the kernel or to ZFS. (However, the uptime element may not be real: I recently rebooted while the problem was occurring and it immediately started again. I now think it's triggered by a specific pattern of accesses from a client.)
Chris Siebenmann of Toronto has kindly looked at it and suggested a cause.
nfsd has to validate the file handles it receives from clients. For directories, this often involves walking up the directory tree: at each level it scans every entry until it finds the one that leads back down to the level below. In zfs_readdir, every entry examined has its metadata prefetched, even if it's already in the ARC. The prefetch does no I/O when the data is cached, but it still executes a fair amount of code. (We have 3/4 TB of memory and very high cache-hit rates. Even when I/O is needed, our metadata is on an NVMe-based special vdev.)
He has proposed a patch that lets us disable this prefetching. We'll be testing it over the next week. However, a better solution would be to skip the prefetch only in the specific situation of nfsd's reconnect_path, if that is possible to detect. (I believe a ZFS-specific version of get_name could be used.)
I'll report here the results of disabling the prefetch, but it may take a few weeks to assess.
Describe how to reproduce the problem.
Not reliably reproducible on demand; it appears to depend on a specific pattern of client accesses.