zfs
zfs copied to clipboard
Add module parameter to disable prefetch in zfs_readdir
Add paramter zfs_readdir_dnode_prefetch_enabled, defaulting to 1, to control whether zfs_readdir prefetched metadata for all objects it look at when reading a directory.
Setting it to 0 can be important for NFS servers with directories containing many subdirectories.
Motivation and Context
This improves https://github.com/openzfs/zfs/issues/16254, although a more radical redesign is possible. We have an NFS servers with directories containing 200,000 to 500,000 subdirectories. These arise in AI training scenarios. Our NFS servers can have 20 or more threads running at 100% for hours due to this.
The` problem has been found to be an interaction between the NFS code and zfs_readdir. If information about a directory is cached on the client but not the server, when the server tries to valid the NFS file id, it will add an entry in the dnode cache for the directory. Because he new dnode is not connected to the rest of the directory hierarchy, the NFS (actually exportfs) code goes up the tree trying to connect it. The ends up calling zfs_readdir to find the directory in its parent Because zfs_readdir reads the parent looking for the directory, the amount of time it takes is proportional to the size of the parent. If every directory is processed, the amount of time will be N**2 in the size of the parent. Reading a directory would have only moderate overhead except that zfs_readdir does a prefetch on every item as it looks at it, even if the item is already in the ARC. Of course if it's in the ARC, no disk I/O will be done. But there's still a substantial amount of code.
We've found a factor of 10 to 20 speedup by skipping the prefetches. This is enough to move us from a painful situation to one we can live with.
Of course a linear speedup in an N**2 problem isn't a full solution, but it's enough to survive the size directories we see in practice. A full solution would almost certainly require help from the kernel developers, which is going to be hard to get, since directory searching is fast enough for other Linux file systems that the problem only comes up with ZFS.
Description
This change add a module parameter, zfs_readdir_dnode_prefetch_enabled, normally 1, which can be changed to 0 to disable the prefetch that happens in zfs_readdir.
How Has This Been Tested?
We have done testing with a directory containing 200,000 subdirectories, with the results discussed above. It has been running in production for several weeks.
Types of changes
- [ ] Bug fix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [x] Performance enhancement (non-breaking change which improves efficiency)
- [ ] Code cleanup (non-breaking change which makes code smaller or more readable)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)
- [ ] Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
- [ ] Documentation (a change to man pages or other documentation)
Checklist:
- [x] My code follows the OpenZFS code style requirements.
- [x] I have updated the documentation accordingly.
- [x] I have read the contributing document.
- [ ] I have added tests to cover my changes.
- [ ] I have run the ZFS Test Suite with this change applied.
- [x] All commit messages are properly formatted and contain
Signed-off-by
.