Charles Hedrick
The reason I'm suspicious is that this code sequence should only happen when the NFS operation is for a directory, and only if the directory is unconnected in the dcache,...
(The patch, by the way, turns off the prefetch in zfs_readdir.) I'm running it on a staff server. Disabling prefetch doesn't seem to affect how long `du -s` takes. I'd...
It's brief enough that I can include it here:

```
diff --git a/module/os/linux/zfs/zfs_vnops_os.c b/module/os/linux/zfs/zfs_vnops_os.c
index be528f6e8..bcaae6603 100644
--- a/module/os/linux/zfs/zfs_vnops_os.c
+++ b/module/os/linux/zfs/zfs_vnops_os.c
@@ -1498,6 +1498,7 @@ out:
  * We use 0...
```
The patch doesn't really help. Looking at both behavior and stack dumps, it's pretty clear that zfs_readdir is in an infinite or nearly infinite loop. It could be something that...
The only loops in the backtrace are zfs_readdir, get_name, and reconnect_one. I don't think zfs_readdir could fail to terminate without generating illegal pointers, etc. The most likely seems to be...
More info. I found that the directory that was triggering this has 200,000 subdirectories. Given the way reconnect_one and readdir work, that results in a lot of work. So for...
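A rough way to see why 200,000 subdirectories is "a lot of work" is the sketch below. It is only a cost model, not the kernel code: it assumes that reconnecting each unconnected subdirectory requires one linear scan of the parent directory to find the entry whose inode number matches, stopping at the match, and that every subdirectory needs this once. The function names and the entry layout are made up for illustration.

```python
def scan_cost(n_subdirs: int) -> int:
    """Closed form: entries examined if child i (1-based, in readdir
    order) needs a scan that stops after reading i entries."""
    return n_subdirs * (n_subdirs + 1) // 2


def simulate(n_subdirs: int) -> int:
    """Brute-force version of the same model, for small n only."""
    # Hypothetical parent directory: (inode number, name) pairs in readdir order.
    entries = [(ino, f"sub{ino}") for ino in range(n_subdirs)]
    examined = 0
    for target_ino in range(n_subdirs):
        # One name lookup: walk the parent until the inode matches.
        for ino, _name in entries:
            examined += 1
            if ino == target_ino:
                break
    return examined


if __name__ == "__main__":
    print(simulate(1000), scan_cost(1000))
    print(scan_cost(200_000))
```

Under those assumptions the total is quadratic in the number of subdirectories: about 2 x 10^10 directory entries examined for 200,000 subdirectories, which would look like a nearly infinite loop in a stack dump even though each individual scan terminates.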
I believe I can reproduce the issue. It took a while because of cache effects. On a client, create 200,000 subdirectories in a directory. Copy a shortish file into each...
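The first step of that reproduction (populating the directory from a client) could be scripted roughly as follows. This is a sketch only: the subdirectory naming scheme, the file name `file`, and the 4 KiB payload are placeholders I picked, not details from the actual test, and the remaining steps are not covered here.

```python
import os


def make_subdirs(parent: str, count: int, payload: bytes = b"x" * 4096) -> int:
    """Create `count` subdirectories under `parent`, each holding one short file.

    The names (sub000000, ..., "file") and the payload size are hypothetical.
    """
    os.makedirs(parent, exist_ok=True)
    for i in range(count):
        sub = os.path.join(parent, f"sub{i:06d}")
        os.mkdir(sub)
        with open(os.path.join(sub, "file"), "wb") as f:
            f.write(payload)
    return count
```

Calling `make_subdirs("/path/on/nfs/mount/baddir", 200_000)` on the client would build the layout described; the path is an example and should point at the NFS-mounted ZFS filesystem under test.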
Here's how we think it happens: Our clients have very large memory. Data stays cached. Let's suppose that baddir has 200,000 subdirectories. (That's our actual situation.) If you do a...