fuse
Use a bounded goroutine pool for the ForgetInode op
In #30, ForgetInode was changed to be handled inline. That may solve the memory OOM, but it makes the performance of rm very slow and also hangs other ops. So I think we should add a goroutine pool to limit the maximum number of ForgetInode goroutines; that would avoid the OOM without hurting performance. A sketch of the idea follows.
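Roughly what I have in mind, as a standalone sketch rather than the library's actual code (Op, ForgetInodeOp, handleOp and maxForgetWorkers are made-up names): a counting semaphore is the simplest way to bound how many ForgetInode goroutines run at once; a reusable worker pool would give the same cap.

```go
package main

import (
	"fmt"
	"sync"
)

// Op stands in for the server's operation type and ForgetInodeOp for the
// forget request; both are placeholders, not the real definitions.
type Op interface{}

type ForgetInodeOp struct{ Inode uint64 }

const maxForgetWorkers = 16 // illustrative cap, not a recommended value

var (
	forgetSem = make(chan struct{}, maxForgetWorkers)
	wg        sync.WaitGroup
)

// handleOp is a placeholder for the real filesystem handler.
func handleOp(op Op) {
	fmt.Printf("handled %#v\n", op)
}

// dispatch runs ForgetInode ops on their own goroutines, but never more
// than maxForgetWorkers of them at once; everything else stays inline.
func dispatch(op Op) {
	if _, ok := op.(*ForgetInodeOp); ok {
		forgetSem <- struct{}{} // blocks once the cap is reached
		wg.Add(1)
		go func() {
			defer wg.Done()
			defer func() { <-forgetSem }()
			handleOp(op)
		}()
		return
	}
	handleOp(op)
}

func main() {
	for i := uint64(0); i < 100; i++ {
		dispatch(&ForgetInodeOp{Inode: i})
	}
	wg.Wait()
}
```

With a cap like this, a storm of ForgetInode requests can only ever create maxForgetWorkers goroutines, so memory stays bounded while removals still run concurrently.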
Hi @stapelberg, do you have any advice?
I don't think 100000 is the right number. Either we allocate all of them at startup which wastes 200MB for most cases, or we don't allocate them at startup in which case you have OOM during a storm of ForgetInode.
Using a very small pool may be OK. Note that the implementation may have to lock anyway, so having that many forget goroutines running isn't useful.
Hi @kahing, the 100,000 is only the pool's capacity; goroutines are not allocated until they are needed, and a goroutine that stays idle for too long is recycled. A sketch of that behaviour is below.
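To illustrate what I mean by capacity and recycling, here is a rough, self-contained sketch of such a pool (not the actual patch; pool, Submit and idleTimeout are illustrative names): workers are created lazily as tasks arrive, never beyond the capacity, and a worker exits once it has been idle for a while.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// pool spawns worker goroutines lazily, up to a fixed capacity, and lets
// a worker exit after it has been idle for idleTimeout.
type pool struct {
	tasks       chan func()
	slots       chan struct{} // one token per live worker, at most `capacity`
	idleTimeout time.Duration
	wg          sync.WaitGroup
}

func newPool(capacity int, idleTimeout time.Duration) *pool {
	return &pool{
		tasks:       make(chan func()),
		slots:       make(chan struct{}, capacity),
		idleTimeout: idleTimeout,
	}
}

// Submit prefers handing the task to an already-idle worker; failing that
// it starts a new worker if the pool is below capacity, and otherwise
// blocks until a worker or a capacity slot becomes available.
func (p *pool) Submit(task func()) {
	select {
	case p.tasks <- task:
		return
	default:
	}
	select {
	case p.tasks <- task:
	case p.slots <- struct{}{}:
		p.wg.Add(1)
		go p.worker(task)
	}
}

// worker runs tasks until it has been idle for idleTimeout, then exits and
// gives its capacity slot back ("recycled when idle for too long").
func (p *pool) worker(first func()) {
	defer p.wg.Done()
	defer func() { <-p.slots }()
	first()
	idle := time.NewTimer(p.idleTimeout)
	defer idle.Stop()
	for {
		select {
		case task := <-p.tasks:
			task()
			if !idle.Stop() {
				<-idle.C
			}
			idle.Reset(p.idleTimeout)
		case <-idle.C:
			return
		}
	}
}

func main() {
	// Capacity 100000, but goroutines are only created as tasks arrive.
	// A short idle timeout is used here so the demo exits quickly.
	p := newPool(100000, 200*time.Millisecond)
	var mu sync.Mutex
	done := 0
	for i := 0; i < 1000; i++ {
		p.Submit(func() {
			mu.Lock()
			done++
			mu.Unlock()
		})
	}
	p.wg.Wait()
	fmt.Println("tasks completed:", done)
}
```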
Have you done benchmarks for this? Can you share the results?
You can see from the results below (4 tasks, 1000 files) that the "File removal" rate goes up from 22 to 5175.
Before the change:
./mdtest "-n" "1000" "-u" "-z" "2" "-i" "3" "-F" "-d" "/home/wangyang5/polefs/mnt"
Path: /home/wangyang5/polefs
FS: 50.0 GiB Used FS: 37.9% Inodes: 25.0 Mi Used Inodes: 0.8%
4 tasks, 3996 files
SUMMARY rate: (of 3 iterations)
Operation Max Min Mean Std Dev
--------- --- --- ---- -------
File creation : 4530.794 4530.784 4530.788 0.004
File stat : 972973.342 972860.389 972935.691 53.246
File read : 17802.887 17802.792 17802.849 0.041
File removal : 22.489 22.489 22.489 0.000
Tree creation : 597.252 424.639 532.511 76.785
Tree removal : 58.258 16.345 35.345 17.333
-- finished at 03/18/2020 17:12:34 --
After the change:
Command line used: ./mdtest "-n" "1000" "-u" "-z" "2" "-i" "3" "-F" "-d" "/home/wangyang5/polefs/mnt"
Path: /home/wangyang5/polefs
FS: 50.0 GiB Used FS: 37.9% Inodes: 25.0 Mi Used Inodes: 0.8%
4 tasks, 3996 files
SUMMARY rate: (of 3 iterations)
Operation Max Min Mean Std Dev
--------- --- --- ---- -------
File creation : 4469.286 4469.280 4469.283 0.002
File stat : 1013572.737 1012898.941 1013286.855 284.380
File read : 17506.221 17506.093 17506.136 0.060
File removal : 5175.368 5175.358 5175.363 0.004
Tree creation : 579.831 540.689 565.124 17.397
Tree removal : 447.424 427.932 435.875 8.356
-- finished at 03/18/2020 17:01:36 --
Can you share the results of that same benchmark, but with a goroutine pool of only 2 goroutines please?
With a pool of 2 goroutines:
./mdtest "-n" "1000" "-u" "-z" "2" "-i" "3" "-F" "-d" "/home/wangyang5/polefs/mnt"
Path: /home/wangyang5/polefs
FS: 50.0 GiB Used FS: 37.8% Inodes: 25.0 Mi Used Inodes: 0.8%
4 tasks, 3996 files
SUMMARY rate: (of 3 iterations)
Operation Max Min Mean Std Dev
--------- --- --- ---- -------
File creation : 4578.542 4578.532 4578.538 0.004
File stat : 971056.708 970550.627 970813.018 207.029
File read : 18393.587 18393.405 18393.500 0.074
File removal : 46.243 46.243 46.243 0.000
Tree creation : 653.828 461.589 564.816 79.122
Tree removal : 71.337 27.775 45.803 18.560
With a pool of 100 goroutines:
./mdtest "-n" "1000" "-u" "-z" "2" "-i" "3" "-F" "-d" "/home/wangyang5/polefs/mnt"
Path: /home/wangyang5/polefs
FS: 50.0 GiB Used FS: 37.9% Inodes: 25.0 Mi Used Inodes: 0.8%
4 tasks, 3996 files
SUMMARY rate: (of 3 iterations)
Operation Max Min Mean Std Dev
--------- --- --- ---- -------
File creation : 4580.740 4580.723 4580.733 0.008
File stat : 990979.648 990628.216 990842.974 153.729
File read : 18073.165 18073.068 18073.120 0.040
File removal : 2281.023 2281.021 2281.022 0.001
Tree creation : 624.400 446.060 558.440 79.864
Tree removal : 457.793 438.781 445.753 8.549
What we're seeing here is that the kernel will parallelize the dispatch of operations that have no logical dependencies on each other. The test measures the throughput of removals, while @stapelberg's optimization targeted the latency of close(2).
A better solution may be to keep a goroutine pool for all operations instead of spinning them up as requests come in, and then spinning up fresh goroutines only under high load if all pool goroutines are busy.
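A rough sketch of that idea, assuming a simple dispatch loop (the server, dispatch and handleOp names here are illustrative, not the library's API): a small set of long-lived pool workers serves the steady state, and a fresh goroutine is spawned only when every pool worker is busy.

```go
package main

import (
	"fmt"
	"sync"
)

// Op stands in for the server's operation type.
type Op interface{}

type server struct {
	ops chan Op
	wg  sync.WaitGroup
}

// newServer pre-spawns a fixed pool of long-lived workers.
func newServer(poolSize int) *server {
	s := &server{ops: make(chan Op)}
	for i := 0; i < poolSize; i++ {
		go s.poolWorker()
	}
	return s
}

func (s *server) handleOp(op Op) {
	defer s.wg.Done()
	// The real request handling would go here.
}

// poolWorker serves ops for the lifetime of the server.
func (s *server) poolWorker() {
	for op := range s.ops {
		s.handleOp(op)
	}
}

// dispatch hands the op to an idle pool worker if one is waiting and
// otherwise falls back to a fresh goroutine, so a burst of operations
// does not stall the read loop.
func (s *server) dispatch(op Op) {
	s.wg.Add(1)
	select {
	case s.ops <- op: // an idle pool worker takes it
	default:
		go s.handleOp(op) // all pool workers busy: spin up a fresh one
	}
}

func main() {
	s := newServer(8)
	for i := 0; i < 1000; i++ {
		s.dispatch(i)
	}
	s.wg.Wait()
	fmt.Println("all ops handled")
}
```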