
Use a pool of goroutines for the ForgetInode op

Open kungf opened this issue 4 years ago • 7 comments

In #30, ForgetInode was changed to run inline in ServerOps. This may solve the memory OOM problem, but it makes the performance of rm very slow and also hangs other ops. So I think we should add a goroutine pool to limit the max number of ForgetInode goroutines; that way memory stays bounded and performance is not affected.
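
For illustration only, a bounded pool can be built from a buffered channel used as a semaphore. This is a minimal sketch, not code from the fuse package; `dispatchForget`, `maxForgetWorkers`, and the handler stub are all made-up names.

```go
package main

import (
	"fmt"
	"sync"
)

// maxForgetWorkers caps how many ForgetInode handlers run concurrently;
// the value is arbitrary here.
const maxForgetWorkers = 100

var (
	forgetSem = make(chan struct{}, maxForgetWorkers) // semaphore
	wg        sync.WaitGroup
)

// dispatchForget runs fn on its own goroutine, but never more than
// maxForgetWorkers at once; extra callers block on the semaphore instead
// of spawning an unbounded number of goroutines during a forget storm.
func dispatchForget(fn func()) {
	forgetSem <- struct{}{} // acquire a slot
	wg.Add(1)
	go func() {
		defer wg.Done()
		defer func() { <-forgetSem }() // release the slot
		fn()
	}()
}

func main() {
	for i := 0; i < 1000; i++ {
		i := i
		dispatchForget(func() {
			_ = fmt.Sprintf("forget inode %d", i) // stand-in for the real handler
		})
	}
	wg.Wait()
}
```

The point of the bound is that it restores concurrency relative to fully inline handling while still capping how much memory a storm of ForgetInode requests can consume.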

kungf avatar Mar 17 '20 03:03 kungf

Hi @stapelberg, do you have any advice?

kungf avatar Mar 17 '20 06:03 kungf

I don't think 100000 is the right number. Either we allocate all of them at startup, which wastes 200MB in most cases, or we don't allocate them at startup, in which case you get OOM during a storm of ForgetInode.

Using a very small pool may be OK. Note that the implementation may have to lock anyway, so having that many forget goroutines running isn't useful.

kahing avatar Mar 17 '20 07:03 kahing

> I don't think 100000 is the right number. Either we allocate all of them at startup, which wastes 200MB in most cases, or we don't allocate them at startup, in which case you get OOM during a storm of ForgetInode.

Hi @kahing, the 100000 is just the capacity; goroutines will not be allocated until they are used, and if a goroutine is idle for too long it will be recycled!
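
A rough sketch of that behaviour, assuming a channel-based pool with lazy worker creation and an idle timeout; all names (`pool`, `poolCapacity`, `idleTimeout`) are illustrative, and the capacity fallback simply runs the task inline like the current code does.

```go
package main

import (
	"sync"
	"sync/atomic"
	"time"
)

const (
	poolCapacity = 100000           // upper bound only; nothing is pre-allocated
	idleTimeout  = 30 * time.Second // a worker exits after being idle this long
)

type pool struct {
	workers int64 // number of live workers (atomic)
	tasks   chan func()
}

func newPool() *pool {
	return &pool{tasks: make(chan func())}
}

// Submit hands the task to an idle worker if one is waiting; otherwise it
// starts a new worker, as long as we are under poolCapacity. At capacity it
// falls back to running the task inline on the caller's goroutine.
func (p *pool) Submit(task func()) {
	select {
	case p.tasks <- task: // an idle worker picked it up
		return
	default:
	}
	if atomic.AddInt64(&p.workers, 1) <= poolCapacity {
		go p.worker(task)
		return
	}
	atomic.AddInt64(&p.workers, -1)
	task()
}

// worker runs tasks until it has been idle for idleTimeout, then exits;
// this is the "recycled" behaviour described above.
func (p *pool) worker(first func()) {
	defer atomic.AddInt64(&p.workers, -1)
	first()
	idle := time.NewTimer(idleTimeout)
	defer idle.Stop()
	for {
		select {
		case task := <-p.tasks:
			task()
			if !idle.Stop() {
				<-idle.C // drain a timer that fired while we were busy
			}
			idle.Reset(idleTimeout)
		case <-idle.C:
			return
		}
	}
}

func main() {
	p := newPool()
	var done sync.WaitGroup
	for i := 0; i < 100; i++ {
		done.Add(1)
		p.Submit(func() {
			defer done.Done()
			time.Sleep(time.Millisecond) // stand-in for ForgetInode work
		})
	}
	done.Wait()
}
```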

kungf avatar Mar 18 '20 02:03 kungf

> Have you done benchmarks for this? Can you share the results?

You can see from the results below (4 tasks, 1000 files) that the "File removal" rate goes up from 22 to 5175!

Before the change:

./mdtest "-n" "1000" "-u" "-z" "2" "-i" "3" "-F" "-d" "/home/wangyang5/polefs/mnt"
Path: /home/wangyang5/polefs
FS: 50.0 GiB   Used FS: 37.9%   Inodes: 25.0 Mi   Used Inodes: 0.8%

4 tasks, 3996 files

SUMMARY rate: (of 3 iterations)
   Operation                      Max            Min           Mean        Std Dev
   ---------                      ---            ---           ----        -------
   File creation     :       4530.794       4530.784       4530.788          0.004
   File stat         :     972973.342     972860.389     972935.691         53.246
   File read         :      17802.887      17802.792      17802.849          0.041
   File removal      :         22.489         22.489         22.489          0.000
   Tree creation     :        597.252        424.639        532.511         76.785
   Tree removal      :         58.258         16.345         35.345         17.333

-- finished at 03/18/2020 17:12:34 --

After the change:

Command line used: ./mdtest "-n" "1000" "-u" "-z" "2" "-i" "3" "-F" "-d" "/home/wangyang5/polefs/mnt"
Path: /home/wangyang5/polefs
FS: 50.0 GiB   Used FS: 37.9%   Inodes: 25.0 Mi   Used Inodes: 0.8%

4 tasks, 3996 files

SUMMARY rate: (of 3 iterations)
   Operation                      Max            Min           Mean        Std Dev
   ---------                      ---            ---           ----        -------
   File creation     :       4469.286       4469.280       4469.283          0.002
   File stat         :    1013572.737    1012898.941    1013286.855        284.380
   File read         :      17506.221      17506.093      17506.136          0.060
   File removal      :       5175.368       5175.358       5175.363          0.004
   Tree creation     :        579.831        540.689        565.124         17.397
   Tree removal      :        447.424        427.932        435.875          8.356

-- finished at 03/18/2020 17:01:36 --

kungf avatar Mar 18 '20 09:03 kungf

Can you share the results of that same benchmark, but with a goroutine pool of only 2 goroutines please?

stapelberg avatar Mar 18 '20 16:03 stapelberg

2 goroutines

./mdtest "-n" "1000" "-u" "-z" "2" "-i" "3" "-F" "-d" "/home/wangyang5/polefs/mnt"
Path: /home/wangyang5/polefs
FS: 50.0 GiB   Used FS: 37.8%   Inodes: 25.0 Mi   Used Inodes: 0.8%

4 tasks, 3996 files

SUMMARY rate: (of 3 iterations)
   Operation                      Max            Min           Mean        Std Dev
   ---------                      ---            ---           ----        -------
   File creation     :       4578.542       4578.532       4578.538          0.004
   File stat         :     971056.708     970550.627     970813.018        207.029
   File read         :      18393.587      18393.405      18393.500          0.074
   File removal      :         46.243         46.243         46.243          0.000
   Tree creation     :        653.828        461.589        564.816         79.122
   Tree removal      :         71.337         27.775         45.803         18.560

100 goroutines

./mdtest "-n" "1000" "-u" "-z" "2" "-i" "3" "-F" "-d" "/home/wangyang5/polefs/mnt"
Path: /home/wangyang5/polefs
FS: 50.0 GiB   Used FS: 37.9%   Inodes: 25.0 Mi   Used Inodes: 0.8%

4 tasks, 3996 files

SUMMARY rate: (of 3 iterations)
   Operation                      Max            Min           Mean        Std Dev
   ---------                      ---            ---           ----        -------
   File creation     :       4580.740       4580.723       4580.733          0.008
   File stat         :     990979.648     990628.216     990842.974        153.729
   File read         :      18073.165      18073.068      18073.120          0.040
   File removal      :       2281.023       2281.021       2281.022          0.001
   Tree creation     :        624.400        446.060        558.440         79.864
   Tree removal      :        457.793        438.781        445.753          8.549

kungf avatar Mar 19 '20 01:03 kungf

What we're seeing here is that the kernel will parallelize dispatch of operations that have no logical dependencies on each other. The test is looking for throughput of removals, while @stapelberg's optimization was looking for latency of close(2).

A better solution may be to keep a goroutine pool for all operations instead of spinning them up as requests come in, and then spinning up fresh goroutines only under high load if all pool goroutines are busy.
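
A rough sketch of that idea, with made-up names (nothing here comes from the fuse package): a fixed set of resident workers handles the common case, and a fresh goroutine is spawned only when every worker is already busy.

```go
package main

import "sync"

// opPool keeps a fixed set of long-lived workers; Dispatch only creates a
// new goroutine when every worker is already busy.
type opPool struct {
	ops chan func()
	wg  sync.WaitGroup
}

// newOpPool starts n resident workers up front, so the steady state never
// pays goroutine-creation cost per request.
func newOpPool(n int) *opPool {
	p := &opPool{ops: make(chan func())}
	for i := 0; i < n; i++ {
		go func() {
			for op := range p.ops {
				op()
			}
		}()
	}
	return p
}

// Dispatch hands the op to an idle worker if one is waiting on the channel;
// otherwise (high load, all workers busy) it spins up a fresh goroutine
// just for this op.
func (p *opPool) Dispatch(op func()) {
	p.wg.Add(1)
	wrapped := func() { defer p.wg.Done(); op() }
	select {
	case p.ops <- wrapped:
	default:
		go wrapped()
	}
}

func main() {
	p := newOpPool(8)
	for i := 0; i < 100; i++ {
		p.Dispatch(func() {
			// Stand-in for handling a FUSE op.
		})
	}
	p.wg.Wait() // wait for all dispatched ops in this example
}
```

Because the channel is unbuffered, the send only succeeds when a worker is idle and blocked on receive, which is exactly the "spawn fresh goroutines only under high load" behaviour described above.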

riking avatar Jul 05 '20 06:07 riking