Switch from std::sync::mpsc to flume

Open tavianator opened this issue 3 years ago • 6 comments

From chatter on Zulip, it looks like flume is a candidate to replace std::sync::mpsc if it is not deprecated. This is an experiment to try it instead of std or crossbeam-channels.

My benchmarking indicates that flume is faster than std, but not quite as fast as crossbeam. I'm curious @sharkdp if you still see a perf regression with this implementation?
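For readers unfamiliar with the pattern under discussion, here is a minimal sketch of the multi-producer, single-consumer shape that a channel swap like this would touch. It uses only `std::sync::mpsc`; the `Entry` type and `collect_entries` function are hypothetical stand-ins, not fd's actual code. As far as I know, flume offers a similar `unbounded()` constructor with cloneable senders, so the call sites would look much the same, but treat that as an assumption to check against flume's docs.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical stand-in for a directory-walk result (not fd's real type).
#[derive(Debug, PartialEq, Eq, PartialOrd, Ord)]
struct Entry(String);

/// Spawn one worker per list of names; every worker sends its entries
/// over a clone of the same sender, and a single receiver collects them.
fn collect_entries(roots: Vec<Vec<&'static str>>) -> Vec<Entry> {
    let (tx, rx) = mpsc::channel::<Entry>();

    let workers: Vec<_> = roots
        .into_iter()
        .map(|names| {
            let tx = tx.clone();
            thread::spawn(move || {
                for name in names {
                    // send() only fails if the receiver has been dropped.
                    if tx.send(Entry(name.to_string())).is_err() {
                        break;
                    }
                }
            })
        })
        .collect();

    // Drop the original sender so the receiver sees end-of-stream once
    // all worker clones are gone.
    drop(tx);

    let mut entries: Vec<Entry> = rx.iter().collect();
    for worker in workers {
        worker.join().expect("worker panicked");
    }
    // Arrival order is nondeterministic, so sort for a stable result.
    entries.sort();
    entries
}

fn main() {
    let entries = collect_entries(vec![vec!["a.jpg", "b.txt"], vec!["c.jpg"]]);
    println!("{entries:?}");
}
```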

tavianator, Jan 11 '22 18:01

I ran some benchmarks comparing master, this branch, and using crossbeam-channel.

For my photos directory (on spinning disk):

fd regression benchmark

No pattern

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| `./fd-master --hidden --no-ignore '' '/home/thayne/bulk-home/Pictures'` | 10.5 ± 3.3 | 5.4 | 20.6 | 1.00 |
| `./fd-flume --hidden --no-ignore '' '/home/thayne/bulk-home/Pictures'` | 10.8 ± 3.1 | 5.8 | 19.0 | 1.03 ± 0.44 |
| `./fd-crossbeam --hidden --no-ignore '' '/home/thayne/bulk-home/Pictures'` | 11.6 ± 3.9 | 5.0 | 22.3 | 1.10 ± 0.51 |

Simple pattern

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| `./fd-master '.*[0-9]\.jpg$' '/home/thayne/bulk-home/Pictures'` | 11.3 ± 3.5 | 5.8 | 22.2 | 1.00 |
| `./fd-flume '.*[0-9]\.jpg$' '/home/thayne/bulk-home/Pictures'` | 11.7 ± 3.3 | 7.2 | 22.5 | 1.04 ± 0.44 |
| `./fd-crossbeam '.*[0-9]\.jpg$' '/home/thayne/bulk-home/Pictures'` | 13.4 ± 4.6 | 6.9 | 26.6 | 1.19 ± 0.55 |

Simple pattern (-HI)

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| `./fd-master -HI '.*[0-9]\.jpg$' '/home/thayne/bulk-home/Pictures'` | 10.7 ± 3.5 | 5.8 | 21.9 | 1.03 ± 0.46 |
| `./fd-flume -HI '.*[0-9]\.jpg$' '/home/thayne/bulk-home/Pictures'` | 10.4 ± 3.1 | 6.1 | 24.1 | 1.00 |
| `./fd-crossbeam -HI '.*[0-9]\.jpg$' '/home/thayne/bulk-home/Pictures'` | 11.5 ± 4.0 | 5.1 | 23.7 | 1.10 ± 0.51 |

File extension

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| `./fd-master -HI --extension jpg '' '/home/thayne/bulk-home/Pictures'` | 14.8 ± 3.5 | 9.7 | 24.5 | 1.01 ± 0.31 |
| `./fd-flume -HI --extension jpg '' '/home/thayne/bulk-home/Pictures'` | 14.7 ± 2.9 | 9.8 | 24.6 | 1.00 |
| `./fd-crossbeam -HI --extension jpg '' '/home/thayne/bulk-home/Pictures'` | 15.0 ± 3.8 | 9.5 | 27.6 | 1.02 ± 0.33 |

File type

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| `./fd-master -HI --type l '' '/home/thayne/bulk-home/Pictures'` | 10.7 ± 3.8 | 5.2 | 21.5 | 1.06 ± 0.56 |
| `./fd-flume -HI --type l '' '/home/thayne/bulk-home/Pictures'` | 10.1 ± 3.9 | 3.7 | 21.6 | 1.00 |
| `./fd-crossbeam -HI --type l '' '/home/thayne/bulk-home/Pictures'` | 11.3 ± 3.9 | 5.0 | 20.1 | 1.12 ± 0.58 |

Cold cache

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| `./fd-master -HI '.*[0-9]\.jpg$' '/home/thayne/bulk-home/Pictures'` | 112.4 ± 69.8 | 71.1 | 193.0 | 1.75 ± 1.18 |
| `./fd-flume -HI '.*[0-9]\.jpg$' '/home/thayne/bulk-home/Pictures'` | 70.8 ± 28.1 | 57.1 | 144.3 | 1.11 ± 0.53 |
| `./fd-crossbeam -HI '.*[0-9]\.jpg$' '/home/thayne/bulk-home/Pictures'` | 64.1 ± 16.8 | 55.5 | 111.5 | 1.00 |

flume and std seem to be pretty close; in most tests, crossbeam seems to be a little bit slower. On the cold cache, master was significantly slower than both, and crossbeam was fastest. But maybe the reset cache command wasn't working as expected, and the order mattered? Does hyperfine run all the tests for the first command before doing the second, or does it intersperse them?

Haproxy repository (SSD) order 1

fd regression benchmark

No pattern

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| `./fd-crossbeam --hidden --no-ignore '' '/home/thayne/dev/haproxy'` | 10.4 ± 3.5 | 5.8 | 22.6 | 1.02 ± 0.47 |
| `./fd-flume --hidden --no-ignore '' '/home/thayne/dev/haproxy'` | 10.1 ± 3.1 | 6.0 | 21.6 | 1.00 |
| `./fd-master --hidden --no-ignore '' '/home/thayne/dev/haproxy'` | 10.6 ± 3.3 | 5.3 | 19.8 | 1.05 ± 0.45 |

Simple pattern

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| `./fd-crossbeam '.*[0-9]\.jpg$' '/home/thayne/dev/haproxy'` | 8.8 ± 4.0 | 3.0 | 16.8 | 1.00 |
| `./fd-flume '.*[0-9]\.jpg$' '/home/thayne/dev/haproxy'` | 8.9 ± 4.0 | 2.4 | 16.6 | 1.00 ± 0.64 |
| `./fd-master '.*[0-9]\.jpg$' '/home/thayne/dev/haproxy'` | 9.4 ± 4.0 | 3.3 | 17.1 | 1.06 ± 0.65 |

Simple pattern (-HI)

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| `./fd-crossbeam -HI '.*[0-9]\.jpg$' '/home/thayne/dev/haproxy'` | 12.2 ± 4.1 | 6.4 | 22.5 | 1.02 ± 0.50 |
| `./fd-flume -HI '.*[0-9]\.jpg$' '/home/thayne/dev/haproxy'` | 11.9 ± 4.2 | 6.1 | 21.6 | 1.00 |
| `./fd-master -HI '.*[0-9]\.jpg$' '/home/thayne/dev/haproxy'` | 12.2 ± 4.5 | 6.0 | 22.1 | 1.03 ± 0.52 |

File extension

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| `./fd-crossbeam -HI --extension jpg '' '/home/thayne/dev/haproxy'` | 12.1 ± 3.7 | 6.9 | 21.4 | 1.00 |
| `./fd-flume -HI --extension jpg '' '/home/thayne/dev/haproxy'` | 13.1 ± 4.3 | 7.0 | 24.1 | 1.09 ± 0.49 |
| `./fd-master -HI --extension jpg '' '/home/thayne/dev/haproxy'` | 13.2 ± 4.3 | 7.2 | 24.1 | 1.09 ± 0.49 |

File type

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| `./fd-crossbeam -HI --type l '' '/home/thayne/dev/haproxy'` | 11.9 ± 4.5 | 5.7 | 24.1 | 1.08 ± 0.56 |
| `./fd-flume -HI --type l '' '/home/thayne/dev/haproxy'` | 11.0 ± 3.9 | 5.6 | 21.6 | 1.00 |
| `./fd-master -HI --type l '' '/home/thayne/dev/haproxy'` | 11.6 ± 4.2 | 5.9 | 22.2 | 1.05 ± 0.53 |

Cold cache

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| `./fd-crossbeam -HI '.*[0-9]\.jpg$' '/home/thayne/dev/haproxy'` | 123.6 ± 39.0 | 103.4 | 193.2 | 1.14 ± 0.37 |
| `./fd-flume -HI '.*[0-9]\.jpg$' '/home/thayne/dev/haproxy'` | 128.1 ± 41.3 | 108.2 | 212.2 | 1.18 ± 0.39 |
| `./fd-master -HI '.*[0-9]\.jpg$' '/home/thayne/dev/haproxy'` | 108.5 ± 9.1 | 101.7 | 129.4 | 1.00 |

Haproxy repository (SSD) order 2

fd regression benchmark

No pattern

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| `./fd-master --hidden --no-ignore '' '/home/thayne/dev/haproxy'` | 9.1 ± 2.9 | 5.5 | 18.5 | 1.00 |
| `./fd-flume --hidden --no-ignore '' '/home/thayne/dev/haproxy'` | 9.6 ± 3.1 | 5.5 | 21.2 | 1.06 ± 0.48 |
| `./fd-crossbeam --hidden --no-ignore '' '/home/thayne/dev/haproxy'` | 9.8 ± 3.4 | 5.5 | 20.8 | 1.08 ± 0.51 |

Simple pattern

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| `./fd-master '.*[0-9]\.jpg$' '/home/thayne/dev/haproxy'` | 8.5 ± 3.8 | 1.4 | 16.0 | 1.04 ± 0.67 |
| `./fd-flume '.*[0-9]\.jpg$' '/home/thayne/dev/haproxy'` | 8.4 ± 3.9 | 1.8 | 15.8 | 1.03 ± 0.68 |
| `./fd-crossbeam '.*[0-9]\.jpg$' '/home/thayne/dev/haproxy'` | 8.1 ± 3.8 | 1.3 | 15.5 | 1.00 |

Simple pattern (-HI)

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| `./fd-master -HI '.*[0-9]\.jpg$' '/home/thayne/dev/haproxy'` | 11.1 ± 3.4 | 6.9 | 21.3 | 1.00 |
| `./fd-flume -HI '.*[0-9]\.jpg$' '/home/thayne/dev/haproxy'` | 11.8 ± 4.1 | 6.9 | 22.8 | 1.06 ± 0.49 |
| `./fd-crossbeam -HI '.*[0-9]\.jpg$' '/home/thayne/dev/haproxy'` | 13.5 ± 4.2 | 7.3 | 24.5 | 1.21 ± 0.53 |

File extension

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| `./fd-master -HI --extension jpg '' '/home/thayne/dev/haproxy'` | 10.5 ± 4.2 | 4.6 | 20.7 | 1.00 |
| `./fd-flume -HI --extension jpg '' '/home/thayne/dev/haproxy'` | 11.1 ± 3.9 | 5.9 | 21.7 | 1.05 ± 0.56 |
| `./fd-crossbeam -HI --extension jpg '' '/home/thayne/dev/haproxy'` | 11.6 ± 4.3 | 5.0 | 20.9 | 1.10 ± 0.59 |

File type

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| `./fd-master -HI --type l '' '/home/thayne/dev/haproxy'` | 12.8 ± 4.1 | 7.0 | 22.0 | 1.04 ± 0.47 |
| `./fd-flume -HI --type l '' '/home/thayne/dev/haproxy'` | 12.4 ± 3.9 | 6.5 | 22.3 | 1.00 |
| `./fd-crossbeam -HI --type l '' '/home/thayne/dev/haproxy'` | 13.0 ± 4.4 | 7.3 | 26.6 | 1.05 ± 0.49 |

Cold cache

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| `./fd-master -HI '.*[0-9]\.jpg$' '/home/thayne/dev/haproxy'` | 132.1 ± 25.5 | 118.4 | 177.7 | 1.20 ± 0.26 |
| `./fd-flume -HI '.*[0-9]\.jpg$' '/home/thayne/dev/haproxy'` | 110.2 ± 10.6 | 101.6 | 134.8 | 1.00 |
| `./fd-crossbeam -HI '.*[0-9]\.jpg$' '/home/thayne/dev/haproxy'` | 112.4 ± 3.4 | 108.0 | 116.4 | 1.02 ± 0.10 |

Rust-lang repository (spinning disk)

fd regression benchmark

No pattern

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| `./fd-master --hidden --no-ignore '' '/home/thayne/bulk-home/devel/rust-lang'` | 254.5 ± 2.1 | 251.7 | 259.7 | 1.01 ± 0.01 |
| `./fd-flume --hidden --no-ignore '' '/home/thayne/bulk-home/devel/rust-lang'` | 264.0 ± 2.4 | 259.5 | 267.4 | 1.05 ± 0.01 |
| `./fd-crossbeam --hidden --no-ignore '' '/home/thayne/bulk-home/devel/rust-lang'` | 251.2 ± 2.7 | 247.3 | 255.5 | 1.00 |

Simple pattern

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| `./fd-master '.*[0-9]\.jpg$' '/home/thayne/bulk-home/devel/rust-lang'` | 346.4 ± 2.6 | 341.9 | 351.0 | 1.00 |
| `./fd-flume '.*[0-9]\.jpg$' '/home/thayne/bulk-home/devel/rust-lang'` | 347.9 ± 11.8 | 337.7 | 373.0 | 1.00 ± 0.03 |
| `./fd-crossbeam '.*[0-9]\.jpg$' '/home/thayne/bulk-home/devel/rust-lang'` | 354.4 ± 8.7 | 344.8 | 367.6 | 1.02 ± 0.03 |

Simple pattern (-HI)

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| `./fd-master -HI '.*[0-9]\.jpg$' '/home/thayne/bulk-home/devel/rust-lang'` | 245.1 ± 3.4 | 240.7 | 253.1 | 1.00 |
| `./fd-flume -HI '.*[0-9]\.jpg$' '/home/thayne/bulk-home/devel/rust-lang'` | 249.9 ± 5.4 | 246.1 | 264.5 | 1.02 ± 0.03 |
| `./fd-crossbeam -HI '.*[0-9]\.jpg$' '/home/thayne/bulk-home/devel/rust-lang'` | 246.5 ± 4.8 | 241.4 | 256.7 | 1.01 ± 0.02 |

File extension

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| `./fd-master -HI --extension jpg '' '/home/thayne/bulk-home/devel/rust-lang'` | 253.7 ± 4.0 | 249.3 | 260.8 | 1.00 |
| `./fd-flume -HI --extension jpg '' '/home/thayne/bulk-home/devel/rust-lang'` | 254.6 ± 2.9 | 250.5 | 259.5 | 1.00 ± 0.02 |
| `./fd-crossbeam -HI --extension jpg '' '/home/thayne/bulk-home/devel/rust-lang'` | 254.6 ± 3.5 | 248.2 | 261.7 | 1.00 ± 0.02 |

File type

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| `./fd-master -HI --type l '' '/home/thayne/bulk-home/devel/rust-lang'` | 243.7 ± 2.4 | 239.5 | 248.0 | 1.00 |
| `./fd-flume -HI --type l '' '/home/thayne/bulk-home/devel/rust-lang'` | 247.4 ± 2.5 | 245.0 | 254.2 | 1.02 ± 0.01 |
| `./fd-crossbeam -HI --type l '' '/home/thayne/bulk-home/devel/rust-lang'` | 249.5 ± 2.3 | 246.7 | 253.4 | 1.02 ± 0.01 |

Cold cache

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| `./fd-master -HI '.*[0-9]\.jpg$' '/home/thayne/bulk-home/devel/rust-lang'` | 29.077 ± 0.413 | 28.751 | 29.542 | 1.00 ± 0.02 |
| `./fd-flume -HI '.*[0-9]\.jpg$' '/home/thayne/bulk-home/devel/rust-lang'` | 28.985 ± 0.155 | 28.849 | 29.153 | 1.00 |
| `./fd-crossbeam -HI '.*[0-9]\.jpg$' '/home/thayne/bulk-home/devel/rust-lang'` | 29.034 ± 0.238 | 28.760 | 29.183 | 1.00 ± 0.01 |

I'm not entirely sure if the differences are due to actual performance differences, or something with how I'm running the benchmarks.

tmccombs, Jan 13 '22 07:01

Thank you for looking into this again @tavianator. And thank you for the benchmark results, @tmccombs.

> but maybe the reset cache command wasn't working as expected, and the order mattered? Does hyperfine run all the tests for the first command before doing the second, or does it intersperse them?

It does run all the benchmarks for the first command before doing the second. See also https://github.com/sharkdp/hyperfine/issues/21
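To make the ordering question concrete, here is a small sketch of the two schedules being contrasted. It is purely illustrative: the function names are made up and it does not reproduce hyperfine's implementation. It only lists the order in which (command, run) pairs would execute under each scheme.

```rust
/// Sequential schedule: finish every run of one command before starting
/// the next command (the behaviour described in the reply above).
fn sequential_order(commands: &[&str], runs: usize) -> Vec<(String, usize)> {
    let mut order = Vec::new();
    for cmd in commands {
        for run in 0..runs {
            order.push((cmd.to_string(), run));
        }
    }
    order
}

/// Interleaved schedule: do run 0 of every command, then run 1, and so on.
/// Shown only for contrast.
fn interleaved_order(commands: &[&str], runs: usize) -> Vec<(String, usize)> {
    let mut order = Vec::new();
    for run in 0..runs {
        for cmd in commands {
            order.push((cmd.to_string(), run));
        }
    }
    order
}

fn main() {
    let commands = ["./fd-master", "./fd-flume"];
    println!("sequential:  {:?}", sequential_order(&commands, 2));
    println!("interleaved: {:?}", interleaved_order(&commands, 2));
}
```

With a sequential schedule, anything that drifts over time (background disk activity, a cache that was not fully reset) hits one command's runs as a block, which is why swapping the command order, as in the two haproxy runs above, is a reasonable sanity check.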

The general problem with your benchmarks is the large statistical noise. Look at the very first benchmark, for example. A result like 1.03 ± 0.44 for flume (with respect to master) means: flume was 3% slower, but there is a statistical uncertainty of 44 percentage points, i.e. the error is an order of magnitude larger than the measured effect. Maybe hyperfine should come with a big warning in a case like this. That 3% performance difference really shouldn't be trusted.

Compare that to the "No pattern" benchmark on my machine (note: this is on a larger folder, to increase signal-to-noise even more):

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| `./fd-master --hidden --no-ignore '' '/home/shark/Informatik/'` | 385.0 ± 0.9 | 383.3 | 386.2 | 1.00 |
| `./fd-flume --hidden --no-ignore '' '/home/shark/Informatik/'` | 402.1 ± 1.8 | 400.0 | 406.2 | 1.04 ± 0.01 |

Here, the statistical error (0.01) is quite a bit smaller than the effect we are seeing (0.04).
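The comparison between effect and error can be reproduced from the mean and standard deviation columns. The sketch below assumes independent measurements and standard first-order error propagation for a ratio; that assumption matches the "Relative" values quoted in this thread, but it is not taken from hyperfine's source. The `Timing` struct and `relative` function are names invented for the example.

```rust
/// Mean and standard deviation of one benchmark row, in the same unit.
#[derive(Clone, Copy)]
struct Timing {
    mean: f64,
    stddev: f64,
}

/// Ratio of `candidate` to `baseline` and its propagated uncertainty,
/// assuming independent measurements and first-order error propagation:
/// sigma_ratio = ratio * sqrt((s_c / m_c)^2 + (s_b / m_b)^2).
fn relative(candidate: Timing, baseline: Timing) -> (f64, f64) {
    let ratio = candidate.mean / baseline.mean;
    let rel_err = ((candidate.stddev / candidate.mean).powi(2)
        + (baseline.stddev / baseline.mean).powi(2))
    .sqrt();
    (ratio, ratio * rel_err)
}

fn main() {
    // "No pattern" on the photos directory: flume vs master (values in ms).
    let noisy = relative(
        Timing { mean: 10.8, stddev: 3.1 },
        Timing { mean: 10.5, stddev: 3.3 },
    );
    // "No pattern" on the larger directory: flume vs master (values in ms).
    let clean = relative(
        Timing { mean: 402.1, stddev: 1.8 },
        Timing { mean: 385.0, stddev: 0.9 },
    );

    for (label, (ratio, err)) in [("noisy", noisy), ("clean", clean)] {
        // The measured effect is the distance of the ratio from 1.0.
        let effect = (ratio - 1.0).abs();
        println!("{label}: {ratio:.2} ± {err:.2} (effect/error = {:.1})", effect / err);
    }
}
```

An effect-to-error ratio well below 1 means the difference is indistinguishable from noise; a ratio of several units is what makes the 4% slowdown on the larger directory believable.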

It's annoying, but it's really important to switch off background processes. Especially the ones that might be reading from / writing to disk. Largest offenders for me are typically: dropbox(!), spotify, the browser.


Please find the full benchmark results from my machine (solid state disk) below. I would summarize them as: there is no statistically significant difference between the master version and the version from this branch, except for the "no pattern" benchmark, where the flume-version is 4% slower (reproducibly).

@tavianator Could you maybe also share your benchmark results?

fd regression benchmark

No pattern

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| `./fd-master --hidden --no-ignore '' '/home/shark/Informatik/'` | 385.0 ± 0.9 | 383.3 | 386.2 | 1.00 |
| `./fd-flume --hidden --no-ignore '' '/home/shark/Informatik/'` | 402.1 ± 1.8 | 400.0 | 406.2 | 1.04 ± 0.01 |

Simple pattern

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| `./fd-master '.*[0-9]\.jpg$' '/home/shark/Informatik/'` | 177.3 ± 1.1 | 175.9 | 179.2 | 1.00 ± 0.01 |
| `./fd-flume '.*[0-9]\.jpg$' '/home/shark/Informatik/'` | 176.7 ± 0.9 | 175.0 | 179.5 | 1.00 |

Simple pattern (-HI)

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| `./fd-master -HI '.*[0-9]\.jpg$' '/home/shark/Informatik/'` | 361.7 ± 1.6 | 359.0 | 364.2 | 1.01 ± 0.01 |
| `./fd-flume -HI '.*[0-9]\.jpg$' '/home/shark/Informatik/'` | 358.7 ± 2.1 | 356.2 | 361.8 | 1.00 |

File extension

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| `./fd-master -HI --extension jpg '' '/home/shark/Informatik/'` | 392.7 ± 9.0 | 379.9 | 406.6 | 1.01 ± 0.03 |
| `./fd-flume -HI --extension jpg '' '/home/shark/Informatik/'` | 388.7 ± 6.6 | 383.0 | 400.2 | 1.00 |

File type

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| `./fd-master -HI --type l '' '/home/shark/Informatik/'` | 359.2 ± 0.8 | 357.9 | 360.7 | 1.01 ± 0.01 |
| `./fd-flume -HI --type l '' '/home/shark/Informatik/'` | 357.2 ± 2.2 | 353.8 | 360.2 | 1.00 |

Cold cache

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| `./fd-master -HI '.*[0-9]\.jpg$' '/home/shark/Informatik/'` | 3.008 ± 0.017 | 2.995 | 3.027 | 1.00 ± 0.01 |
| `./fd-flume -HI '.*[0-9]\.jpg$' '/home/shark/Informatik/'` | 3.007 ± 0.015 | 2.994 | 3.023 | 1.00 |

sharkdp, Jan 23 '22 19:01

Interesting. The no-pattern case would probably fill up the queues more. This seems to contradict the benchmarks in flume's README. Is it worth bringing up with flume and/or crossbeam-channel that you aren't seeing the performance gains that they claim over mpsc?
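A rough sketch of why the no-pattern case would put more pressure on the channel: if filtering happens on the sending side (an assumption about the design, not a statement about fd's code), then without a pattern every entry crosses the channel, while a selective pattern lets only the matches through. The `messages_sent` function, the substring match, and the capacity of 4 are all arbitrary choices for the illustration.

```rust
use std::sync::mpsc;
use std::thread;

/// Count how many messages cross the channel when the sender filters by
/// an optional substring `pattern` (None stands for "no pattern").
fn messages_sent(names: Vec<String>, pattern: Option<&'static str>) -> usize {
    // Small bounded channel: once it is full, send() blocks the producer
    // until the consumer catches up.
    let (tx, rx) = mpsc::sync_channel::<String>(4);

    let producer = thread::spawn(move || {
        for name in names {
            let matches = pattern.map_or(true, |p| name.contains(p));
            // Only matching entries are sent; stop if the receiver is gone.
            if matches && tx.send(name).is_err() {
                break;
            }
        }
        // `tx` is dropped here, which ends the receiver's iterator.
    });

    let received = rx.iter().count();
    producer.join().expect("producer panicked");
    received
}

fn main() {
    // Synthetic listing: one in ten names ends in ".jpg".
    let names: Vec<String> = (0..1000)
        .map(|i| {
            if i % 10 == 0 {
                format!("img{i}.jpg")
            } else {
                format!("file{i}.txt")
            }
        })
        .collect();

    println!("no pattern: {}", messages_sent(names.clone(), None));
    println!("'.jpg' only: {}", messages_sent(names, Some(".jpg")));
}
```

In this toy setup the unfiltered run pushes ten times as many messages through the queue, so any per-message overhead in the channel implementation would show up most clearly there.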

tmccombs, Jan 26 '22 08:01

Here are my results. Weirdly, --extension is 7% slower with flume, but otherwise flume wins by 1-5%.

fd regression benchmark

No pattern

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| `./fd-master --hidden --no-ignore '' '/home/tavianator/code/android'` | 2.069 ± 0.006 | 2.058 | 2.076 | 1.05 ± 0.01 |
| `./fd-feature --hidden --no-ignore '' '/home/tavianator/code/android'` | 1.973 ± 0.024 | 1.955 | 2.039 | 1.00 |

Simple pattern

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| `./fd-master '.*[0-9]\.jpg$' '/home/tavianator/code/android'` | 1.370 ± 0.010 | 1.358 | 1.389 | 1.01 ± 0.01 |
| `./fd-feature '.*[0-9]\.jpg$' '/home/tavianator/code/android'` | 1.359 ± 0.003 | 1.355 | 1.365 | 1.00 |

Simple pattern (-HI)

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| `./fd-master -HI '.*[0-9]\.jpg$' '/home/tavianator/code/android'` | 2.046 ± 0.003 | 2.043 | 2.051 | 1.05 ± 0.01 |
| `./fd-feature -HI '.*[0-9]\.jpg$' '/home/tavianator/code/android'` | 1.946 ± 0.020 | 1.933 | 2.000 | 1.00 |

File extension

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| `./fd-master -HI --extension jpg '' '/home/tavianator/code/android'` | 1.947 ± 0.006 | 1.938 | 1.956 | 1.00 |
| `./fd-feature -HI --extension jpg '' '/home/tavianator/code/android'` | 2.088 ± 0.032 | 2.065 | 2.153 | 1.07 ± 0.02 |

File type

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| `./fd-master -HI --type l '' '/home/tavianator/code/android'` | 2.031 ± 0.004 | 2.025 | 2.038 | 1.05 ± 0.00 |
| `./fd-feature -HI --type l '' '/home/tavianator/code/android'` | 1.937 ± 0.005 | 1.932 | 1.948 | 1.00 |

Cold cache

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| `./fd-master -HI '.*[0-9]\.jpg$' '/home/tavianator/code/android'` | 2.178 ± 0.004 | 2.175 | 2.183 | 1.05 ± 0.00 |
| `./fd-feature -HI '.*[0-9]\.jpg$' '/home/tavianator/code/android'` | 2.076 ± 0.003 | 2.074 | 2.079 | 1.00 |

tavianator, Jan 26 '22 16:01

@tavianator @tmccombs How do we proceed with this std-channels vs crossbeam vs flume topic? It feels to me like we need to work on #893 first. What do you think?

sharkdp, Mar 04 '22 07:03

I don't know. I'd like to have a better understanding of why switching to crossbeam-channel or flume doesn't perform as well in some cases. Does either of you have much experience profiling Rust code?

tmccombs, Mar 11 '22 04:03

I just found this by accident: https://github.com/fereidani/kanal

sharkdp, Oct 17 '22 19:10

@sharkdp Yeah I saw that recently too! Curious to try it out, might be a little early though

tavianator, Oct 17 '22 19:10

I guess we can close this in favor of #1146

sharkdp, Oct 31 '22 20:10