findutils icon indicating copy to clipboard operation
findutils copied to clipboard

Leverage parallel capabilities

Open thecotne opened this issue 3 years ago • 11 comments

not sure if this does what it should be doing but tests are passing

fixes #112

thecotne avatar Feb 04 '22 07:02 thecotne

Sweet Could you please add a BENCHMARKING.md file like https://github.com/uutils/coreutils/blob/main/src/uu/sort/BENCHMARKING.md with some examples?

Did you see some performance wins?

sylvestre avatar Feb 04 '22 08:02 sylvestre

Codecov Report

Merging #142 (836d06c) into main (58070d6) will increase coverage by 0.13%. The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main     #142      +/-   ##
==========================================
+ Coverage   52.98%   53.12%   +0.13%     
==========================================
  Files          24       24              
  Lines        4756     4757       +1     
  Branches     1548     1553       +5     
==========================================
+ Hits         2520     2527       +7     
+ Misses       1687     1675      -12     
- Partials      549      555       +6     
Impacted Files Coverage Δ
src/find/matchers/mod.rs 62.54% <ø> (ø)
src/find/mod.rs 63.33% <100.00%> (+1.53%) :arrow_up:
src/xargs/mod.rs 77.46% <0.00%> (-2.08%) :arrow_down:
src/lib.rs 36.36% <0.00%> (+0.49%) :arrow_up:

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 58070d6...836d06c. Read the comment docs.

codecov[bot] avatar Feb 04 '22 08:02 codecov[bot]

Seems that it broke some BFS tests:

2022-02-04T08:16:27.7485162Z /home/runner/work/findutils/findutils/findutils/target/release/find basic \( -name foo -type d -o -name bar -a -type f \) -print \, \! -empty -type f -print 
2022-02-04T08:16:27.7548222Z test_print0
2022-02-04T08:16:27.7552535Z /home/runner/work/findutils/findutils/findutils/target/release/find basic/a basic/b -print0 
2022-02-04T08:16:27.7567667Z thread '<unnamed>' panicked at 'already borrowed: BorrowMutError', src/find/matchers/printer.rs:42:52
2022-02-04T08:16:27.7568120Z note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
2022-02-04T08:16:27.7585129Z test_print0 failed!

sylvestre avatar Feb 04 '22 08:02 sylvestre

i ran some benchmarks and this branch is slightly slower then main and both are faster then gnu find

$ hyperfine --warmup 3 'find ~ -name yarn.lock' './find ~ -name yarn.lock' './find-par ~ -name yarn.lock'
Benchmark 1: find ~ -name yarn.lock
  Time (mean ± σ):     798.2 ms ±   9.0 ms    [User: 332.7 ms, System: 465.3 ms]
  Range (min … max):   783.5 ms … 809.0 ms    10 runs

Benchmark 2: ./find ~ -name yarn.lock
  Time (mean ± σ):     608.6 ms ±  11.1 ms    [User: 236.1 ms, System: 372.5 ms]
  Range (min … max):   590.4 ms … 621.0 ms    10 runs

Benchmark 3: ./find-par ~ -name yarn.lock
  Time (mean ± σ):     618.9 ms ±   7.5 ms    [User: 224.5 ms, System: 395.0 ms]
  Range (min … max):   606.6 ms … 629.4 ms    10 runs

Summary
  './find ~ -name yarn.lock' ran
    1.02 ± 0.02 times faster than './find-par ~ -name yarn.lock'
    1.31 ± 0.03 times faster than 'find ~ -name yarn.lock'

$ hyperfine --warmup 3 'find ~ -name *.json' './find ~ -name *.json' './find-par ~ -name *.json'
Benchmark 1: find ~ -name *.json
  Time (mean ± σ):     836.2 ms ±   6.4 ms    [User: 374.1 ms, System: 461.8 ms]
  Range (min … max):   826.4 ms … 847.3 ms    10 runs

Benchmark 2: ./find ~ -name *.json
  Time (mean ± σ):     723.5 ms ±   9.4 ms    [User: 313.4 ms, System: 410.1 ms]
  Range (min … max):   710.3 ms … 735.8 ms    10 runs

Benchmark 3: ./find-par ~ -name *.json
  Time (mean ± σ):     741.5 ms ±   5.4 ms    [User: 339.3 ms, System: 402.8 ms]
  Range (min … max):   735.1 ms … 752.0 ms    10 runs

Summary
  './find ~ -name *.json' ran
    1.02 ± 0.02 times faster than './find-par ~ -name *.json'
    1.16 ± 0.02 times faster than 'find ~ -name *.json'
  • ./find is build from main branch
  • ./find-par is build from this branch
  • and find is gnu find

thecotne avatar Feb 04 '22 11:02 thecotne

slower, even on directories with plenty of files?

sylvestre avatar Feb 04 '22 14:02 sylvestre

730550 files to be exact

thecotne avatar Feb 04 '22 14:02 thecotne

Spinning rust, or SSD. I could see how on the former, parallel operations could end up just fighting over head seeks.

Mark

On Fri, Feb 4, 2022 at 2:24 PM Tsotne Nazarashvili @.***> wrote:

730550 files to be exact

— Reply to this email directly, view it on GitHub https://github.com/uutils/findutils/pull/142#issuecomment-1030035001, or unsubscribe https://github.com/notifications/unsubscribe-auth/AC3MQF7GOHRUG5WPX76ABZ3UZPOQZANCNFSM5NRBOYTQ . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

You are receiving this because you are subscribed to this thread.Message ID: @.***>

mcharsley avatar Feb 04 '22 15:02 mcharsley

i am testing on SSD

thecotne avatar Feb 05 '22 01:02 thecotne

ok, so, is there an advantage merging this ? (besides the fun of it ? :)

sylvestre avatar Feb 05 '22 13:02 sylvestre

no. in current state this branch is slightly more complex and slightly less efficient version so there is no point in merging it.

if you can give any advice on possible problems/solutions and/or what to investigate etc. to actually improve performance using parallelism i can try to do it. if not we can close this PR i guess

thecotne avatar Feb 05 '22 13:02 thecotne

I believe there's no performance benefit because with walkdir, all the filesystem traversal happens on a single thread. par_iter() then splits up the processing work between multiple threads, but the bottleneck is the traversal which remains single-threaded.

You could try the ignore crate which natively supports parallel traversal.

tavianator avatar Feb 06 '22 22:02 tavianator