findutils
findutils copied to clipboard
Leverage parallel capabilities
not sure if this does what it should be doing but tests are passing
fixes #112
Sweet Could you please add a BENCHMARKING.md file like https://github.com/uutils/coreutils/blob/main/src/uu/sort/BENCHMARKING.md with some examples?
Did you see some performance wins?
Codecov Report
Merging #142 (836d06c) into main (58070d6) will increase coverage by
0.13%. The diff coverage is100.00%.
@@ Coverage Diff @@
## main #142 +/- ##
==========================================
+ Coverage 52.98% 53.12% +0.13%
==========================================
Files 24 24
Lines 4756 4757 +1
Branches 1548 1553 +5
==========================================
+ Hits 2520 2527 +7
+ Misses 1687 1675 -12
- Partials 549 555 +6
| Impacted Files | Coverage Δ | |
|---|---|---|
| src/find/matchers/mod.rs | 62.54% <ø> (ø) |
|
| src/find/mod.rs | 63.33% <100.00%> (+1.53%) |
:arrow_up: |
| src/xargs/mod.rs | 77.46% <0.00%> (-2.08%) |
:arrow_down: |
| src/lib.rs | 36.36% <0.00%> (+0.49%) |
:arrow_up: |
Continue to review full report at Codecov.
Legend - Click here to learn more
Δ = absolute <relative> (impact),ø = not affected,? = missing dataPowered by Codecov. Last update 58070d6...836d06c. Read the comment docs.
Seems that it broke some BFS tests:
2022-02-04T08:16:27.7485162Z /home/runner/work/findutils/findutils/findutils/target/release/find basic \( -name foo -type d -o -name bar -a -type f \) -print \, \! -empty -type f -print
2022-02-04T08:16:27.7548222Z test_print0
2022-02-04T08:16:27.7552535Z /home/runner/work/findutils/findutils/findutils/target/release/find basic/a basic/b -print0
2022-02-04T08:16:27.7567667Z thread '<unnamed>' panicked at 'already borrowed: BorrowMutError', src/find/matchers/printer.rs:42:52
2022-02-04T08:16:27.7568120Z note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
2022-02-04T08:16:27.7585129Z test_print0 failed!
i ran some benchmarks and this branch is slightly slower then main and both are faster then gnu find
$ hyperfine --warmup 3 'find ~ -name yarn.lock' './find ~ -name yarn.lock' './find-par ~ -name yarn.lock'
Benchmark 1: find ~ -name yarn.lock
Time (mean ± σ): 798.2 ms ± 9.0 ms [User: 332.7 ms, System: 465.3 ms]
Range (min … max): 783.5 ms … 809.0 ms 10 runs
Benchmark 2: ./find ~ -name yarn.lock
Time (mean ± σ): 608.6 ms ± 11.1 ms [User: 236.1 ms, System: 372.5 ms]
Range (min … max): 590.4 ms … 621.0 ms 10 runs
Benchmark 3: ./find-par ~ -name yarn.lock
Time (mean ± σ): 618.9 ms ± 7.5 ms [User: 224.5 ms, System: 395.0 ms]
Range (min … max): 606.6 ms … 629.4 ms 10 runs
Summary
'./find ~ -name yarn.lock' ran
1.02 ± 0.02 times faster than './find-par ~ -name yarn.lock'
1.31 ± 0.03 times faster than 'find ~ -name yarn.lock'
$ hyperfine --warmup 3 'find ~ -name *.json' './find ~ -name *.json' './find-par ~ -name *.json'
Benchmark 1: find ~ -name *.json
Time (mean ± σ): 836.2 ms ± 6.4 ms [User: 374.1 ms, System: 461.8 ms]
Range (min … max): 826.4 ms … 847.3 ms 10 runs
Benchmark 2: ./find ~ -name *.json
Time (mean ± σ): 723.5 ms ± 9.4 ms [User: 313.4 ms, System: 410.1 ms]
Range (min … max): 710.3 ms … 735.8 ms 10 runs
Benchmark 3: ./find-par ~ -name *.json
Time (mean ± σ): 741.5 ms ± 5.4 ms [User: 339.3 ms, System: 402.8 ms]
Range (min … max): 735.1 ms … 752.0 ms 10 runs
Summary
'./find ~ -name *.json' ran
1.02 ± 0.02 times faster than './find-par ~ -name *.json'
1.16 ± 0.02 times faster than 'find ~ -name *.json'
./findis build from main branch./find-paris build from this branch- and
findis gnu find
slower, even on directories with plenty of files?
730550 files to be exact
Spinning rust, or SSD. I could see how on the former, parallel operations could end up just fighting over head seeks.
Mark
On Fri, Feb 4, 2022 at 2:24 PM Tsotne Nazarashvili @.***> wrote:
730550 files to be exact
— Reply to this email directly, view it on GitHub https://github.com/uutils/findutils/pull/142#issuecomment-1030035001, or unsubscribe https://github.com/notifications/unsubscribe-auth/AC3MQF7GOHRUG5WPX76ABZ3UZPOQZANCNFSM5NRBOYTQ . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.
You are receiving this because you are subscribed to this thread.Message ID: @.***>
i am testing on SSD
ok, so, is there an advantage merging this ? (besides the fun of it ? :)
no. in current state this branch is slightly more complex and slightly less efficient version so there is no point in merging it.
if you can give any advice on possible problems/solutions and/or what to investigate etc. to actually improve performance using parallelism i can try to do it. if not we can close this PR i guess
I believe there's no performance benefit because with walkdir, all the filesystem traversal happens on a single thread. par_iter() then splits up the processing work between multiple threads, but the bottleneck is the traversal which remains single-threaded.
You could try the ignore crate which natively supports parallel traversal.