itertools
Implement `DuplicatesBy::fold`
Relates to #755. I didn't grab the output earlier (is there a way to print it without re-benchmarking?), but this makes no difference to performance. Submitting a PR for record-keeping's sake:
```json
{
  "mean": {
    "confidence_interval": {
      "confidence_level": 0.95,
      "lower_bound": -0.0081767782683812,
      "upper_bound": 0.0038785180687818605
    },
    "point_estimate": -0.00312222985963162,
    "standard_error": 0.003180882676320771
  },
  "median": {
    "confidence_interval": {
      "confidence_level": 0.95,
      "lower_bound": -0.009367663685134739,
      "upper_bound": -0.003824443197890548
    },
    "point_estimate": -0.006249808601540896,
    "standard_error": 0.0014046028043885176
  }
}
```
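For readers of the thread, here is a standalone sketch of the technique this PR applies. The struct, bounds, and helper names below are mine for illustration, not the actual itertools internals: the idea is that the adaptor's `fold` delegates to the inner iterator's `fold` instead of looping on `next`.

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;

// Hypothetical duplicates-by adaptor (names are illustrative, not itertools code).
struct DuplicatesBy<I, F> {
    iter: I,
    key: F,
}

impl<I, F, K> DuplicatesBy<I, F>
where
    I: Iterator,
    F: FnMut(&I::Item) -> K,
    K: std::hash::Hash + Eq,
{
    // Specialized fold: drive the inner iterator's own `fold`, keeping the
    // bookkeeping map in the closure's environment.
    fn fold<B, G>(self, init: B, mut g: G) -> B
    where
        G: FnMut(B, I::Item) -> B,
    {
        let DuplicatesBy { iter, mut key } = self;
        // Track, per key, whether a duplicate was already yielded so each
        // duplicated element is produced exactly once (on its second
        // occurrence), matching `duplicates_by` semantics.
        let mut seen: HashMap<K, bool> = HashMap::new();
        iter.fold(init, move |acc, item| match seen.entry(key(&item)) {
            Entry::Vacant(e) => {
                e.insert(false); // first sighting: remember it, don't yield
                acc
            }
            Entry::Occupied(mut e) if !*e.get() => {
                e.insert(true); // second sighting: yield the duplicate
                g(acc, item)
            }
            Entry::Occupied(_) => acc, // third and later: skip
        })
    }
}

fn main() {
    let dup = DuplicatesBy {
        iter: vec![1, 2, 1, 3, 2, 1].into_iter(),
        key: |x: &i32| *x,
    };
    let out = dup.fold(Vec::new(), |mut v, x| {
        v.push(x);
        v
    });
    assert_eq!(out, vec![1, 2]);
}
```

As the benchmark above shows, delegating to the inner `fold` does not help here, because the `HashMap` operations dominate either way.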
Codecov Report
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 94.44%. Comparing base (6814180) to head (73d78c5). Report is 54 commits behind head on master.
Additional details and impacted files
```diff
@@            Coverage Diff             @@
##           master     #921      +/-   ##
==========================================
+ Coverage   94.38%   94.44%   +0.06%
==========================================
  Files          48       48
  Lines        6665     6882     +217
==========================================
+ Hits         6291     6500     +209
- Misses        374      382       +8
```
As with #918 and #919, it uses a `HashMap` and therefore heap-allocates and (securely but slowly) hashes things; those two things are the bottleneck. There is not much to expect from such a specialization without improving on that front:
- The heap allocation is unavoidable, and I don't think it can be optimized (knowing the iterator size would not even help).
- On the hash front, I've been working on #901 & #906 (on hold right now) so that our users can use other (faster) maps than `HashMap<_, _, RandomState>`.
With a faster hasher, maybe that specialized fold would be beneficial enough for the benchmark to show some improvement. We might want to revisit those PRs once I make a more general `DuplicatesByIn`.
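To illustrate the faster-hasher point (toy code, not from #901/#906): `HashMap` is already generic over its hash state, so a cheaper, non-DoS-resistant hasher can be plugged in via `BuildHasherDefault`. Real code would pull one from a crate like `rustc-hash`; the FNV-1a implementation below just shows the mechanism using only `std`.

```rust
use std::collections::HashMap;
use std::hash::{BuildHasherDefault, Hasher};

// Toy FNV-1a hasher: fast and simple, but NOT resistant to hash-flooding,
// which is the trade-off `RandomState` protects against.
struct FnvHasher(u64);

impl Default for FnvHasher {
    fn default() -> Self {
        FnvHasher(0xcbf29ce484222325) // FNV offset basis
    }
}

impl Hasher for FnvHasher {
    fn finish(&self) -> u64 {
        self.0
    }
    fn write(&mut self, bytes: &[u8]) {
        for &b in bytes {
            self.0 ^= u64::from(b);
            self.0 = self.0.wrapping_mul(0x100000001b3); // FNV prime
        }
    }
}

fn main() {
    // Same `HashMap` API, different (cheaper) hashing.
    let mut counts: HashMap<&str, u32, BuildHasherDefault<FnvHasher>> =
        HashMap::default();
    for word in ["a", "b", "a"] {
        *counts.entry(word).or_insert(0) += 1;
    }
    assert_eq!(counts["a"], 2);
    assert_eq!(counts["b"], 1);
}
```

Letting callers supply the hash state like this is what would make a specialized `fold` pay off, since hashing would no longer dwarf the iteration cost.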
I referenced your PRs in https://github.com/rust-itertools/itertools/issues/755#issuecomment-1728319901 to keep people from opening duplicate PRs.
There would also be `rfold` to consider alongside `fold`.
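As a quick `std`-only reminder of why `rfold` is a separate specialization target: it walks the iterator back-to-front and is only available on `DoubleEndedIterator`s, so an adaptor has to override it independently of `fold`.

```rust
fn main() {
    let v = [1, 2, 3];

    // `fold` consumes front-to-back.
    let forward: Vec<i32> = v.iter().copied().fold(Vec::new(), |mut acc, x| {
        acc.push(x);
        acc
    });

    // `rfold` consumes back-to-front and requires `DoubleEndedIterator`.
    let backward: Vec<i32> = v.iter().copied().rfold(Vec::new(), |mut acc, x| {
        acc.push(x);
        acc
    });

    assert_eq!(forward, vec![1, 2, 3]);
    assert_eq!(backward, vec![3, 2, 1]);
}
```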
Side note:
I'm aware the specialization benchmarks are slow to build (2 minutes for me) and take 8 seconds per benchmark to run (which is OK).
I usually comment out (or simply remove) benchmarks from the `bench_specializations` macro declaration (without committing the change) to keep the benchmark builds fast. With that, I don't mind re-running benchmarks.
It's a pain, but it's a huge file with a lot of iterators and methods to bench; there is no simple fix to avoid building benchmarks we are not going to use.
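A `std`-only sketch of that workflow (the macro name and shape here are illustrative, not the real `bench_specializations`): one macro invocation generates one function per listed entry, so commenting a line out removes that benchmark from the build entirely.

```rust
// Illustrative macro: expands each `name => expr;` line into a function.
macro_rules! bench_methods {
    ($($name:ident => $body:expr;)*) => {
        $(fn $name() -> i64 { $body })*
    };
}

bench_methods! {
    bench_sum => (0i64..1_000).sum::<i64>();
    bench_count => (0i64..1_000).filter(|x| x % 2 == 0).count() as i64;
    // bench_fold => ...;  // commented out: never compiled, never built
}

fn main() {
    assert_eq!(bench_sum(), 499_500);
    assert_eq!(bench_count(), 500);
}
```

Because the skipped entries never make it past macro expansion, the compiler does no work for them at all, which is what keeps the edit-and-rebuild loop fast.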