Audit for missing `fold` implementations.
Implementing fold for our iterators allows us to provide more performant internal iteration. We provide it in some cases, but not all. I'd like us to take three steps towards resolving this:
- Take an inventory of whether our adapters provide `fold`.
- Implement `fold` for missing adapters.
- (Potentially!) Implement a clippy lint to prevent regressions.
Happy to provide mentorship for any of these items.
Note that the vast majority of the time, you should be overriding fold, not for_each.
The default for_each just calls fold with a trivial () accumulator, which is easily optimized out (since in the default Rust calling convention, it's not even passed).
Thus the only reason to override for_each in addition to fold is if you can optimize something by knowing that the accumulation is stateless, but I can't off-hand recall any case where I've seen that to be the case.
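To make the pattern concrete, here is a hedged sketch (with a made-up `Doubled` adapter, not one from itertools) of what such a specialization typically looks like: `next` remains the element-by-element path, while `fold` delegates to the inner iterator's `fold`, so internal iteration (and the default `for_each` built on it) gets the faster path for free.

```rust
// Hypothetical adapter (not from itertools) illustrating the pattern:
// specialize `fold` by delegating to the inner iterator's `fold`.
struct Doubled<I> {
    iter: I,
}

impl<I: Iterator<Item = u32>> Iterator for Doubled<I> {
    type Item = u32;

    fn next(&mut self) -> Option<u32> {
        self.iter.next().map(|x| 2 * x)
    }

    // The default `fold` would call `next` repeatedly; delegating lets the
    // inner iterator drive the loop. The default `for_each` is `fold` with a
    // `()` accumulator, so it benefits automatically.
    fn fold<B, F>(self, init: B, mut f: F) -> B
    where
        F: FnMut(B, Self::Item) -> B,
    {
        self.iter.fold(init, move |acc, x| f(acc, 2 * x))
    }
}
```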
Whoops, yes, of course. I'll modify the issue accordingly.
TODO list to track fold specializations (ordered by filepaths and line numbers)
- [x] CoalesceBy
- [x] MapSpecialCase
- [x] Interleave
- [x] InterleaveShortest
- [x] PutBack
- [x] Product
- [ ] TakeWhileRef: No test/benchmark yet.
- [x] WhileSome
- [x] TupleCombinations + Tuple1Combination + Tuple*Combination (macro)
- [x] FilterOk
- [x] FilterMapOk
- [x] Positions
- [x] Update
- [ ] MultiProduct: Might not be faster because of the heap-allocation bottleneck.
- [ ] Combinations: Might not be faster because of the heap-allocation bottleneck.
- [ ] CombinationsWithReplacement: Might not be faster because of the heap-allocation bottleneck.
- [x] ConsTuples
- [ ] DuplicatesBy #921
- [x] ExactlyOneError
- [x] FlattenOk
- [ ] Groups: No test/benchmark yet.
- [ ] Group: No test/benchmark yet.
- [ ] Chunks: No test/benchmark yet.
- [ ] Chunk: No test/benchmark yet.
- [x] IntersperseWith
- [ ] KMergeBy
- [x] MergeBy
- [x] MultiPeek
- [x] PadUsing
- [ ] PeekingTakeWhile: No test/benchmark yet.
- [x] PeekNth
- [ ] Permutations: Might not be faster because of the heap-allocation bottleneck.
- [x] Powerset
- [x] ProcessResults
- [x] PutBackN
- [ ] RcIter: No test/benchmark yet.
- [x] RepeatN
- [x] TakeWhileInclusive
- [ ] Tee: No test/benchmark yet.
- [ ] TupleBuffer
- [ ] Tuples
- [ ] TupleWindows
- [ ] CircularTupleWindows
- [ ] UniqueBy #918
- [ ] Unique #919
- [x] WithPosition
- [ ] Zip
- [x] ZipLongest
Should be ignored:
And rfold:
- [ ] MapSpecialCase
- [x] Positions
- [ ] Update
- [ ] DuplicatesBy #921
- [x] FlattenOk
- [x] PadUsing
- [ ] RcIter: No test/benchmark yet.
- [x] RepeatN
- [ ] UniqueBy #918
- [ ] Unique #919
- [ ] Zip
- [x] ZipLongest
I think we should mention this issue in related PRs to help maintain this big todo list.
Once we do this, we should test all of them via test_specializations.
I'm interested in contributing to this crate and I dug a bit into the source code. I think one potential candidate to override `fold` is `Combinations`.
https://github.com/rust-itertools/itertools/blob/master/src/combinations.rs#L100
What do you guys think?
I can create a PR in the coming few days to implement this.
@Philippe-Cholet I think you're the most knowledgeable there. Could you mentor/review this?
I previously had some thoughts about splitting `next` in two methods to make an `nth` that would create only one vector instead of n+1 vectors. And I thought earlier it would be better done before `fold`, but no, as `fold` needs to create each item anyway. I guess `fold` would be an interesting and total rewrite of `next`.
@Owen-CH-Leung Hi, sure you can help. And I'll be there to "mentor/review" it.
@jswrenn @phimuemue I've been editing my comment above a lot, and I would like to have feedback on it.
The list currently consists of all structs implementing Iterator, maybe we should ignore some of them?
Should we consider rfold specializations as well (small list above)?
@jswrenn, I'd love to get started contributing to Rust, but I'd need a little help getting started since this would be my first ever GitHub contribution. You mention you're happy to provide mentorship in your initial post, so I'm eager to get to work if you could help me out with any of the items on the checklist. Thanks!
@DavidCoy77 I guess that specializing PutBackN::fold would be a good first pull request. You could look at #772 to find out what's expected.
Hi @DavidCoy77, apologies for not replying to your emails — I've been traveling. I agree with @Philippe-Cholet that our fold implementations are the best place to start.
Great! Thank you for the help @jswrenn and @Philippe-Cholet. I've spent some time looking at the WithPosition::fold in #772 that Philippe mentioned, and unfortunately I have a hard time seeing what would be asked of me for PutBackN::fold, so if you could offer additional guidance it would be greatly appreciated.
EDIT: The text below is a bit outdated, as benchmarks and specializations are now written for most of our iterators. Writing the fold method (and running a benchmark command before/after) is enough to contribute. Example: #820.
@DavidCoy77 I guess there are a lot of new things (I did not know criterion and quickcheck before coming here, and was not that familiar with fold either).
- Are you familiar with `Iterator::fold`? With it, we consume the iterator and therefore consume all items. Specializing it can be faster than calling `next` repeatedly, as would be the case without specialization. You might not use `fold` directly, but by default a few iterator methods use `fold`, such as `for_each`, which I believe is more well known/used(?).
- Are you familiar with `itertools::PutBackN`? I know I was not, but I like the doctest example of its `put_back` method.
- The core of the pull request would be to write `PutBackN::fold`.
- In "benches/bench1.rs", we would write a benchmark function (using criterion) to measure the time improvement of specializing `PutBackN::fold`, as we don't want to slow things down. Soon enough, this should be automatically provided by a macro I'm writing in another PR.
- In "tests/specializations.rs", we write a function inside a "quickcheck" macro (it's property-based testing: basically it tests our function against many inputs, and when it fails, it tries to find the smallest input that fails, which is super useful for finding edge cases we did not consider properly). In that file, `test_specializations` takes a clonable iterator and tests the "fold" specialization amongst other method specializations, which simply allows us to write less code. A rough sketch of such a property test follows this list.
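For illustration only (this is not the crate's actual "tests/specializations.rs", just a sketch in the same spirit; the test name and inputs are made up): the property checked is that consuming via `fold` yields exactly the same items as consuming via repeated `next` calls, for arbitrary quickcheck-generated inputs.

```rust
use itertools::put_back_n;
use quickcheck::quickcheck;

quickcheck! {
    // Property: `fold` (the potentially specialized path) must produce the
    // same sequence of items as repeatedly calling `next`.
    fn put_back_n_fold_matches_next(front: Vec<u8>, rest: Vec<u8>) -> bool {
        let mut it = put_back_n(rest.clone());
        for x in front.clone() {
            it.put_back(x);
        }
        let folded: Vec<u8> = it.fold(Vec::new(), |mut acc, x| {
            acc.push(x);
            acc
        });

        let mut it = put_back_n(rest);
        for x in front {
            it.put_back(x);
        }
        let nexted: Vec<u8> = std::iter::from_fn(move || it.next()).collect();

        folded == nexted
    }
}
```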
You can focus on figuring out PutBackN::fold first.
```rust
pub struct PutBackN<I: Iterator> {
    top: Vec<I::Item>,
    iter: I,
}

// From the next method, we want elements of `top` in the reversed order then, only then, the elements from `iter` (normal order).
fn next(&mut self) -> Option<Self::Item> {
    self.top.pop().or_else(|| self.iter.next())
}
```
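For what it's worth, a minimal sketch of where this could go (an assumption about the eventual PR, not the merged code): drain `top` in reverse insertion order, exactly as `next` does, then hand the accumulator to the inner iterator's own `fold`.

```rust
// Hedged sketch only, to live inside `impl<I: Iterator> Iterator for PutBackN<I>`.
fn fold<B, F>(self, init: B, mut f: F) -> B
where
    F: FnMut(B, Self::Item) -> B,
{
    // Put-back items first, taken from the back of `top` like `next` does...
    let acc = self.top.into_iter().rev().fold(init, &mut f);
    // ...then delegate to the inner iterator's own `fold`.
    self.iter.fold(acc, f)
}
```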
EDIT: WithPosition::fold was a huge rewrite of WithPosition::next, not the simplest one. #774 might be a better example and also shows we should delegate to fold when possible. I think PutBackN::fold will be an even better example once it's done.
@jswrenn I'm curious but don't understand the exact intention behind this yet:

> - (Potentially!) Implement a clippy lint to prevent regressions.

A lint that warns us about missing fold specializations, is that it?!
Yes, the idea is to add a lint to clippy itself to warn us about missing fold specializations. For any adaptors where we explicitly didn't want to specialize fold, we would need to write #[allow(clippy::missing_iterator_fold)]. This would make auditing for missing fold implementations as simple as running cargo clippy, and would provide an automated warning if we ever tried to add a new adaptor without a fold implementation.
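To show what that opt-out would look like in the codebase (hypothetical: `missing_iterator_fold` does not exist in clippy today, and `SomeAdaptor` is a made-up stand-in), an adapter we deliberately leave unspecialized would carry an explicit allow:

```rust
struct SomeAdaptor<I> {
    iter: I,
}

// Hypothetical opt-out: the lint named here is the one proposed above,
// not an existing clippy lint.
#[allow(clippy::missing_iterator_fold)]
impl<I: Iterator> Iterator for SomeAdaptor<I> {
    type Item = I::Item;

    fn next(&mut self) -> Option<I::Item> {
        self.iter.next()
    }

    // No `fold` override here, on purpose.
}
```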
I'm looking to give this issue a try.
Started with Combinations and I wonder what folding on that one would look like, with LazyBuffer and such.
I can also use some mentoring/walkthrough of the crate if someone is willing to show me around.
@luander I can happily mentor you on this.
- Specializations are tested in "tests/specializations.rs", so testing it is a command line away (`cargo test combinations` is fast enough).
- Specializations are benchmarked in "benches/specializations.rs", so benchmarking it would be a command line away too (`cargo bench --bench specializations combinations/fold` to ignore others). But building this particular benchmark is really slow (almost 2 minutes for me), so the simplest thing to do is to temporarily ~~comment out~~ remove unrelated benchmarks (without committing).
- I'm not sure what it would look like exactly. I guess we might want to do some work from `next` while the lazy buffer is not full yet, then some simpler work once we know for sure it is full. In the process, we might want to expand `LazyBuffer`'s abilities. Specializing `Combinations::fold` would definitely not be as straightforward as for other iterators, but it's an interesting one.
EDIT: Alternatively, ZipEq::fold would be simpler.
If clippy does not accept the lint, we could maybe (cleverly ab)use missing_trait_methods.
I think it'll be accepted (and about size_hint, [try_][r]fold also). But probably not for all iterator methods we could want to specialize like count etc (that might be too much to ask).
I ran cargo clippy -- -Wclippy::missing_trait_methods, got 6752 warnings.
I added the #[warn(...)] to Powerset only and got only 73 warnings.
If clippy could display JSON data instead of pretty text (or hack info from it), maybe we could leverage that though.
EDIT: I might keep my fork and have a branch implementing other lints if keeping it updated is simple enough.
Update on the lint idea: After discussion on zulip, missing_trait_methods might accept a configuration (in clippy.toml) instead of a new lint, which would be too specific to a single trait method when it could be generalized. It would not be as flexible as we might want, but it would be decent.
There are alternatives though: fork clippy (but maintenance?), use clippy's sister project marker (great developer experience, but it's a work in progress and I can't identify an iterator implementation with it yet), or dylint (should be nice enough, but my first try was a failure).
My poor man's approach to find missing folds (not thoroughly tested, but one gets the idea):
cargo clippy --message-format json -- --warn clippy::missing_trait_methods | jq .message.rendered | grep "\bfold\b"
@phimuemue Oh... I find it embarrassing that I was aware of --message-format json for "check" (never used) but not for "clippy".
After some tinkering on jq playground, I think I have a new git alias (it's not git-related but useful anyway):
missing-trait-methods = "!f() { cargo clippy --quiet --message-format json -- --warn clippy::missing_trait_methods | jq -r '. | select(.message.code.code == \"clippy::missing_trait_methods\") | select(.message.message | endswith(\"`'$2'`\")) | .message.spans[0] | .file_name + \":\" + (.line_start | tostring) + \" \" + .text[0].text' | grep $1 | uniq; }; f"
then
git missing-trait-methods Iterator fold
git missing-trait-methods DoubleEndedIterator rfold
Output for `git missing-trait-methods Iterator size_hint`
```text
src\adaptors\mod.rs:498 impl<B, F, I> Iterator for Batching<I, F>
src\groupbylazy.rs:387 impl<'a, K, I, F> Iterator for Groups<'a, K, I, F>
src\groupbylazy.rs:440 impl<'a, K, I, F> Iterator for Group<'a, K, I, F>
src\groupbylazy.rs:556 impl<'a, I> Iterator for Chunks<'a, I>
src\groupbylazy.rs:599 impl<'a, I> Iterator for Chunk<'a, I>
src\grouping_map.rs:25 impl<K, V, I, F> Iterator for MapForGrouping<I, F>
src\sources.rs:127 impl<A, St, F> Iterator for Unfold<St, F>
```
@luander Can I help you more on Combinations::fold?
Alternatively, I suggested ZipEq::fold, but if you want to work on Combinations, specializing nth would be appreciated (it would solve #301 and close the outdated and stuck #329).
@jswrenn A lint missing_iterator_fold was not particularly well received: too specific to iterators basically.
The main alternative: clippy::missing_trait_methods could be configurable to limit warnings (6538 right now):
```toml
# clippy.toml
missing_trait_methods = [
    "core::iter::Iterator::fold",
    "core::iter::DoubleEndedIterator::rfold",
    "core::iter::Iterator::size_hint",
]
```
The downside is that allowing the lint on an impl Iterator would allow not just one missing specialization but all of them. But if we limit ourselves to size_hint/fold/rfold then this seems good enough to me.
Is that okay with you? If it is, I'll work on making the lint configurable.
Making clippy::missing_trait_methods configurable sounds fantastic!
@Philippe-Cholet Hi there, on Combinations::fold, once #914 is merged, I think something like this should work as the core of an implementation:
```rust
// increment_indices returns true once we run out of combinations
while !self.increment_indices() {
    let combination: Vec<_> = self.indices.iter().map(|i| self.pool[*i].clone()).collect();
    acc = f(acc, combination);
}
```
If you think that looks reasonable I'll put through a PR.
Edit: I tried this out and, although it works, it is markedly slower than the unspecialized implementation
> I tried this out and, although it works, it is markedly slower than the unspecialized implementation
@kinto-b We won't specialize it if it's slower (but that alone is valuable information). About combinations, the slowest thing is heap-allocating the generated items. fold by nature generates them all (to give them to f) so we can't win anything on that side.
The only thing Combinations::fold could do is use simpler code after the initialization phase, while the default fold, by repeatedly calling next, keeps "init code" lying around unused after a few steps; that's basically what you did. But even if it were a perf win, I did not expect much out of it, since the heap allocation is the bottleneck here and nothing else really impacts performance (or at least I think so).
I kinda wonder if mutating a combination IN-PLACE and giving out clones would be faster, but it's a lot of work for a probably small win (if any): `while !self.increment_indices_and_comb(&mut comb) { acc = f(acc, comb.clone()); }` or something like that.
I will eventually work again on configuring clippy::missing_trait_methods and (if accepted) we will be able to mark that some folds are allowed not to be specialized and ensure all other folds are.