itertools icon indicating copy to clipboard operation
itertools copied to clipboard

Iterators generating (many) vectors (v2)

Open Philippe-Cholet opened this issue 1 year ago • 0 comments

My #722 was rejected because it was not truly user-convenient nor truly performant but only in the middle.

Observation

But it can be more performant in some cases without sacrificing user convenience as I realized (while reviewing #914) that some iterator methods do not really need to give away all generated vectors:

  • We can do it with ONE vector item max: count, last, nth, find, (if double-ended: nth_back, rfind). And cmp, partial_cmp ~on which lt, le, gt, ge all rely~, eq ~on which ne rely~ but I never saw the point of those methods and never used them.
  • We can do it with TWO vector items max (one for the result, one for the iteration): max_by ~on which max rely~, max_by_key, min_by ~on which min rely~, min_by_key.

Nightly (just to have an idea of what might come): one item (advance[_back]_by, try_find), two items (is_sorted, is_sorted_by).

(Except nth[_back],) all of those usually rely on fold/try_fold which requires to own every item.

Implementation idea

But we could implement a private lightweight try_fold that only take references to an internal item that we would carefully update step-by-step. Then we can easily enough specialize multiple iterator methods, like find that probably has the wider usage (e.g. used by .filter(..)).

After some experiments...

/*private*/ trait LightweightIterator: Iterator + Sized {
    // I'm not sure we need `B`: a lightweight `try_for_each` might be enough. But pass an argument `()` is free!
    fn lw_try_fold<B, F, E>(&mut self, init: B, f: F) -> LoopExit<Self::Item, B, E>
    // ^^^               ^                               ^^^^^^^^ ^^^^^^^^^^^^
    where F: FnMut(B, &Self::Item) -> Result<B, E>;
    //                ^               ^^^^^^^^^^^^

    // Then it would provide "lw_" versions of:
    // count last nth find (cmp partial_cmp eq?) max_by max_by_key min_by min_by_key!
}

// `try_fold` basically returns `Result<B, E>`. With an additional internal item `T`:
enum LoopExit<T, B, E> {
    /// Iteration has ended.
    End {
        last_item: Option<T>,
        accum: B,
    },
    /// The function failed.
    Early {
        failed_item: T,
        err: E,
    },
}

Our iterators returning vectors are Combinations, Powerset, CombinationsWithReplacement, Permutations and MultiProduct.

  • count is already implemented for them.
  • They are not double-ended so we hopefully don't need a DoubleEndedLightweightIterator for nth_back and rfind.

Usage example

impl<I> LightweightIterator for Combinations<I> where ... {
    fn lw_try_fold<B, F, E>(&mut self, init: B, f: F) -> LoopExit<Self::Item, B, E>
    where F: FnMut(B, &Self::Item) -> Result<B, E> { ... }
}

impl<I> Iterator for Combinations<I> where ... {
    fn find<P>(&mut self, predicate: P) -> Option<Self::Item>
    where
        P: FnMut(&Self::Item) -> bool,
    {
        self.lw_find(predicate) // easy
    }
    // then at least... (max|min)_by[_key]
}

Plan

  • [ ] Specialize Combinations::{find, (max|min)_by[_key]}
    • Update specialization tests to consider more methods.
    • Update specialization benchmarks to consider more methods.
    • Define a private trait LightweightIterator.
    • Update CONTRIBUTING.md: Implement LightweightIterator for iterators generating vectors.
    • Implement LightweightIterator for Combinations and specialize some iterator methods.
    • Enjoy benchmarks
  • [ ] Implement LightweightIterator for Powerset and specialize some iterator methods.
  • [ ] Implement LightweightIterator for CombinationsWithReplacement and specialize some iterator methods.
  • [ ] Implement LightweightIterator for Permutations and specialize some iterator methods.
  • [ ] Implement LightweightIterator for MultiProduct and specialize some iterator methods.

Philippe-Cholet avatar Apr 24 '24 09:04 Philippe-Cholet