itertools Laziness of our iterators

Laziness of our iterators

Open Philippe-Cholet opened this issue 2 years ago • 1 comments

Similar to #601 but for all our iterators and iterator adaptors. Laziness reference: https://doc.rust-lang.org/std/iter/index.html#laziness

And I think each iterator struct should have the related must_use attribute:

Adaptors: #[must_use = "iterator adaptors are lazy and do nothing unless consumed"]
Non-adaptors: #[must_use = "iterators are lazy and do nothing unless consumed"]

Load some items might be unavoidable in some case(?) but surely not all. TODO:

[x] #794
[x] #793 permutations(n) consumes n items at definition (so not lazy when n >= 1).
[x] #795 combinations(n) consumes n items at definition (so not lazy when n >= 1).
[x] #797 intersperse[_with] both consume 1 item at definition.
[x] #800 a.cartesian_product(b) consumes 1 item of a at definition. Therefore iproduct!(i_1, i_2, i_n) consumes 1 item of the first n - 1 iterators at definition.
[x] #801 coalesce, dedup[_by][_with_count] all consume 1 item at definition.
[x] #792
[x] #767 mention laziness for new iterator adaptors and where to test it.
[ ] tuple_combinations::<T> consume 1 item per nested iterator at definition ("T::len()" - 1 items so not lazy for (_, _, ...)).
[ ] kmerge[_by] collect all items and consume 1 item of each iterator at definition.

Nov 03 '23 09:11 Philippe-Cholet

About TupleCombinations and KMergeBy, the only way I see to make them lazy is to delay the initialization step with a LazyInit type (for e.g. based on Into conversion which is already done for the TupleCombinations part).

// `Option` (or a third variant) is a necessary implementation detail to take ownership of `U`.
// `Uninit(None)` (or the 3rd variant) would be unreachable.
pub(crate) enum LazyInit<U, I> {
    Uninit(Option<U>),
    Init(I),
}
impl<U, I> LazyInit<U, I> {                               // useful later for
    pub fn new(uninit: U) -> Self;                        // at definition
    pub fn get(&self) -> Result<&I, &U>;                  // size_hint
    pub fn get_mut(&mut self) -> &mut I where U: Into<I>; // next, ...
    pub fn take_init(self) -> I where U: Into<I>;         // fold
    // eventually
    pub fn take_uninit(self) -> Result<U, I>;             // count
}

The most important method is get_mut which handle initialization once and simply return the mutable reference of I otherwise.

I worry it might result in a slower iterator.

@jswrenn @phimuemue What do you think about such pattern/type?

Nov 22 '23 09:11 Philippe-Cholet

itertools itertools copied to clipboard

Laziness of our iterators

itertools
itertools copied to clipboard