itertools icon indicating copy to clipboard operation
itertools copied to clipboard

New iterize() function?

Open NicholasSterling opened this issue 5 years ago • 5 comments

Hi folks. Is the iterize() function demonstrated below of any interest? It's rather like itertools::iterate(), making it easy to create a certain class of iterators via something like induction. But for the cases it covers (the state must be Copy, and should be small), it offers 1) a simplified API and 2) improved performance.

UPDATE: I take that back. In some cases the performance is better, but in other cases it is worse.

For example, both of these produce the Fibonacci sequence:

iterate((1,1), |&(a,b)| (b, a+b)).map(|p| p.0)
iterize((1,1), |(a,b)| (b, a+b))

You specify an initial state and a rule for converting the Nth state into the (N+1)th state. At least for the Fibonacci sequence iterize seems to have a slight edge on all of the other implementations (including itertools::iterate) offered in this forum thread (although those benchmarks do more than just iterate):

https://users.rust-lang.org/t/any-possible-improvements-to-this-functional-program/39085/61

I haven't contributed before, and am still somewhat new to Rust, but wondered whether this might be of interest for itertools. Here's the demo:

fn main() {
    dbg!(fib().take(6).collect::<Vec<i32>>());
}

// Returns an Iterator for the Fibonacci sequence: 1 1 2 3 5 8 ...
fn fib() -> impl Iterator<Item = i32> {
    iterize((1,1), |(a,b)| (b, a+b))
}

// Produces an Iterator by induction.
// Given an initial state of type (R,S) and a function that produces
// the next state from an existing state, we return an Iterator for the Rs.
// So in (R,S), R is the part that gets (R)eturned by the Iterator,
// and S is any additional (S)tate used internally.
fn iterize<R: Copy, S: Copy>(s0: (R,S), f: impl Fn((R,S)) -> (R,S)) -> impl Iterator<Item = R> {
    let mut state = s0;
    std::iter::repeat_with(
        move || { state.swap(f(state)).0 }
    )
}

// a.swap(b) sets a to b and returns the old value of a.
// It is better than using std::mem::replace() directly in many cases
// because 1) you don't have to explicitly say &mut, and 2) it saves
// you having to make a temporary variable when the new value
// is computed from the old value (because of two-phase mutable borrows).
pub trait Swap: Sized {
    fn swap(&mut self, value: Self) -> Self;
}
impl<T> Swap for T {
    fn swap(&mut self, new: Self) -> Self {
        std::mem::replace(self, new)
    }
}

The benchmarks for the implementations in that forum thread are here: https://github.com/NicholasSterling/rust_functional_fibo See especially the "iterators" branch.

NicholasSterling avatar Mar 18 '20 19:03 NicholasSterling

It just occurred to me (duh) that you could make this work like iterize((1,1), |a,b| (b, a+b)) rather than iterize((1,1), |(a,b)| (b, a+b)) although I haven't tried it yet to see whether that will impact performance.

NicholasSterling avatar Mar 17 '21 21:03 NicholasSterling

The obvious question to me is whether it's needed vs the other things that currently exist, like

  • https://docs.rs/itertools/0.10.0/itertools/fn.unfold.html
  • https://doc.rust-lang.org/nightly/std/iter/fn.from_fn.html

How hard is the example to encode in one of those instead?

scottmcm avatar Mar 17 '21 23:03 scottmcm

Here's a four-way showdown between some of the current options (playground):

pub fn with_unfold() -> impl Iterator<Item=u32> {
    use itertools::unfold;

    unfold((0, 1), |(current, next)| {
        *next += *current;
        *current = *next - *current;
        Some(*current)
    })
}

pub fn with_from_fn() -> impl Iterator<Item=u32> {
    use core::iter::from_fn;

    let mut current = 0;
    let mut next = 1;

    from_fn(move || {
        next += current;
        current = next - current;
        Some(current)
    })
}

pub fn with_iterate() -> impl Iterator<Item=u32> {
    use itertools::iterate;

    iterate((0, 1), |(prev, current)| {
        (*current, current + prev)
    }).map(|(_, current)| current)
}

pub fn with_successors() -> impl Iterator<Item=u32> {
    use core::iter::successors;

    successors(Some((0, 1)), |&(prev, current)| {
        Some((current, current + prev))
    }).map(|(_, current)| current)
}


fn main() {
    use itertools::Itertools;
    println!("{:?}", with_unfold().take(10).format(","));
    println!("{:?}", with_from_fn().take(10).format(","));
    println!("{:?}", with_iterate().take(10).format(","));
    println!("{:?}", with_successors().take(10).format(","));
}

They all feel similar enough to me that I'm having a hard time meaningfully contrasting them.

jswrenn avatar Mar 18 '21 18:03 jswrenn

Another thought: I might like this better if it had a name more evocative of how it turns the closure into an iterator.

Like I'm a fan of unfold because I understand how fold works, and can use that to understand unfold.

scottmcm avatar Mar 18 '21 22:03 scottmcm

I agree that it isn't really that hard to do it with any of a number of existing APIs, as @jswrenn so effectively demonstrated! Of those solutions, the iterate() version seems the most straightforward. And while there might be performance differences, they probably aren't huge. Perhaps because I was (still am, really) getting to know Rust, I was thinking of it from the point of view of a Javascript or Python programmer checking out Rust, a language they had heard was hard. Perhaps for them the difference between these would matter:

iterate((1,1), |&(a,b)| (b, a+b)).map(|(a,b)| a)
induce((1,1), |a,b| (b, a+b))

But I agree, for anyone familiar with the language the difference is not so large. Unless this would fit a lot of simple cases and perform well, perhaps it's just not worth doing. Thanks for taking the time to consider it.

NicholasSterling avatar Mar 19 '21 02:03 NicholasSterling

No big performance gain expected and there are multiple alternatives. Therefore, I'm closing this.

Philippe-Cholet avatar Oct 07 '23 13:10 Philippe-Cholet