itertools Add multizip_fallback method

Implements #242 as a new function called multizip_fallback which takes a tuple of tuples as an argument where the inner tuples are the subiterators and their respective fallback values. Most of the code is the same as for multizip so nothing too weird going on.

Example usage:

use itertools::multizip_fallback;
fn main {
    let a = &[2, 8, 5, 7];
    let b = 5..7;
    let c = vec![10, 2, 20];
    let mut multiples = Vec::new();
    for (first, second, third) in multizip_fallback(((a, &3), (b, 2), (c, 1))) {
        multiples.push(*first * second * third);
    }
    assert_eq!(multiples, vec![2*5*10, 8*6*2, 5*2*20, 7*2*1]);
}

Dec 18 '17 17:12 Pazzaz

Have you seen the difference between izip!() and multizip? The former has better performance (unfortunately).

We should recommend izip and not multizip for that reason. And we should consider if this functionality is better built in the style of izip and not like multizip. Consider means that I don't know if that is possible or is better in this case, but it is a stone that should be turned.

I'm sorry that Zip is straying from the naming scheme, but the new iterator here should not. So the struct that is currently ZipAll should be named just like the function or method that creates it.

Jan 01 '18 15:01 bluss

Have you seen the difference between izip!()? and multizip?

Yes, I considered making multizip_fallback in the style of izip!. The problem with mimicking izip! is that the regular zip() ends when either of the iterators return None. You would have to extend the iterators with the fallback value, zip them together and then use take_while() to detect when all of them have reached their fallback value. To detect the fallback values, the individual iterators would have to return enums with either a retrieved value or a fallback value. I tried to implement this but stringing all of it together actually resulted in worse performance than with the multizip_fallback I ended up with. But I agree that it's something worth considering and there may exist a solution I haven't thought of.

We should recommend izip and not multizip

You're completely right, my mistake.

I'm sorry that Zip is straying from the naming scheme, but the new iterator here should not.

Good point. So, if the name multizip_fallback is good, I'm guessing the struct should be called MultiZipFallback?

Jan 12 '18 19:01 Pazzaz

Yes, I think MultizipFallback would be good, to match the function.

Jan 12 '18 20:01 bluss

I kind of have to try to advocate existing solutions however. For example, anything like izip!(i.chain(repeat(a)), j.chain(repeat(b))) is already similar.

Jan 12 '18 20:01 bluss

@bluss what would izip!(i.chain(repeat(a)), j.chain(repeat(b))) do and is there a cookbook or similar?

Jan 17 '18 12:01 triptec

@triptec Good question, it makes it clear that it is not a good replacement for multizip fallback, since the chain thing does not stop with the longest of the original iterators.

Jan 20 '18 13:01 bluss

I've been trying to use the existing solutions but I haven't found anything that works good enough. I'll try to explain what my thought process has been and what I've come up with so far. Lets say for example we have some code like this:

// Iterators
let i = 0..10;
let j = 10..13;
let k = 14..18;

// Fallback values
let a = 20;
let b = 30;
let c = 40;

for (x, y, z) in izip!(i.chain(repeat(a)), j.chain(repeat(b)), k.chain(repeat(c)))
{
    println!("{}{}{}", x, y, z);;
}

where after izip! has expanded, the for loop will become:

for (x, y, z) in i.chain(repeat(a))
    .zip(j.chain(repeat(b)))
    .zip(k.chain(repeat(c)))
    .map(|((f, g), h)| (f, g, h))
{
    println!("{}{}{}", x, y, z);;
}

This will extend the shorter iterators with the fallback values just like we wanted. Unfortunately, this will also create an infinite for loop as it won't end when all iterators have reached their fallback value.

To do that we'd need to differentiate the values received from the original iterators and their respective fallback values. One way to do it would be to use tuples:

for (x, y, z) in i.map(|x| (x, true)).chain(repeat((a, false)))
    .zip(j.map(|x| (x, true)).chain(repeat((b, false))))
    .zip(k.map(|x| (x, true)).chain(repeat((c, false))))
    .map(|(((f, _), (g, _)), (h, _))| (f, g, h))
{
    println!("{}{}{}", x, y, z);;
}

where the values given from the original iterators are paired with true and fallback values are paired with false. Then you just have to use take_while to end the for loop when all fallback values have been reached

for (x, y, z) in i.map(|x| (x, true)).chain(repeat((a, false)))
    .zip(j.map(|x| (x, true)).chain(repeat((b, false))))
    .zip(k.map(|x| (x, true)).chain(repeat((c, false))))
    .take_while(|x| match x {
        &(((_, false), (_, false)), (_, false)) => false,
        _ => true,
    })
    .map(|(((f, _), (g, _)), (h, _))| (f, g, h))
{
    println!("{}{}{}", x, y, z);;
}

This looks like a good solution right? A little messy but that wouldn't be a problem if this was something a macro expanded into. The problem is the performance. When comparing the performance of izip!, multizip, multizip_fallback and this new repeat-chain-tuple version, there's a clear difference. The benchmark I used is here and the results on my machine were:

running 4 tests
test izip_macro                 ... bench:       9,478 ns/iter (+/- 126)
test multizip_fallback_function ... bench:      20,603 ns/iter (+/- 69)
test multizip_function          ... bench:       9,489 ns/iter (+/- 10)
test repeat_chain_tuple         ... bench:      32,320 ns/iter (+/- 137)

As you can see, using chain, repeat, map and take_while hurts performance quite noticeably. That multizip_fallback is slower than multizip is to be expected. It would be nice if it wasn't that much slower but I haven't found a better way to have the functionality which multizip_fallback provides.

Jan 20 '18 13:01 Pazzaz

I think it seems worthwhile to have this. I haven't come up with any way to do this in izip!() style either..

Feb 03 '18 12:02 bluss

itertools itertools copied to clipboard

Add multizip_fallback method

itertools
itertools copied to clipboard