itertools icon indicating copy to clipboard operation
itertools copied to clipboard

Multiunzip supports only up to 12 iterators

Open pablo-slingshot opened this issue 2 years ago • 5 comments

I was trying to multiunzip an iterator of tuples of 37 elements, and I was getting this error:

error[E0277]: the trait bound `std::iter::Map<std::slice::Iter<'_, MyType>, [closure@src/main.rs:193:14: 193:19]>: MultiUnzip<_>` is not satisfied
    --> src/main.rs:234:10
     |
234  |         .multiunzip();
     |          ^^^^^^^^^^ the trait `MultiUnzip<_>` is not implemented for `std::iter::Map<std::slice::Iter<'_, MyType>, [closure@src/main.rs:193:14: 193:19]>`
     |
note: required by a bound in `itertools::Itertools::multiunzip`
    --> ~/.cargo/registry/src/github.com-1ecc6299db9ec823/itertools-0.10.5/src/lib.rs:3656:23
     |
3656 |         Self: Sized + MultiUnzip<FromI>,
     |                       ^^^^^^^^^^^^^^^^^ required by this bound in `itertools::Itertools::multiunzip`

Turns out, it is because multiunzip only supports up to 12 iterators.

Two questions:

  1. Can it support more?
  2. If not, is it possible to make this error more clear?

pablo-slingshot avatar May 01 '23 18:05 pablo-slingshot

Hi there, wow, 37, that's quite a bit.

As far as I know, itertools could theoretically support it (iirc, we generate the impls via a macro, see https://docs.rs/itertools/latest/src/itertools/unziptuple.rs.html#68-80), but I do not know if it affects compile time negatively.

Can you - at least as a first step - fork itertools yourself and copy-paste the macro call for 37 elements? Or can you work around it?

If there's enough demand or compile times do not skyrocket, we could think about offering the impl.

Alternatively: Should/could we offer a simple macro that allows users to implement MultiUnzip for their particular use cases?

As a sidenote: If we touch this, we could possibly simplify the macro a little bit.

phimuemue avatar May 01 '23 18:05 phimuemue

Definitely not a big issue that needs to be solved. I ended up doing something like this:

let my_vec = (
  my_iter.map(|val| val.0).collect_vec(),
  my_iter.map(|val| val.1).collect_vec(),
 // etc
);

Another alternative with less iterations could probably be

let mut my_vec: (Vec<String>, Vec<String>)  = (vec![], vec![]); // etc
for val in my_iter {
  my_vec.0.push(val.0);
  my_vec.1.push(val.1);
}

Anyways, just wanted to made you aware of my use case. Feel free to close this issue if needed.

pablo-slingshot avatar May 01 '23 18:05 pablo-slingshot

I think that sticking with 12-tuples for consistency with core is probably for the best. I'm pretty skeptical about larger heterogeneous types.

Maybe it could be implemented for arrays, though? Getting [Vec<String>; N] from Vec<[String; N]> seems far more reasonable, so long as const generics are within the MSRV...

scottmcm avatar May 01 '23 23:05 scottmcm

Hey, I also triggered this limitation, and also agree that consistency with core (12-tuples) is best. My use case is a (admittedly complex) piece of code that unzips into 13 different types. Implementing for arrays of const size wouldn't solve the issue as I indeed want different types. I know There are a few ways to address this :

  • Feature-gating larger implementations, e.g. Diesel has features such as 32-column-tables, 64-column-tables. This is a good idea because it'll probably solve a good amount of edge cases, but is still a bit fragile because someone will always come with more extreme requirements. So they will eventually resort to solution 2 :
  • The user with extreme requirements forks the crate (and pays the price of maintainance that comes with doing so). It's not ideal, but the lack of variadics brought us here and I don't see it changing in the near future.

One possible way out is providing a derive-macro to be implemented on a user's structure. It is then possible to procgen a method multiunzip on that struct. Rough example :

#[derive(Debug, Multiunzip)]
pub struct Person {
    age: u32,
    name: String,
}

fn main() {
    let persons = vec![
        Person {
            age: 20,
            name: "Alice".to_string(),
        },
        Person {
            age: 25,
            name: "Bob".to_string(),
        },
    ];
    let persons_unzipped = PersonUnzipped::multiunzip(persons.into_iter());
    assert_eq!(persons_unzipped.age, vec![20, 25]);
    assert_eq!(persons_unzipped.name, vec!["Alice".to_string(), "Bob".to_string()]);
}

// Proc-Generated
#[derive(Default)] 
pub struct PersonUnzipped { 
    age: Vec<u32>,
    name: Vec<String>,
}

// Proc-Generated
impl PersonUnzipped {
    // The same implementation as multiunzip
    pub fn multiunzip<IT: Iterator<Item = Person>>(it: IT) -> Self {
        let mut res = PersonUnzipped::default();
        let PersonUnzipped {
            age,
            name,
        } = &mut res;
        it.fold((), |(), v| {
            age.extend(std::iter::once(v.age));
            name.extend(std::iter::once(v.name));
        });
        res
    }
}

I understand that it is unlikely to ever make it in itertools as-is because it is absolutely not the itertools philosophy. That being said, hopefully the above will help people who are blocked by this issue :)

AlisCode avatar May 03 '23 13:05 AlisCode

:wave: I've just released many-unzip, with support for up to 192-tuples (with feature flags to keep compile times at a minimum).

It's a small, specialised crate, so the maintenance for me will likely be tiny. Of course, I still wouldn't be against someone upstreaming the changes into itertools

eopb avatar Aug 29 '24 18:08 eopb