itertools Cartesian power.

Hello, this is my first contribution to a rust library, can somebody guide me a little out here?

The idea is to complete the cartesian_product logic with a cartesian_power adaptor. It scrolls the cartesian product of an iterator with itself pow times. For instance with the set abc and pow=4, it yields:

aaaa (0)
aaab (1)
aaac (2)
aaba (3)
aabb (4)
aabc (5)
aaca (6)
aacb (7)
aacc (8)
abaa (9)
...
cccc (80 = 3^4 - 1)

Put it another way, it's supposedly equivalent to iproduct!("abc", "abc", "abc", "abc") except for the technical types quirks.

I have a few unresolved questions, which I need help with:

Dummy stuff first: the diff is big because I thought it was okay to hit the rustfmt button, was I wrong? Should I reverse formatting?
I cannot get the size_hint test to pass because it just.. times out, although I'm unsure why. How am I supposed to correctly test it?
What am I supposed to add in the bench files?
Deepest questions last: How could the Self::Item type be more generic in this case? I have chosen Vec to start with, but the problem is that the expected returned length actually depends on the pow argument.
- Should I leave it as-is and wait for generic consts to be available?
- Can I improve it by making the returned collection type (here Vec) a type parameter? If yes, how?

This is it, thank you for your patience, I'd be happy to read your feedback :)

Oct 08 '20 21:10 iago-lito

Some quick answers:

Dummy stuff first: the diff is big because I thought it was okay to hit the rustfmt button, was I wrong? Should I reverse formatting?

Yes, please! The smaller the PR, the better, since I need to review every changed line before merging. We're really not that particular about formatting. Just indent with four spaces, and I'm sure it'll be fine.

What am I supposed to add in the bench files?

You don't need to add anything!

I'll need a little more time before I can answer your other questions.

Oct 08 '20 22:10 jswrenn

Hi there. Foremost, thank you for your upfront effort to clarify the modalities.

I hope I do not miss anything, but what is the difference to combinations_with_replacement?

If it is supposed to do the same, you still might look into the existing implementation, and e.g. see if it has a decent size_hint.

Oct 09 '20 07:10 phimuemue

@jswrenn Okay, I'll reverse that so it'll be easier for you to read. Don't get tired scrolling meanwhile :) BTW would you accept a separate PR with only a thorough rustfmt pass to standardize whitespace in the whole project?

@phimuemue The difference is order, combination_with_replacement is insensitive to order, so it only outputs aab, aba OR baa, correct? On the other hand cartesian_power will output aab, aba AND baa.

Oct 09 '20 07:10 iago-lito

@phimuemue The difference is order, combination_with_replacement is insensitive to order, so it only outputs aab, aba OR baa, correct? On the other hand cartesian_power will output aab, aba AND baa.

Thanks - Too early to read code...

Oct 09 '20 07:10 phimuemue

@jswrenn I have reversed the rustfmt pass so the diff with master will be easier for you to read :)

Oct 09 '20 10:10 iago-lito

How could the Self::Item type be more generic in this case? I have chosen Vec to start with, but the problem is that the expected returned length actually depends on the pow argument.

Here's another way to look at it: the pow argument equals the returned length, and is thus redundant. You could nix the argument and determine the appropriate Item with a type parameter instead.

Oct 09 '20 14:10 jswrenn

inhowfar Power is similar to CombinationsWithReplacement. (To be honest, it seems quite similar to me, so I would expect similar inner workings, too.

@phimuemue I think it is similar in its signature, because they both yield iterables whose length is eventually decided by the user. I've tried to inspire from the combination_with_replacement signature.

This said, I doubt it implies that the inner workings should resemble each other. I could think of a very inefficient implementation of cartesian_product chaining combination_with_replacement, permutations then sort, but I'm not sure this helps much, except maybe for explaining what it does?

Oct 09 '20 17:10 iago-lito

@phimuemue

Should we go with Vec or with LazyBuffer? Should we have a Vec<I>, iterating on demand to get the next element, or should we go with the index-based approach like in CombinationsWithReplacement?

I'm not exactly sure what you mean, but this is sure something I would like to improve. My most flexible idea so far would be to make the user provide any FromIterator<I::Item> type instead of Vec, used as a type argument <C> in next so the procedure would build the return value with let res = state.iter().cloned().collect::<C>(); instead of let res: Vec<_> = state.clone();. But I couldn't get this right.

The other option I like is what @jswrenn suggests:

You could nix the argument and determine the appropriate Item with a type parameter instead.

If you mean that the user would do something like it.cartesian_product<[char; 4]>(), then I would also find it awesome, but I'm afraid it requires generic consts to be featured first?

Oct 09 '20 18:10 iago-lito

The other option I like is what @jswrenn suggests:

You could nix the argument and determine the appropriate Item with a type parameter instead.

If you mean that the user would do something like it.cartesian_product<[char; 4]>(), then I would also find it awesome, but I'm afraid it requires generic consts to be featured first?

If we want the type-based approach, I could imagine it is possible to do it right now - with an upper bound on the array length.

However, the type-based approach would prevent us from specifying pow at runtime. So, I suggest we first make clear if we want to have a dynamic/run-time pow, or if we want to go with a type-based approach. My guess is that if you know the array length at runtime, you could easily go with iproduct or a nested cartesian_product.

The following probably only applies if we go with the dynamic/runtime approach:

@phimuemue

Should we go with Vec or with LazyBuffer? Should we have a Vec<I>, iterating on demand to get the next element, or should we go with the index-based approach like in CombinationsWithReplacement?

I'm not exactly sure what you mean, but this is sure something I would like to improve. My most flexible idea so far would be to make the user provide any FromIterator<I::Item> type instead of Vec, used as a type argument <C> in next so the procedure would build the return value with let res = state.iter().cloned().collect::<C>(); instead of let res: Vec<_> = state.clone();. But I couldn't get this right.

CombinationsWithReplacement/Combinations does not require that the iterator is Clone, as it only clones the elements - this is in contrast with your proposal, and I would assume that users expect requirements similar to those of CombinationsWithReplacement/Combinations.

Moreover, CombinationsWithReplacement/Combinations does not unconditionally exhaust the iterator (it uses LazyBuffer, only advancing the iterator on demant) - in contrast to your proposal. Again, I would expect cartesian_power to behave similarily.

Please note that the existing implementations may not be better than your suggestion. All I advocate for is uniform behavior. Still, the existing implementation at least avoids cloneing the iterator, and goes with only cloning the elements instead.

Oct 10 '20 10:10 phimuemue

@phimuemue Okay, I'll have a deeper look into CombinationWithReplacement internals so I can answer you better.

I agree about the runtime pow vs. compile time pow valuation. These may actually require different implementations, maybe a (procedural?) macro like ipower!(my_it, 8) would do the trick for the compile time use case?

My guess is that if you know the array length at runtime, you could easily go with iproduct or a nested cartesian_product.

I disagree, because iproduct!(my_it, my_it, ..., my_it) yield tuples, so you cannot iterate on them. As a user, since I know that the tuple will be uniform in types, I would rather have an array or an iterable instead, which makes ipower!(my_it, 8) different in signature. This said, I suspect the same argument holds for combination_with_replacement, have you already have it?

Regarding runtime pow valuation, I'm okay to consider fixed-sized arrays up to 32 (or maybe more?). I also agree that uniform behaviour is a desirable thing.

Before I understand CombinationWithReplacement deeply, I would like to ask whether there exists clear-cut implementation recommendations valid throughout the library. Put it simply, what's the policy regarding adaptors implementation tradeoffs:

maximize flexibility with respect to the type system? (user is free to pick Vec, [I::Item; n], HashSet, String as yielded items)
maximize execution speed?
minimize RAM usage?
minimize heap allocations?
minimize calls to I::next()?
minimize calls to I::Item::clone()?
minimize calls to I::clone()?

For instance, I can think of an alternate implementation of cartesian_power that does not use I::clone(), but it requires buffering the whole iterator sequence, which may be not desirable if I turns out to be really long.

Oct 10 '20 10:10 iago-lito

Before I understand CombinationWithReplacement deeply, I would like to ask whether there exists clear-cut implementation recommendations valid throughout the library. Put it simply, what's the policy regarding adaptors implementation tradeoffs:
* maximize flexibility with respect to the type system? (user is free to pick `Vec`, `[I::Item; n]`, `HashSet`, `String` as yielded items)
* maximize execution speed?
* minimize RAM usage?
* minimize heap allocations?
* minimize calls to `I::next()`?
* minimize calls to `I::Item::clone()`?
* minimize calls to `I::clone()`?

These are good questions indeed. I do not know if we have specific recommendations. I guess we are usually open to discuss the various tradeoffs - if we do not have any precedence.

If we, however, have precedence, I strongly tend to follow existing patterns. It enforces uniformity, and builds upon possibly good decisions already made by others. If an existing design turns out to be bad, I tend to first fix that, and then build the new thing. In various past situations this technique simplified the code base quite a bit.

In this case, and if we go with the runtime approach, this might even mean looking at CombinationsWithReplacement and Combinations as a first step (to see if they should actually be more similar (see e.g. https://github.com/rust-itertools/itertools/pull/487) and if they can be reused for cartesian_power).

But maybe wait for @jswrenn's opinion on all this.

Oct 10 '20 11:10 phimuemue

@jswrenn Friendly ping. I suspect you're maybe busy with safe transmute or other stuff. Feel free to ignore me if it's the case, there's no rush here :)

Dec 04 '20 09:12 iago-lito

How would this implementation differ/be better than something like this:

use std::fmt::Debug;

use itertools::Itertools;

fn cartesian_power<const N: usize, T, I>(iterator: I) -> impl Iterator<Item = [T; N]>
where
    T: Debug + Clone,
    I: Iterator<Item = T> + Clone,
{
    (0..N)
        .map(|_i| iterator.clone())
        .multi_cartesian_product()
        .map(|vec| TryInto::<[T; N]>::try_into(vec).unwrap())
}

macro_rules! cartesian_power {
    ($iterator:expr, $count:expr) => {
        $crate::cartesian_power::<$count, _, _>($iterator)
    };
}

fn main() {
    for [x, y, z] in cartesian_power::<3, _, _>(0..4) {
        println!("({x},{y},{z})");
    }

    for [a, b, c, d, e] in cartesian_power!(0..4, 5) {
        println!("({a},{b},{c},{d},{e})");
    }
}

Mar 04 '22 13:03 DusterTheFirst

@DusterTheFirst I guess it would differ in one or several of the tradeoffs listed here, but IIRC we're still waiting on @jswrenn input here, are we? There was questionning about the (existence of a) general implementation policy in itertools.

Mar 13 '22 10:03 iago-lito

Hi @iago-lito , I like the idea of having a cartesian power. We usually try to minimize heap allocations as it usually is the execution bottleneck. Generate Vec items is fine enough for now. (And I see a time where we could additionally generate arrays not too far in the future). I tend to think we should not care too much on minimizing cloning iterators and items.

But for this to be included, I think it should be somehow more performant than itertools::repeat_n(iter, pow).multi_cartesian_product() (or the documentation of multi_cartesian_product could just mention this).

May 15 '24 09:05 Philippe-Cholet

itertools itertools copied to clipboard

Cartesian power.

itertools
itertools copied to clipboard