itertools
itertools copied to clipboard
Cartesian power.
Hello, this is my first contribution to a rust library, can somebody guide me a little out here?
The idea is to complete the cartesian_product logic with a cartesian_power adaptor. It scrolls the cartesian product of an iterator with itself pow times. For instance with the set abc and pow=4, it yields:
aaaa (0)
aaab (1)
aaac (2)
aaba (3)
aabb (4)
aabc (5)
aaca (6)
aacb (7)
aacc (8)
abaa (9)
...
cccc (80 = 3^4 - 1)
Put it another way, it's supposedly equivalent to iproduct!("abc", "abc", "abc", "abc") except for the technical types quirks.
I have a few unresolved questions, which I need help with:
-
Dummy stuff first: the diff is big because I thought it was okay to hit the
rustfmtbutton, was I wrong? Should I reverse formatting? -
I cannot get the
size_hinttest to pass because it just.. times out, although I'm unsure why. How am I supposed to correctly test it? -
What am I supposed to add in the
benchfiles? -
Deepest questions last: How could the
Self::Itemtype be more generic in this case? I have chosenVecto start with, but the problem is that the expected returned length actually depends on thepowargument.- Should I leave it as-is and wait for generic consts to be available?
- Can I improve it by making the returned collection type (here
Vec) a type parameter? If yes, how?
This is it, thank you for your patience, I'd be happy to read your feedback :)
Some quick answers:
Dummy stuff first: the diff is big because I thought it was okay to hit the rustfmt button, was I wrong? Should I reverse formatting?
Yes, please! The smaller the PR, the better, since I need to review every changed line before merging. We're really not that particular about formatting. Just indent with four spaces, and I'm sure it'll be fine.
What am I supposed to add in the bench files?
You don't need to add anything!
I'll need a little more time before I can answer your other questions.
Hi there. Foremost, thank you for your upfront effort to clarify the modalities.
I hope I do not miss anything, but what is the difference to combinations_with_replacement?
If it is supposed to do the same, you still might look into the existing implementation, and e.g. see if it has a decent size_hint.
@jswrenn Okay, I'll reverse that so it'll be easier for you to read. Don't get tired scrolling meanwhile :)
BTW would you accept a separate PR with only a thorough rustfmt pass to standardize whitespace in the whole project?
@phimuemue The difference is order, combination_with_replacement is insensitive to order, so it only outputs aab, aba OR baa, correct? On the other hand cartesian_power will output aab, aba AND baa.
@phimuemue The difference is order,
combination_with_replacementis insensitive to order, so it only outputsaab,abaORbaa, correct? On the other handcartesian_powerwill outputaab,abaANDbaa.
Thanks - Too early to read code...
@jswrenn I have reversed the rustfmt pass so the diff with master will be easier for you to read :)
How could the Self::Item type be more generic in this case? I have chosen Vec to start with, but the problem is that the expected returned length actually depends on the pow argument.
Here's another way to look at it: the pow argument equals the returned length, and is thus redundant. You could nix the argument and determine the appropriate Item with a type parameter instead.
inhowfar Power is similar to CombinationsWithReplacement. (To be honest, it seems quite similar to me, so I would expect similar inner workings, too.
@phimuemue I think it is similar in its signature, because they both yield iterables whose length is eventually decided by the user. I've tried to inspire from the combination_with_replacement signature.
This said, I doubt it implies that the inner workings should resemble each other. I could think of a very inefficient implementation of cartesian_product chaining combination_with_replacement, permutations then sort, but I'm not sure this helps much, except maybe for explaining what it does?
@phimuemue
Should we go with Vec or with LazyBuffer? Should we have a
Vec<I>, iterating on demand to get the next element, or should we go with the index-based approach like in CombinationsWithReplacement?
I'm not exactly sure what you mean, but this is sure something I would like to improve. My most flexible idea so far would be to make the user provide any FromIterator<I::Item> type instead of Vec, used as a type argument <C> in next so the procedure would build the return value with let res = state.iter().cloned().collect::<C>(); instead of let res: Vec<_> = state.clone();. But I couldn't get this right.
The other option I like is what @jswrenn suggests:
You could nix the argument and determine the appropriate Item with a type parameter instead.
If you mean that the user would do something like it.cartesian_product<[char; 4]>(), then I would also find it awesome, but I'm afraid it requires generic consts to be featured first?
The other option I like is what @jswrenn suggests:
You could nix the argument and determine the appropriate Item with a type parameter instead.
If you mean that the user would do something like
it.cartesian_product<[char; 4]>(), then I would also find it awesome, but I'm afraid it requires generic consts to be featured first?
If we want the type-based approach, I could imagine it is possible to do it right now - with an upper bound on the array length.
However, the type-based approach would prevent us from specifying pow at runtime. So, I suggest we first make clear if we want to have a dynamic/run-time pow, or if we want to go with a type-based approach. My guess is that if you know the array length at runtime, you could easily go with iproduct or a nested cartesian_product.
The following probably only applies if we go with the dynamic/runtime approach:
@phimuemue
Should we go with Vec or with LazyBuffer? Should we have a
Vec<I>, iterating on demand to get the next element, or should we go with the index-based approach like in CombinationsWithReplacement?I'm not exactly sure what you mean, but this is sure something I would like to improve. My most flexible idea so far would be to make the user provide any
FromIterator<I::Item>type instead ofVec, used as a type argument<C>innextso the procedure would build the return value withlet res = state.iter().cloned().collect::<C>();instead oflet res: Vec<_> = state.clone();. But I couldn't get this right.
CombinationsWithReplacement/Combinations does not require that the iterator is Clone, as it only clones the elements - this is in contrast with your proposal, and I would assume that users expect requirements similar to those of CombinationsWithReplacement/Combinations.
Moreover, CombinationsWithReplacement/Combinations does not unconditionally exhaust the iterator (it uses LazyBuffer, only advancing the iterator on demant) - in contrast to your proposal. Again, I would expect cartesian_power to behave similarily.
Please note that the existing implementations may not be better than your suggestion. All I advocate for is uniform behavior. Still, the existing implementation at least avoids cloneing the iterator, and goes with only cloning the elements instead.
@phimuemue Okay, I'll have a deeper look into CombinationWithReplacement internals so I can answer you better.
I agree about the runtime pow vs. compile time pow valuation. These may actually require different implementations, maybe a (procedural?) macro like ipower!(my_it, 8) would do the trick for the compile time use case?
My guess is that if you know the array length at runtime, you could easily go with iproduct or a nested cartesian_product.
I disagree, because iproduct!(my_it, my_it, ..., my_it) yield tuples, so you cannot iterate on them. As a user, since I know that the tuple will be uniform in types, I would rather have an array or an iterable instead, which makes ipower!(my_it, 8) different in signature. This said, I suspect the same argument holds for combination_with_replacement, have you already have it?
Regarding runtime pow valuation, I'm okay to consider fixed-sized arrays up to 32 (or maybe more?). I also agree that uniform behaviour is a desirable thing.
Before I understand CombinationWithReplacement deeply, I would like to ask whether there exists clear-cut implementation recommendations valid throughout the library. Put it simply, what's the policy regarding adaptors implementation tradeoffs:
- maximize flexibility with respect to the type system? (user is free to pick
Vec,[I::Item; n],HashSet,Stringas yielded items) - maximize execution speed?
- minimize RAM usage?
- minimize heap allocations?
- minimize calls to
I::next()? - minimize calls to
I::Item::clone()? - minimize calls to
I::clone()?
For instance, I can think of an alternate implementation of cartesian_power that does not use I::clone(), but it requires buffering the whole iterator sequence, which may be not desirable if I turns out to be really long.
Before I understand
CombinationWithReplacementdeeply, I would like to ask whether there exists clear-cut implementation recommendations valid throughout the library. Put it simply, what's the policy regarding adaptors implementation tradeoffs:* maximize flexibility with respect to the type system? (user is free to pick `Vec`, `[I::Item; n]`, `HashSet`, `String` as yielded items) * maximize execution speed? * minimize RAM usage? * minimize heap allocations? * minimize calls to `I::next()`? * minimize calls to `I::Item::clone()`? * minimize calls to `I::clone()`?
These are good questions indeed. I do not know if we have specific recommendations. I guess we are usually open to discuss the various tradeoffs - if we do not have any precedence.
If we, however, have precedence, I strongly tend to follow existing patterns. It enforces uniformity, and builds upon possibly good decisions already made by others. If an existing design turns out to be bad, I tend to first fix that, and then build the new thing. In various past situations this technique simplified the code base quite a bit.
In this case, and if we go with the runtime approach, this might even mean looking at CombinationsWithReplacement and Combinations as a first step (to see if they should actually be more similar (see e.g. https://github.com/rust-itertools/itertools/pull/487) and if they can be reused for cartesian_power).
But maybe wait for @jswrenn's opinion on all this.
@jswrenn Friendly ping. I suspect you're maybe busy with safe transmute or other stuff. Feel free to ignore me if it's the case, there's no rush here :)
How would this implementation differ/be better than something like this:
use std::fmt::Debug;
use itertools::Itertools;
fn cartesian_power<const N: usize, T, I>(iterator: I) -> impl Iterator<Item = [T; N]>
where
T: Debug + Clone,
I: Iterator<Item = T> + Clone,
{
(0..N)
.map(|_i| iterator.clone())
.multi_cartesian_product()
.map(|vec| TryInto::<[T; N]>::try_into(vec).unwrap())
}
macro_rules! cartesian_power {
($iterator:expr, $count:expr) => {
$crate::cartesian_power::<$count, _, _>($iterator)
};
}
fn main() {
for [x, y, z] in cartesian_power::<3, _, _>(0..4) {
println!("({x},{y},{z})");
}
for [a, b, c, d, e] in cartesian_power!(0..4, 5) {
println!("({a},{b},{c},{d},{e})");
}
}
@DusterTheFirst I guess it would differ in one or several of the tradeoffs listed here, but IIRC we're still waiting on @jswrenn input here, are we? There was questionning about the (existence of a) general implementation policy in itertools.
Hi @iago-lito , I like the idea of having a cartesian power.
We usually try to minimize heap allocations as it usually is the execution bottleneck. Generate Vec items is fine enough for now. (And I see a time where we could additionally generate arrays not too far in the future).
I tend to think we should not care too much on minimizing cloning iterators and items.
But for this to be included, I think it should be somehow more performant than itertools::repeat_n(iter, pow).multi_cartesian_product() (or the documentation of multi_cartesian_product could just mention this).