itertools
itertools copied to clipboard
Add `next_array` and `collect_array`
With this pull request I add two new functions to the Itertools trait:
fn next_array<T, const N: usize>(&mut self) -> Option<[T; N]>
where Self: Sized + Iterator<Item = T>;
fn collect_array<T, const N: usize>(mut self) -> Option<[T; N]>
where Self: Sized + Iterator<Item = T>;
These behave exactly like next_tuple and collect_tuple, however they return arrays instead. Since these functions require min_const_generics, I added a tiny build script that checks if Rust's version is 1.51 or higher, and if yes to set the has_min_const_generics config variable. This means that Itertools does not suddenly require 1.51 or higher, only these two functions do.
In order to facilitate this I did have to bump the minimum required Rust version to 1.34 from the (documented) 1.32, since Rust 1.32 and 1.33 have trouble parsing the file even if stuff is conditionally compiled. However, this should not result in any (new) breakage, because Itertools actually already requires Rust 1.34 for 9+ months, since https://github.com/rust-itertools/itertools/commit/83c0f046c077a71185042e020a04262f388a5157 uses saturating_pow which wasn't stabilized until 1.34.
As for rationale, I think these functions are useful, especially for pattern matching and parsing. I don't think there's a high probability they get added to the standard library either, so that's why I directly make a pull request here. When/if TryFromIterator stabilizes we can simplify the implementation, but even then I believe these functions remain a good addition similarly how collect_vec is nice to have despite .collect::<Vec<_>> existing.
A possible enhancement might be to return Option<A> where A: FromArray<Self::Item, N> instead, and adding the FromArray trait, something similar to this:
trait FromArray<T, const N: usize> {
fn from_array(array: [T; N]) -> Self;
}
impl<T, const N: usize> FromArray<T, N> for [T; N] { /* .. */ }
impl<T, const N: usize> FromArray<Option<T>, N> for Option<[T; N]> { /* .. */ }
impl<T, E, const N: usize> FromArray<Result<T, E>, N> for Result<[T; N], E> { /* .. */ }
In fact, I think this is highly useful because it allows things like
let ints = line.split_whitespace().map(|n| n.parse());
if let Ok([x, y, z]) = ints.collect_array() {
...
}
This would be completely in line with FromIterator.
So I have a working implementation of the above idea here: https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=9dba690b0dfc362971635e21647a4c19.
It makes this compile:
fn main() {
let line = "32 -12 24";
let nums = line.split_whitespace().map(|n| n.parse::<i32>());
if let Some(Ok([x, y, z])) = nums.collect_array() {
println!("x: {} y: {} z: {}", x, y, z);
}
}
It would change the interface to:
trait ArrayCollectible<T>: Sized {
fn array_from_iter<I: IntoIterator<Item = T>>(iterable: I) -> Option<Self>;
}
trait Itertools: Iterator {
fn collect_array<A>(self) -> Option<A>
where
Self: Sized,
A: ArrayCollectible<Self::Item>;
}
where
ArrayCollectible<T>is implemented for[T; N];ArrayCollectible<Option<T>>is implemented forOption<[T; N]>;ArrayCollectible<Result<T, E>>is implemented forResult<[T; N], E>.
@phimuemue Any update on this?
@phimuemue Any update on this?
I appreciate your effort, but unfortunately nothing substantial from my side: I changed my mind regarding version-check (so we could use it as a dev-dependency), but I do not have enough time right now to review and merge PRs that ample.
@phimuemue Just for posterity's sake, version-check would be a build-dependency, not dev-dependency.
@phimuemue Just checking in what the status is, I feel very strongly about the usefulness of collect_array. I miss it very often in itertools.
Note that if you want collect_array, you can use https://lib.rs/crates/arrayvec, as the usual way to collect into an array.
I'll also mention Iterator::next_chunk (https://github.com/rust-lang/rust/issues/98326) as a nightly API that'll be next_array.
This is a very useful feature. Today there was a thread on Reddit where the author basically asks if there's a crate that provides collect_array(). IMO, itertools should be the crate to do it
@Expurple
I sure would like to do use const generics and collect_array is one of them.
Our MSRV is quite old (1.43.1 currently) while min-const-generics is 1.51 but I do not think it's the main blocker.
The fact is that there is not much available in recent stable Rust yet which is sad. Iterator::next_chunk and core::array::try_from_fn would be nice to have.
Plus, we currently don't really use unsafe ourselves (only in EitherOrBoth::insert* with obvious unfaillable patterns). I guess we prefer that the std does the heavy work.
I sometimes think about adding arrayvec as a dependency - and falling back to std as soon it's possible. I think it might also solve some other issues (e.g. ExactlyOneError having a manual two-element-arrayvec). Would require Rust 1.51.
Another option I just saw: Crates can offer "nightly-only experimental API" (see https://docs.rs/arrayvec/latest/arrayvec/struct.ArrayVec.html#method.first_chunk for an example) - maybe this would help some users.
I personally would lean towards arrayvec. @jswrenn @Philippe-Cholet Opinions?
@phimuemue
Another option I just saw: Crates can offer "nightly-only experimental API" (see https://docs.rs/arrayvec/latest/arrayvec/struct.ArrayVec.html#method.first_chunk for an example) - maybe this would help some users.
ArrayVec<T, CAP> implements Deref<Target = [T]> so (nightly-available) slice methods are directly accessible, that seems to be it.
I sometimes think about adding
arrayvecas a dependency - and falling back tostdas soon it's possible. I think it might also solve some other issues (e.g.ExactlyOneErrorhaving a manual two-element-arrayvec). Would require Rust 1.51.
I'm definitely not opposed to the idea but the ExactlyOneError use case is quite small.
I did not give thoughts before, do you have other examples in mind? (with private usage, in order to fall back to std ASAP).
EDIT: ArrayVec has a maximal capacity of u32::MAX, could it be an issue?
EDIT: Well I have some. With tail and k_smallest (and its variants), I had thoughts of extending them to const where I dreamt of unstable Iterator::next_chunk but I guess we could use arrayvec in the meantime.
(My idea would be that .k_smallest(50) could also support .k_smallest(Const/*::<50> if not inferred elsewhere*/) so that we don't multiply method names too much but merely add a new zero-sized type struct Const<const N: usize>; at places we only gave usize before. Then no allocation.
It's not a magic bullet for every usage though but I see a real usage for it, such as .combinations(Const): internal Vec buffer but would return arrays, so no repetitive slow allocations.)
@scottmcm Small discussion about temporarily adding arrayvec as dependency once we move to const-generics. I just saw a comment of yours related to this. Could you elaborate?
For collect_array, I think I'd prefer just taking the time myself to write the unsafe code. We can vendor the not-yet-stabilized helper functions from the standard library that we'll need.
I can allocate some time to this next week.
@jswrenn Please don't forget that we are discussing this on a PR that already has a working implementation without adding dependencies...
@orlp, thanks, I had forgotten that this was a PR and not an issue when I made my reply. Still, we're talking about adding some extremely subtle unsafe code to Itertools. I'd like us to take extreme care to avoid accidentally introducing UB.
A PR adding unsafe to itertools should:
- rigorously document the safety and panicking conditions of every
unsafefunction it introduces - prove that every invocation of an unsafe function (even invocations occurring within other unsafe functions) satisfies the safety precondition of that invocation, with citations to official Rust documentation
- rigorously document why any potentially panicking function within an unsafe function does not create invalid state that would cause UB upon panicking unwinds
- intensively test its API with miri
If you can update this PR to do those things, I can see a path forward to merging it.
@jswrenn I will be busy the upcoming week but I'm willing to bring this up to standards after that. If before then you could decide on whether or not to bump the MSRV to 1.51 I could include that in the rewrite.