rfcs
rfcs copied to clipboard
ArrayBuilder struct for safe/efficient dynamic array initialisation
I think we should have a full-fledged ArrayVec
instead, though having this as a stop-gap measure that is later deprecated could be a good idea.
I think we should have a full-fledged
ArrayVec
instead, though having this as a stop-gap measure that is later deprecated could be a good idea.
There's already an RFC for that. Mentioned in the Prior Art. I would rather get something smaller scope in quickly so that it can start to be used
There are four ArrayVec-like crates that provide this functionality now, so zero reason for "quickly" doing "stop-gaps". If anything, we need a fifth crate that unifies their various features, ala https://github.com/rust-lang/rfcs/pull/2990#issuecomment-848962572 and can thus join std with minimal resistance.
I was speaking with @BurntSushi earlier today, he wasn't so sure if it made sense to have such functionality in std just for the sake of it. But array initialisation is one of those basic things that does seem weird to not have.
While they would be similar implementations. I do think the semantics are different.
Thinking of their location too. ArrayVec makes sense to live in a potential core::collections module ArrayBuilder makes more sense directly in core::array IMO
Yes, it's possible they do not unify cleanly, which maybe his concern. I donno..
To clarify my current thoughts:
- Safe array initialization for non-Copy non-const types seems like a primitive we should have, at minimum.
- The need for
ArrayVec
in std is less clear. I am unsure of how widely used it is. Moreover, given the number of crates that provideArrayVec
implementations, there is a suggestion that the design space for it is large, and that it would perhaps be better for that process to be carried out in the ecosystem. I note that I do not have any special insight into that design space, I am merely making an observation and drawing a tentative conclusion.
Safe array initialization for non-Copy non-const types seems like a primitive we should have, at minimum.
Thanks for putting this so succinctly. This is exactly what separates this RFC from the other. Having the minimum needed to enrich core
with what should be core functionality.
I personally don't mind if a full ArrayVec
makes it's way into std, I have no use for it. But that shouldn't prevent the minimum feature set described above being incorporated first.
The ArrayBuilder
is pretty much the ArrayVec
type: build()
could have been try_into()
. Giving it a more generic name (not builder) would make the type more useful. It derefs to a slice, so this type is already quite usable as a stack/inline-allocated Vec
.
I disagree with @BurntSushi's assessment of ArrayVec
. With 20 million downloads, arrayvec
is one of the most popular Rust crates. The API is rather simple: it's a Vec
. The existing ArrayVec
API is fine.
There are multiple crates with seemingly overlapping functionality, because:
- there's a bunch of forks/alternatives that were created to add const fn and const generics support,
- there are variants that can seamlessly grow to be heap allocated, or are basically a small vector optimization.
The case 1 is not a problem any more. The case 2 can easily be declared out of scope. There are pros and cons of using the heap for extra growth, but for std
— or rather core
— it makes sense to focus in the array part more, and have a Copy
-compatible nostd-compatible simple type.
ArrayVec
solves the initialization problem pretty well. It's more flexible than a callback-based initialization method. Amount of boilerplate code is comparable to MaybeUninit
, but safe. When used with a simple loop, it should be a zero-cost abstraction.
I'm not insisting that Rust needs have this type in std/core. Rust already requires 3rd party crates for a lot of basic functionality, and arrayvec works as a crate just fine. But if Rust wanted to have a built-in solution to initializing arrays, adopting arrayvec
is a good way to do it. It happens to solve the initialization problem, and is generally useful as an on-stack-Vec too.
I disagree with @BurntSushi's assessment of
ArrayVec
. With 20 million downloads,arrayvec
is one of the most popular Rust crates.
Just to be clear, I said I was unsure of how widely used it is, not that it isn't widely used. And download count is somewhat deceptive. For example, about 1/3 of arrayvec
's downloads come from csv-core
, where arrayvec
is a dev-dependency. Its total dependents is about 300, which to me says that yeah it's fairly popular. I agree with that. The question is always whether that's enough to to push it over the top and into std.
I'm not insisting that Rust needs have this type in std/core. Rust already requires 3rd party crates for a lot of basic functionality, and arrayvec works as a crate just fine. But if Rust wanted to have a built-in solution to initializing arrays, adopting arrayvec is a good way to do it. It happens to solve the initialization problem, and is generally useful as an on-stack-Vec too.
OK, so I finally had a chance to actually look at this RFC and the API is a lot bigger than what I thought it would be.
When I said "Safe array initialization for non-Copy non-const types seems like a primitive we should have, at minimum." above, what I had in mind was something small and primitive like this: https://github.com/rust-lang/rust/pull/75644
This RFC is a lot more than a simple primitive for building arrays.
Ideally, the primitive would/could be used by the various ArrayVec crates to reduce some unsafe
code. So it might be good to get feedback from them on the from_fn
API in that PR.
OK, so I finally had a chance to actually look at this RFC and the API is a lot bigger than what I thought it would be.
Ultimately, there's only 3 functions I care about here.
impl<T, const N: usize> ArrayBuilder<T, N> {
pub fn new() -> Self;
pub unsafe fn push_unchecked(&mut self, t: T); // UB if array is full
pub unsafe fn build_unchecked(self) -> [T; N]; // UB if not full
}
With these primitive functions, you can build from_fn
quite simply
fn array_from_fn<T, const N: usize>(f: impl FnMut(usize) -> T) -> [T; N] {
let mut array = ArrayBuilder::<T, N>::new();
for i in 0..N {
unsafe { array.push_unchecked(f(i)); } // safe because array won't be full
}
unsafe { array.build_unchecked() } // safe because array will be full
}
Similarly, a FromIterator
impl is trivial
impl<T, const N: usize> FromIterator<T> for [T; N] {
fn from_iter<I>(iter: I) -> Self where I: IntoIterator<Item=T> {
let mut iter = iter.into_iter();
for _ in 0..N {
unsafe { array.push_unchecked(iter.next().unwrap()); } // safe because array won't be full
}
unsafe { array.build_unchecked() } // safe because array will be full
}
}
The rest of the functions presented in the RFC are just natural extensions. I'm impartial to the pop
functions. I haven't had any use for them, so they can be removed if considered out of scope.
What this ultimately has over just having from_fn
built from scratch, is that from_fn
is all or nothing. With ArrayBuilder
, I can stop half way through, or I can hit an error case, handle it and then extract the initialised values from the Deref
slice and clone them into a vec if I were to want them
I have a working implementation
I would like to reiterate support for ArrayVec
in core over something like this. I wouldn't be opposed to having something like array_from_fn
mentioned earlier in addition to an ArrayVec
equivalent, but I would be opposed to some ArrayBuilder
struct whose functionality wouldn't be extendable to ArrayVec
in the future. To be clear, creating some form of ArrayVec
while committing only to a very small subset of methods initially would be acceptable imho.
My main justification for ArrayVec
is that, IMHO, the use case of "I want between 0 and N
elements" is pretty standard, especially with small N
. Even something as simple of a 0-2 item array seems extremely reasonable to have on the stack, whereas Vec
would add an entire other layer of indirection.
Plus, the ability to have an equivalent to Vec
that works in embedded and other no_std
environments is a huge plus. I personally use ArrayVec
in crates where I know that the upper bound size is small that otherwise would require alloc
for such use cases.
Another big thing to point out is that we technically have a cursed equivalent to ArrayVec
in libcore in the form of slice::IntoIter
, where the use case of a list of 0 to N
elements can be covered by a bit of messing around with IntoIter::new
.
Plus, there are already a couple things in the compiler itself that should be using some equivalent of ArrayVec
but instead use jank enums, like CaseMappingIter
Additional note rereading this: a lot of the time the case where I want "0 to N
elements" is in return values, and without unsized rvalues, something like ArrayVec
is necessary. IMHO, even with unsized rvalues, I feel like ArrayVec
would be better in those cases.
If I understand this RFC correctly, the main motivation is to be able to safely initialize an array, which can already be done with the soon-to-be stabilized [T;N]::map
function:
The examples from the RFC written with map
:
use core::array;
let mut values = array::IntoIter::new(["a", "b", "c", "d"]);
let arr: [String; 4] = [(); 4].map(|_| {
values.next().unwrap().to_string()
});
let mut values = 1..=10;
let arr: [usize; 10] = [(); 10].map(|_| {
let i = values.next().unwrap();
i * i
});
The iterator example is relatively difficult to replicate and requires the [T; N]::try_map
, which is not yet implemented (seems to be blocked on a design decision regarding the try_trait
). Therefore, an extension trait is used to accomplish this:
#![feature(
try_trait_v2,
maybe_uninit_array_assume_init,
maybe_uninit_uninit_array
)]
use core::iter;
use core::mem::MaybeUninit;
use core::ops::Try;
pub trait ArrayExt<T, const N: usize> {
fn try_map<F, U, R, X>(self, f: F) -> R
where
X: Try<Output = U>,
R: Try<Output = [U; N], Residual = X::Residual>,
F: FnMut(T) -> X;
}
impl<T, const N: usize> ArrayExt<T, N> for [T; N] {
fn try_map<F, U, R, X>(self, mut f: F) -> R
where
X: Try<Output = U>,
R: Try<Output = [U; N], Residual = X::Residual>,
F: FnMut(T) -> X,
{
let mut array: [MaybeUninit<U>; N] = MaybeUninit::uninit_array();
let mut iterator = core::array::IntoIter::new(self);
for item in array.iter_mut() {
// NOTE: it is guranteed that this will not panic
let next = iterator.next().unwrap();
*item = MaybeUninit::new(f(next)?);
}
// SAFETY: because of the previous loops all values are guranteed to be
// initialized
let result: [U; N] = unsafe { MaybeUninit::array_assume_init(array) };
R::from_output(result)
}
}
// this should do almost the same thing as in the example, except for the "remaining" function
fn chunked_iterator<I, const N: usize>(mut iter: I) -> impl Iterator<Item = [I::Item; N]>
where
I: Iterator,
{
iter::from_fn(move || {
let values: Option<[_; N]> = [(); N].try_map(|_| iter.next());
values
})
}
I also want to point out that people are currently planning an abstraction called Storage
, with which it may be possible in the future to use the standard collections (like Vec
) without an allocator or code duplication:
https://github.com/rust-lang/wg-allocators/issues/79
The only thing that the array map misses that this RFC aims to provide (as you pointed out) is partial array state. Being able to take a slice to the initialised values if it's not a complete array. Similarly, I imagined the proposed type to be how you would implement such a try_map
.
impl<T, const N: usize> ArrayExt<T, N> for [T; N] {
fn try_map<F, U, R, X>(self, mut f: F) -> R
where
X: Try<Output = U>,
R: Try<Output = [U; N], Residual = X::Residual>,
F: FnMut(T) -> X,
{
let mut array: ArrayBuilder<U, N> = ArrayBuilder::new();
for item in core::array::IntoIter::new(self) {
// SAFETY: N elements have not yet been pushed
unsafe { array.push_unchecked(f(item)?); }
}
// SAFETY: previous loop guarantees N elements have been set
let result: [U; N] = unsafe { array.build_unchecked() };
R::from_output(result)
}
}
I also want to point out that people are currently planning an abstraction called Storage, with which it may be possible in the future to use the standard collections (like Vec) without an allocator or code duplication:
This is really interesting. I'm a big fan of the proposed storage api. If that gets decided on this RFC is indeed irrelevant as a the ArrayBuilder<T, N>
would be superceded by Vec<T, [MaybeUninit<T>; N]>
(given there's a way to extract the raw storage out of the vec)
Just linking here. - #2990 was closed, but I just opened #3316 which proposes to add an ArrayVec
(and so far seems mildly positively reviewed), so perhaps it would be good to condense effort there.
However - as mentioned on that thread, the new storages API would likely supplant any need for a separately-implemented ArrayVec
, which is a good thing.
I really wanted to use this API, but the codegen is not great, I compared the following 3 implementations:
use core::mem::MaybeUninit;
use array_builder::ArrayBuilder;
pub fn write_array0(f: fn() -> u8) -> [u8; 1024] {
core::array::from_fn(|_|f())
}
pub fn write_array1(f: fn() -> u8) -> [u8; 1024] {
let mut uninit = [MaybeUninit::uninit(); 1024];
for i in 0..1024 {
uninit[i] = MaybeUninit::new(f());
}
unsafe { (uninit.as_ptr() as *const [u8; 1024]).read() }
}
pub fn write_array2(f: fn() -> u8) -> [u8; 1024] {
let mut builder = ArrayBuilder::new();
for _ in 0..1024 {
builder.push(f());
}
builder.build().unwrap()
}
And the produced ASM is as follows:
write_array0
push r15
push r14
push rbx
sub rsp, 2064
mov r15, rsi
mov r14, rdi
xor ebx, ebx
.LBB9_1:
call r15
mov byte, ptr, [rsp, +, rbx, +, 9], al
inc rbx
cmp rbx, 1024
jne .LBB9_1
lea r15, [rsp, +, 1033]
lea rsi, [rsp, +, 9]
mov rbx, qword, ptr, [rip, +, memcpy@GOTPCREL]
mov edx, 1024
mov rdi, r15
call rbx
mov edx, 1024
mov rdi, r14
mov rsi, r15
call rbx
mov rax, r14
add rsp, 2064
pop rbx
pop r14
pop r15
ret
.LBB9_4:
mov r14, rax
lea rsi, [rsp, +, 9]
mov rdi, rbx
call core::ptr::drop_in_place<core::array::Guard<u8,1024_usize>>
mov rdi, r14
call _Unwind_Resume
ud2
write_array1
push r15
push r14
push r12
push rbx
sub rsp, 1032
mov r15, rsi
mov r14, rdi
xor ebx, ebx
.LBB10_1:
lea r12, [rbx, +, 1]
call r15
mov byte, ptr, [rsp, +, rbx, +, 8], al
mov rbx, r12
cmp r12, 1024
jne .LBB10_1
lea rsi, [rsp, +, 8]
mov edx, 1024
mov rdi, r14
call qword, ptr, [rip, +, memcpy@GOTPCREL]
mov rax, r14
add rsp, 1032
pop rbx
pop r12
pop r14
pop r15
ret
write_array2
push rbp
push r15
push r14
push rbx
sub rsp, 2072
mov r15, rsi
mov r14, rdi
mov qword, ptr, [rsp, +, 1032], 0
mov ebp, 1024
xor ebx, ebx
.LBB11_1:
call r15
cmp rbx, 1024
jae .LBB11_3
mov byte, ptr, [rsp, +, rbx, +, 8], al
mov rbx, qword, ptr, [rsp, +, 1032]
inc rbx
mov qword, ptr, [rsp, +, 1032], rbx
dec ebp
jne .LBB11_1
cmp rbx, 1024
jne .LBB11_7
lea rsi, [rsp, +, 8]
mov edx, 1024
mov rdi, r14
call qword, ptr, [rip, +, memcpy@GOTPCREL]
mov rax, r14
add rsp, 2072
pop rbx
pop r14
pop r15
pop rbp
ret
.LBB11_3:
lea rdi, [rip, +, .Lanon.c820c69e724c4167816339e4a82ffdf6.0]
lea rdx, [rip, +, .Lanon.c820c69e724c4167816339e4a82ffdf6.2]
mov esi, 30
call qword, ptr, [rip, +, _ZN4core9panicking5panic17h90931f06a97cc5e0E@GOTPCREL]
jmp .LBB11_4
.LBB11_7:
lea r14, [rsp, +, 1040]
lea rsi, [rsp, +, 8]
mov edx, 1024
mov rdi, r14
call qword, ptr, [rip, +, memcpy@GOTPCREL]
mov qword, ptr, [rsp, +, 2064], rbx
lea rdi, [rip, +, .Lanon.c820c69e724c4167816339e4a82ffdf6.4]
lea rcx, [rip, +, .Lanon.c820c69e724c4167816339e4a82ffdf6.5]
lea r8, [rip, +, .Lanon.c820c69e724c4167816339e4a82ffdf6.14]
mov esi, 43
mov rdx, r14
call qword, ptr, [rip, +, _ZN4core6result13unwrap_failed17hdc73d4affce1d414E@GOTPCREL]
.LBB11_4:
ud2
.LBB11_9:
mov rbx, rax
jmp .LBB11_10
.LBB11_11:
jmp .LBB11_12
.LBB11_13:
.LBB11_12:
mov rbx, rax
lea r14, [rsp, +, 8]
.LBB11_10:
mov rdi, r14
call core::ptr::drop_in_place<array_builder::ArrayBuilder<u8,1024_usize>>
mov rdi, rbx
call _Unwind_Resume
ud2