Potential feature: safe align_to
What do you think of something like this? I realize that you can also do this with bytemuck, but this would let you write more generic code that operates on larger arrays while handling the head/tail special case.
```rust
pub trait AlignedTo
where
    Self: Sized,
{
    type Element;
    fn align_to(array: &[Self::Element]) -> (&[Self::Element], &[Self], &[Self::Element]);
    fn align_to_mut(
        array: &mut [Self::Element],
    ) -> (&mut [Self::Element], &mut [Self], &mut [Self::Element]);
}
```
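A minimal sketch of how one impl of the proposed `AlignedTo` trait might look, assuming a hypothetical 4-lane `F32x4` type (a `#[repr(C, align(16))]` wrapper around `[f32; 4]`, not a real SIMD type). The `unsafe` is confined to the impl and calls the standard library's `<[T]>::align_to` directly; bytemuck's `pod_align_to` provides the same split as a safe function, so with a bytemuck dependency the `unsafe` blocks could be replaced by calls to it.

```rust
// Hypothetical 128-bit vector type for illustration; not from any crate.
#[derive(Clone, Copy, Debug, PartialEq)]
#[repr(C, align(16))]
pub struct F32x4(pub [f32; 4]);

pub trait AlignedTo
where
    Self: Sized,
{
    type Element;
    fn align_to(array: &[Self::Element]) -> (&[Self::Element], &[Self], &[Self::Element]);
    fn align_to_mut(
        array: &mut [Self::Element],
    ) -> (&mut [Self::Element], &mut [Self], &mut [Self::Element]);
}

impl AlignedTo for F32x4 {
    type Element = f32;

    fn align_to(array: &[f32]) -> (&[f32], &[F32x4], &[f32]) {
        // SAFETY: F32x4 is repr(C) over [f32; 4] with no padding (size 16,
        // align 16), and every bit pattern of [f32; 4] is a valid F32x4.
        unsafe { array.align_to::<F32x4>() }
    }

    fn align_to_mut(array: &mut [f32]) -> (&mut [f32], &mut [F32x4], &mut [f32]) {
        // SAFETY: as above; anything written through the middle slice is
        // four valid f32 values.
        unsafe { array.align_to_mut::<F32x4>() }
    }
}
```

Concatenating the head, the lanes of each middle vector, and the tail gives back the original sequence, which is the property generic head/mid/tail code relies on.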
https://docs.rs/bytemuck/latest/bytemuck/fn.pod_align_to.html
This seems the same as the existing bytemuck version, including the head and tail. I'm unclear on the difference you're after.
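To make the head/mid/tail contract concrete: for a slice starting at byte address `addr`, the head is the run of elements before the first vector-aligned boundary, the middle is as many whole vectors as fit after it, and the tail is the remainder. `split_lengths` is a hypothetical helper that computes those lengths without touching memory; it assumes power-of-two element sizes and a vector size that is a multiple of its alignment, as with SIMD types. The standard `align_to` docs only promise a valid split, not necessarily the maximal one computed here, so real code should depend on these exact lengths for performance, not correctness.

```rust
/// Hypothetical helper: the (head, mid, tail) lengths of a maximal split of
/// `len` elements of `elem_size` bytes, starting at byte address `addr`,
/// into vectors of `lanes` elements aligned to `wide_align` bytes.
pub fn split_lengths(
    addr: usize,
    len: usize,
    elem_size: usize,
    lanes: usize,
    wide_align: usize,
) -> (usize, usize, usize) {
    // Assumptions that hold for primitive lanes inside SIMD vectors.
    assert!(wide_align.is_power_of_two());
    assert!(elem_size.is_power_of_two() && elem_size <= wide_align);
    assert!(lanes > 0 && (lanes * elem_size) % wide_align == 0);
    assert_eq!(addr % elem_size, 0, "slice start must be element-aligned");

    // Bytes until the next `wide_align` boundary (0 if already aligned).
    let gap_bytes = (wide_align - addr % wide_align) % wide_align;
    // Head: elements before that boundary, capped by the slice length.
    let head = (gap_bytes / elem_size).min(len);
    // Mid: whole vectors that fit after the head.
    let mid = (len - head) / lanes;
    // Tail: whatever is left over.
    let tail = len - head - mid * lanes;
    (head, mid, tail)
}
```

With `i16` elements and 16-byte, 8-lane vectors, a slice starting at `0x1000` has no head, while one starting at `0x1008` has a 4-element head, so the same logical elements end up in different parts of the split.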
The background of this is that, now that there are AVX-512 types, I'm trying to figure out a nice way to write code that is independent of lane size...
For example, if you had this:
```rust
pub trait AlignTo<Element>
where
    Element: Sized,
    Self: Sized,
{
    fn align_to(source: &[Element]) -> (&[Element], &[Self], &[Element]);
    fn align_to_mut(source: &mut [Element]) -> (&mut [Element], &mut [Self], &mut [Element]);
}
```
You could do this:
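```rust
use core::ops::Add;

fn sum_vectors_generic<
    E: Add<Output = E> + Copy,
    WideType: AlignTo<E> + Add<Output = WideType> + Copy,
>(
    a: &[E],
    b: &[E],
    result: &mut [E],
) {
    assert_eq!(a.len(), b.len());
    assert_eq!(a.len(), result.len());
    // Assumes a, b and result start at the same alignment relative to WideType.
    // Otherwise their head/mid/tail lengths differ, and the loops below can
    // panic on an out-of-bounds index or silently add mismatched elements.
    let (a_head, a_mid, a_tail) = WideType::align_to(a);
    let (b_head, b_mid, b_tail) = WideType::align_to(b);
    let (c_head, c_mid, c_tail) = WideType::align_to_mut(result);
    // Scalar head
    for i in 0..a_head.len() {
        c_head[i] = a_head[i] + b_head[i];
    }
    // Full vectors
    for i in 0..a_mid.len() {
        let va = a_mid[i];
        let vb = b_mid[i];
        c_mid[i] = va + vb;
    }
    // Scalar tail
    for i in 0..a_tail.len() {
        c_tail[i] = a_tail[i] + b_tail[i];
    }
}
```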
then:
```rust
// SSE2
sum_vectors_generic::<i16, i16x8>(&a, &b, &mut c);
// AVX2
sum_vectors_generic::<i16, i16x16>(&a, &b, &mut c);
// AVX-512
sum_vectors_generic::<i16, i16x32>(&a, &b, &mut c);
```
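A self-contained sketch to check that the idea type-checks end to end. `I16x8` and `I16x16` are hypothetical stand-ins for the `i16x8`/`i16x16` types (plain arrays with 16- and 32-byte alignment and a lane-wise `Add`, not real SIMD), and the `AlignTo` impls call `<[T]>::align_to` directly rather than going through bytemuck. One deliberate change from the kernel above: if the three slices do not split at the same places, it falls back to a scalar loop instead of indexing mismatched pieces.

```rust
use core::ops::Add;

pub trait AlignTo<Element>: Sized {
    fn align_to(source: &[Element]) -> (&[Element], &[Self], &[Element]);
    fn align_to_mut(source: &mut [Element]) -> (&mut [Element], &mut [Self], &mut [Element]);
}

// Hypothetical stand-ins: plain arrays with SIMD-like size and alignment.
#[derive(Clone, Copy, Debug, PartialEq)]
#[repr(C, align(16))]
pub struct I16x8(pub [i16; 8]);

#[derive(Clone, Copy, Debug, PartialEq)]
#[repr(C, align(32))]
pub struct I16x16(pub [i16; 16]);

// Lane-wise `Add` plus `AlignTo<i16>` for each width; the unsafe call lives in one place.
macro_rules! impl_wide {
    ($wide:ident) => {
        impl Add for $wide {
            type Output = $wide;
            fn add(self, rhs: $wide) -> $wide {
                let mut out = self.0;
                for (o, r) in out.iter_mut().zip(rhs.0) {
                    *o += r;
                }
                $wide(out)
            }
        }
        impl AlignTo<i16> for $wide {
            fn align_to(source: &[i16]) -> (&[i16], &[Self], &[i16]) {
                // SAFETY: repr(C) over [i16; N] with no padding; any bit pattern is valid.
                unsafe { source.align_to::<Self>() }
            }
            fn align_to_mut(source: &mut [i16]) -> (&mut [i16], &mut [Self], &mut [i16]) {
                // SAFETY: as above.
                unsafe { source.align_to_mut::<Self>() }
            }
        }
    };
}
impl_wide!(I16x8);
impl_wide!(I16x16);

pub fn sum_vectors_generic<E, W>(a: &[E], b: &[E], result: &mut [E])
where
    E: Add<Output = E> + Copy,
    W: AlignTo<E> + Add<Output = W> + Copy,
{
    assert_eq!(a.len(), b.len());
    assert_eq!(a.len(), result.len());
    let (a_head, a_mid, a_tail) = W::align_to(a);
    let (b_head, b_mid, b_tail) = W::align_to(b);
    let (c_head, c_mid, c_tail) = W::align_to_mut(result);
    if a_head.len() != b_head.len()
        || a_head.len() != c_head.len()
        || a_mid.len() != b_mid.len()
        || a_mid.len() != c_mid.len()
    {
        // Different start alignments: plain element-by-element loop.
        for ((c, &x), &y) in result.iter_mut().zip(a).zip(b) {
            *c = x + y;
        }
        return;
    }
    // Same split everywhere (so the tails match too): head, vectors, tail.
    for ((c, &x), &y) in c_head.iter_mut().zip(a_head).zip(b_head) {
        *c = x + y;
    }
    for ((c, &x), &y) in c_mid.iter_mut().zip(a_mid).zip(b_mid) {
        *c = x + y;
    }
    for ((c, &x), &y) in c_tail.iter_mut().zip(a_tail).zip(b_tail) {
        *c = x + y;
    }
}
```

Switching between `I16x8` and `I16x16` only changes the turbofish; the kernel body stays the same.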
Well, two things jump out at me:
- If you want to have a trait for this behind the `bytemuck` feature, with impls that call out to bytemuck (to avoid duplicating unsafe code), I'll merge the PR for it.
- What you want in the example code doesn't necessarily work out like that. Imagine that the `a` slice aligns to 8, while the `b` slice aligns to 16. Now the head portions will be different lengths, and the mid and tail portions too. Unfortunately, for something like this you have to check the alignment of all data runs and then use the least aligned one as the basis for how to transform the other two.
Yeah, you're right. It would have to assert that they all start at the same alignment.
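That assertion could compare start addresses before splitting. A minimal sketch, with `same_start_alignment` as a hypothetical helper: slices whose start addresses agree modulo the vector alignment get equal-length heads under a maximal split, and with equal overall lengths their middles and tails then line up as well.

```rust
/// Hypothetical helper: true if every slice starts at the same byte offset
/// modulo `align`, so that they would split at the same element indices.
pub fn same_start_alignment<T>(slices: &[&[T]], align: usize) -> bool {
    assert!(align.is_power_of_two());
    match slices.first() {
        // No slices: trivially consistent.
        None => true,
        Some(first) => {
            let offset = first.as_ptr() as usize % align;
            slices.iter().all(|s| s.as_ptr() as usize % align == offset)
        }
    }
}
```

In `sum_vectors_generic` this might become `assert!(same_start_alignment(&[a, b, &*result], core::mem::align_of::<WideType>()));` before the three `align_to` calls.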
Things are moving more and more towards vector-size-agnostic code, so I was thinking of some small steps that would make code less dependent on vector size.
One other nice thing that would fit into this would be trait versions of the common non-arithmetic operations, like `abs`, `saturating_add`, `max`, `min`, etc.
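A minimal sketch of what such trait versions could look like. `LaneOps` and `I16x8` are hypothetical names, not taken from `wide` or any other crate, and the vector impl just loops over a plain array where a real type would use SIMD instructions. Implementing the trait for scalar `i16` as well lets the same generic code handle head and tail elements.

```rust
/// Hypothetical trait: lane-wise non-arithmetic operations.
pub trait LaneOps: Copy {
    fn abs(self) -> Self;
    fn saturating_add(self, rhs: Self) -> Self;
    fn max(self, rhs: Self) -> Self;
    fn min(self, rhs: Self) -> Self;
}

// Scalar impl, so head/tail elements can go through the same generic code.
impl LaneOps for i16 {
    fn abs(self) -> i16 {
        self.wrapping_abs() // wraps for i16::MIN instead of panicking
    }
    fn saturating_add(self, rhs: i16) -> i16 {
        i16::saturating_add(self, rhs) // the inherent method, not this trait
    }
    fn max(self, rhs: i16) -> i16 {
        Ord::max(self, rhs)
    }
    fn min(self, rhs: i16) -> i16 {
        Ord::min(self, rhs)
    }
}

/// Hypothetical 8-lane i16 vector (a plain array, not real SIMD).
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct I16x8(pub [i16; 8]);

impl I16x8 {
    /// Combines matching lanes of `self` and `rhs` with `f`.
    fn zip_with(self, rhs: I16x8, f: impl Fn(i16, i16) -> i16) -> I16x8 {
        let mut out = self.0;
        for (o, r) in out.iter_mut().zip(rhs.0) {
            *o = f(*o, r);
        }
        I16x8(out)
    }
}

impl LaneOps for I16x8 {
    fn abs(self) -> Self {
        I16x8(self.0.map(i16::wrapping_abs))
    }
    fn saturating_add(self, rhs: Self) -> Self {
        self.zip_with(rhs, i16::saturating_add)
    }
    fn max(self, rhs: Self) -> Self {
        self.zip_with(rhs, |x, y| Ord::max(x, y))
    }
    fn min(self, rhs: Self) -> Self {
        self.zip_with(rhs, |x, y| Ord::min(x, y))
    }
}

/// Works on a single i16 or a whole vector: clamps every lane to [lo, hi].
pub fn clamp_lanes<T: LaneOps>(x: T, lo: T, hi: T) -> T {
    x.max(lo).min(hi)
}
```

Here `abs` wraps so that `i16::MIN` does not panic in debug builds; whether a real trait should wrap or saturate is an open design choice.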