loads and stores with vectors can read/write more than the vector size
This function, for example:
#define SIMDPP_ARCH_X86_AVX2
#include <simdpp/simd.h>

// with float64<1> this should touch exactly one double per pointer
void sum(double* out, double const* lhs, double const* rhs) {
    using vec_t = simdpp::float64<1>;
    auto l = simdpp::load_u<vec_t>(lhs);
    auto r = simdpp::load_u<vec_t>(rhs);
    simdpp::store_u(out, l + r);
}
will load and store 4 doubles instead of a single one, which may result in an unexpected buffer overflow. Is this the intended behavior?
Indeed, support for vectors smaller than the smallest native size is currently not fully implemented.
It's not just sizes smaller than the smallest native size. For example, the above code generates the correct instructions with float64<2>, but not with float64<3> (it loads/stores 4 doubles) or even float64<6> (it loads/stores 8 doubles).
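For concreteness, here is the float64<3> variant (sum3 is just an illustrative name); the only change from the example above is the vector width, yet per the behaviour described it reads and writes one double past the end of each 3-element buffer:

#define SIMDPP_ARCH_X86_AVX2
#include <simdpp/simd.h>

void sum3(double* out, double const* lhs, double const* rhs) {
    using vec_t = simdpp::float64<3>;    // 3 elements requested
    auto l = simdpp::load_u<vec_t>(lhs); // reportedly loads 4 doubles
    auto r = simdpp::load_u<vec_t>(rhs);
    simdpp::store_u(out, l + r);         // reportedly stores 4 doubles
}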
I think vectors whose size N is a power of 2 and smaller than the smallest native size could be implemented as an unaligned std::array<double, N>, for example.
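Rough sketch of what I mean (small_float64, load_small, and store_small are just illustrative names, not anything in simdpp):

#include <array>
#include <cstring>

// N is a power of 2 smaller than the smallest native vector size
template <unsigned N>
struct small_float64 {
    std::array<double, N> d;   // plain storage, no alignment requirement
};

// element-wise add over the N stored doubles
template <unsigned N>
small_float64<N> operator+(small_float64<N> a, small_float64<N> const& b) {
    for (unsigned i = 0; i < N; ++i)
        a.d[i] += b.d[i];
    return a;
}

// the load_u/store_u equivalents copy exactly N doubles, so they never
// touch memory past the end of the buffer
template <unsigned N>
small_float64<N> load_small(double const* p) {
    small_float64<N> v;
    std::memcpy(v.d.data(), p, N * sizeof(double));
    return v;
}

template <unsigned N>
void store_small(double* p, small_float64<N> const& v) {
    std::memcpy(p, v.d.data(), N * sizeof(double));
}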
As for vectors whose size N is larger than the largest native size, we could implement them like this (assuming the largest native size is 4, for example):
template <unsigned N>
struct float64 {
    std::array<float64<4>, N / 4> first;         // full native-width chunks
    std::array<float64<2>, (N % 4) / 2> second;  // 2-wide remainder, if any
    std::array<float64<1>, N % 2> third;         // single trailing element, if any
};
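An element-wise operation would then just be applied piecewise over the three parts, roughly like this (sketch only, assuming float64<4>, float64<2> and float64<1> already have working implementations):

template <unsigned N>
float64<N> operator+(float64<N> a, float64<N> const& b) {
    for (unsigned i = 0; i < a.first.size(); ++i)
        a.first[i] = a.first[i] + b.first[i];     // full native-width chunks
    for (unsigned i = 0; i < a.second.size(); ++i)
        a.second[i] = a.second[i] + b.second[i];  // 2-wide remainder, if any
    for (unsigned i = 0; i < a.third.size(); ++i)
        a.third[i] = a.third[i] + b.third[i];     // single trailing element, if any
    return a;
}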
Would this be an acceptable way of handling it?
EDIT: on second thought, it would require slightly more work to handle the cases where the array sizes are 0, but it should still be feasible.
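One possible way to deal with the zero-size cases (sketch only, chunks is a made-up helper name): specialize the chunk holder for a zero count so it stores nothing:

#include <array>
#include <cstddef>

// holds Count sub-vectors of type Vec
template <class Vec, std::size_t Count>
struct chunks {
    std::array<Vec, Count> parts;
};

// zero-count specialization: nothing stored, so loops over it do nothing
// and (with C++20 [[no_unique_address]]) it doesn't take up any space
template <class Vec>
struct chunks<Vec, 0> {};

// the composite struct above would then become something like
//   [[no_unique_address]] chunks<float64<4>, N / 4> first;
//   [[no_unique_address]] chunks<float64<2>, (N % 4) / 2> second;
//   [[no_unique_address]] chunks<float64<1>, N % 2> third;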