loads and stores with vectors can read/write more than the vector size
This function, for example:
#define SIMDPP_ARCH_X86_AVX2
#include <simdpp/simd.h>

// with float64<1> this should touch exactly one double per pointer
void sum(double* out, double const* lhs, double const* rhs) {
    using vec_t = simdpp::float64<1>;
    auto l = simdpp::load_u<vec_t>(lhs);
    auto r = simdpp::load_u<vec_t>(rhs);
    simdpp::store_u(out, l + r);
}
will load and store 4 doubles instead of a single one, which may result in an unexpected buffer overflow. Is this the intended behavior?
Indeed, support for vectors smaller than the smallest native size is currently not fully implemented.
It's not just sizes smaller than the smallest native size. For example, the above code generates the correct instructions with float64<2>, but not with float64<3> (it loads/stores 4 doubles) or even float64<6> (it loads/stores 8 doubles).
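For concreteness, here is the float64<3> variant (sum3 is just an illustrative name); the only change from the example above is the vector width, yet per the behaviour described it reads and writes one double past the end of each 3-element buffer:

#define SIMDPP_ARCH_X86_AVX2
#include <simdpp/simd.h>

void sum3(double* out, double const* lhs, double const* rhs) {
    using vec_t = simdpp::float64<3>;    // 3 elements requested
    auto l = simdpp::load_u<vec_t>(lhs); // reportedly loads 4 doubles
    auto r = simdpp::load_u<vec_t>(rhs);
    simdpp::store_u(out, l + r);         // reportedly stores 4 doubles
}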
I think vectors whose size N is a power of 2 and smaller than the smallest native size could be implemented as an unaligned std::array<double, N>, for example.
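Rough sketch of what I mean (small_float64, load_small, and store_small are just illustrative names, not anything in simdpp):

#include <array>
#include <cstring>

// N is a power of 2 smaller than the smallest native vector size
template <unsigned N>
struct small_float64 {
    std::array<double, N> d;   // plain storage, no alignment requirement
};

// element-wise add over the N stored doubles
template <unsigned N>
small_float64<N> operator+(small_float64<N> a, small_float64<N> const& b) {
    for (unsigned i = 0; i < N; ++i)
        a.d[i] += b.d[i];
    return a;
}

// the load_u/store_u equivalents copy exactly N doubles, so they never
// touch memory past the end of the buffer
template <unsigned N>
small_float64<N> load_small(double const* p) {
    small_float64<N> v;
    std::memcpy(v.d.data(), p, N * sizeof(double));
    return v;
}

template <unsigned N>
void store_small(double* p, small_float64<N> const& v) {
    std::memcpy(p, v.d.data(), N * sizeof(double));
}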
As for vectors whose size N is larger than the largest native size, we could implement them like this (assuming the largest native size is 4, for example):
template <unsigned N>
struct float64 {
    std::array<float64<4>, N / 4> first;         // full native-width chunks
    std::array<float64<2>, (N % 4) / 2> second;  // 2-wide remainder, if any
    std::array<float64<1>, N % 2> third;         // single trailing element, if any
};
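An element-wise operation would then just be applied piecewise over the three parts, roughly like this (sketch only, assuming float64<4>, float64<2> and float64<1> already have working implementations):

template <unsigned N>
float64<N> operator+(float64<N> a, float64<N> const& b) {
    for (unsigned i = 0; i < a.first.size(); ++i)
        a.first[i] = a.first[i] + b.first[i];     // full native-width chunks
    for (unsigned i = 0; i < a.second.size(); ++i)
        a.second[i] = a.second[i] + b.second[i];  // 2-wide remainder, if any
    for (unsigned i = 0; i < a.third.size(); ++i)
        a.third[i] = a.third[i] + b.third[i];     // single trailing element, if any
    return a;
}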
Would this be an acceptable way of handling it?
EDIT: on second thought, it would require slightly more work to handle the cases where the array sizes are 0, but it should still be feasible.
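One possible way to deal with the zero-size cases (sketch only, chunks is a made-up helper name): specialize the chunk holder for a zero count so it stores nothing:

#include <array>
#include <cstddef>

// holds Count sub-vectors of type Vec
template <class Vec, std::size_t Count>
struct chunks {
    std::array<Vec, Count> parts;
};

// zero-count specialization: nothing stored, so loops over it do nothing
// and (with C++20 [[no_unique_address]]) it doesn't take up any space
template <class Vec>
struct chunks<Vec, 0> {};

// the composite struct above would then become something like
//   [[no_unique_address]] chunks<float64<4>, N / 4> first;
//   [[no_unique_address]] chunks<float64<2>, (N % 4) / 2> second;
//   [[no_unique_address]] chunks<float64<1>, N % 2> third;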