VectorRegister load_packed Only Loading First Value

Open rchen20 opened this issue 2 years ago • 1 comment

The VectorRegister implementations of load_packed and load_packed_n load only the first value of the array. Surprisingly, the = operator works as expected and performs the load and store properly. load_packed is required for sum/dot/min/max operations on a VectorRegister.

Does not work:

  using vec_b = RAJA::expt::VectorRegister<double, RAJA::expt::scalar_register>;
  vec_b vec_b_test;
  vec_b_test.load_packed_n(&b[0], N);
  vec_b_test.store_packed_n(&c[0], N);

Bad output:

b: 5 5 5 5
c: 5 0 0 0

Works:

  using vec_b = RAJA::expt::VectorRegister<double, RAJA::expt::scalar_register>;
  using idx_b = RAJA::expt::VectorIndex<int, vec_b>;
  using VecB = RAJA::View<RAJA::Real_type,RAJA::StaticLayout<RAJA::PERM_I,N>>;
  auto bV = VecB(b);
  auto cV = VecB(v_c);
  auto vec_all = idx_b::static_all();
  cV(vec_all) = bV(vec_all);

Good output:

b: 5 5 5 5
c: 5 5 5 5

Standard Layout with the assignment operator also works.

rchen20, Mar 17 '23 21:03

This behavior is exactly what RAJA::expt::scalar_register is expected to do: because it is a scalar register, it loads only the first value of the array into the VectorRegister. At the time, we were using scalar_register on a BlueOS system, when we should have tried a cuda_warp_register. The more typical usage is on a TOSS system with one of the AVX register types, which correctly load all elements of the array into the vector register.
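To illustrate why a scalar register produces the "5 0 0 0" output above, here is a minimal toy model in plain C++. It is not RAJA's implementation: `ToyRegister`, its `Lanes` parameter, and `copy_through_register` are hypothetical names made up for this sketch, and it assumes a packed load simply fills as many lanes as the register has.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <vector>

// Toy stand-in for a packed register with a fixed lane count.
// Hypothetical type for illustration only; not part of RAJA.
template <typename T, std::size_t Lanes>
struct ToyRegister {
  std::array<T, Lanes> lanes{};

  // Load at most Lanes contiguous values; any further input is ignored.
  void load_packed_n(const T* src, std::size_t n) {
    const std::size_t count = std::min(n, Lanes);
    for (std::size_t i = 0; i < count; ++i) lanes[i] = src[i];
  }

  // Store at most Lanes values; the rest of dst is left untouched.
  void store_packed_n(T* dst, std::size_t n) const {
    const std::size_t count = std::min(n, Lanes);
    for (std::size_t i = 0; i < count; ++i) dst[i] = lanes[i];
  }
};

// Round-trip n values through one register and return the destination,
// which starts zero-initialized like the array c in the report.
template <std::size_t Lanes>
std::vector<double> copy_through_register(const double* src, std::size_t n) {
  std::vector<double> dst(n, 0.0);
  ToyRegister<double, Lanes> reg;
  reg.load_packed_n(src, n);
  reg.store_packed_n(dst.data(), n);
  return dst;
}
```

In this model, `copy_through_register<1>(b, 4)` with `b = {5, 5, 5, 5}` copies only the first element, matching the scalar case reported above, while `copy_through_register<4>(b, 4)` copies all four, as a four-lane register would.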

We should document the functionality of scalar_register more clearly, and close this issue.

rchen20, Apr 18 '23 22:04