RAJA
avx512_int32.hpp uses _mm512_mask_i64scatter_epi32 incorrectly
You are passing a 512-bit register when a 256-bit register is required.
void _mm512_mask_i64scatter_epi32 (void* base_addr,
                                   __mmask8 k,
                                   __m512i vindex,
                                   __m256i a, // <- NOTICE THIS ONE
                                   int scale)
(from docs)
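The 256-bit requirement follows from lane counts: a 512-bit index vector of 64-bit indices has 8 lanes, so the intrinsic can scatter only 8 32-bit values, which is 256 bits of data. The sketch below spells out that arithmetic. The helper names `lanes` and `data_bits` are hypothetical and exist only for this illustration; they are not part of RAJA or of the intrinsics headers.

```cpp
#include <cstddef>

// Hypothetical helpers, for illustration only.
// Number of lanes in a vector of `vector_bits` bits holding `elem_bits`-bit elements.
constexpr std::size_t lanes(std::size_t vector_bits, std::size_t elem_bits) {
  return vector_bits / elem_bits;
}

// Width in bits of the data operand a scatter needs:
// one data element per index lane.
constexpr std::size_t data_bits(std::size_t index_vector_bits,
                                std::size_t index_bits,
                                std::size_t elem_bits) {
  return lanes(index_vector_bits, index_bits) * elem_bits;
}

// 64-bit indices, 32-bit data: 8 lanes, so a 256-bit data operand (__m256i).
static_assert(data_bits(512, 64, 32) == 256,
              "i64scatter_epi32 takes 256 bits of data");

// 32-bit indices, 32-bit data: 16 lanes, so a 512-bit data operand (__m512i).
static_assert(data_bits(512, 32, 32) == 512,
              "i32scatter_epi32 takes 512 bits of data");
```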
It is used incorrectly in policy/tensor/arch/avx512/avx512_int32.hpp here:
/*!
 * @brief Store partial register to consecutive memory locations
 *
 */
RAJA_INLINE
self_type const &store_strided_n(element_type *ptr, camp::idx_t stride, camp::idx_t N) const{
  // AVX512F
  _mm512_mask_i64scatter_epi32(ptr,
                               createMask(N),
                               createStridedOffsets(stride),
                               m_value, // <- NOTICE THIS ONE
                               sizeof(element_type));
  return *this;
}
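For reference, here is a scalar model of what a masked strided store of a 16-lane int32 register would be expected to do. It is a sketch, not RAJA code. It assumes, from the helper names alone, that `createMask(N)` enables the first N lanes and that `createStridedOffsets(stride)` yields offsets `i * stride`; the thread does not show either helper. The function name `store_strided_n_reference` is hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Scalar reference model (not RAJA code) of a masked strided store.
// Assumption: lanes 0..n-1 are enabled and lane i is written to ptr[i * stride].
inline void store_strided_n_reference(const std::array<std::int32_t, 16>& reg,
                                      std::int32_t* ptr,
                                      std::ptrdiff_t stride,
                                      std::size_t n) {
  assert(n <= reg.size());  // a 512-bit register holds at most 16 int32 lanes
  for (std::size_t i = 0; i < n; ++i) {
    ptr[static_cast<std::ptrdiff_t>(i) * stride] = reg[i];
  }
}
```

Under those assumptions, a call with N greater than 8 needs more lanes than an 8-lane scatter with 64-bit indices can address in a single instruction.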
You declare the argument here:
public:
  using register_type = __m512i;
private:
  register_type m_value;
This bug was found by NVC++, which refuses to compile this code.
Also, please expand tabs in this file. Not all of us use tab = 2 spaces, and this file is unreadable with the defaults of some editors.
@jeffhammond Thanks for the bug report! We'll take care of both of those issues.
Hi. If I understand correctly, the fix is to do what the gathers already do (they compile correctly): replace i64 with i32 in the intrinsic name, so that the index elements are 32 bits wide and the data argument is a full 512-bit register. Compilation then succeeds. There are 4 occurrences: store_strided and store_strided_n, in both avx512_float.hpp and avx512_int32.hpp. A git patch is enclosed, feel free to look at it: raja_AVX512_scatter_error_git_patch.txt
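A minimal sketch of the i32 variant suggested above, written as a free function rather than as the RAJA member function. The name `scatter_strided_n` and its signature are hypothetical, and the mask and offset construction are assumptions standing in for `createMask` and `createStridedOffsets`, which the thread does not show. The intrinsic path is compiled only when the compiler defines `__AVX512F__`; otherwise a scalar loop with the same intended result is used.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#if defined(__AVX512F__)
#include <immintrin.h>
#endif

// Hypothetical free function, not the RAJA member function.
// Stores values[0..n) to ptr[0], ptr[stride], ptr[2 * stride], ... (n <= 16).
inline void scatter_strided_n(const std::int32_t (&values)[16],
                              std::int32_t* ptr,
                              std::int32_t stride,
                              std::size_t n) {
  assert(n <= 16);
#if defined(__AVX512F__)
  // 16 x 32-bit data lanes in one 512-bit register.
  const __m512i data = _mm512_loadu_si512(values);
  // 16 x 32-bit element offsets: 0, stride, 2 * stride, ...
  const __m512i lane = _mm512_setr_epi32(0, 1, 2, 3, 4, 5, 6, 7,
                                         8, 9, 10, 11, 12, 13, 14, 15);
  const __m512i offsets = _mm512_mullo_epi32(lane, _mm512_set1_epi32(stride));
  // Enable the low n lanes; the 16-lane variant takes a 16-bit mask.
  const __mmask16 mask = static_cast<__mmask16>((1u << n) - 1u);
  // Offsets are scaled by the element size in bytes.
  _mm512_mask_i32scatter_epi32(ptr, mask, offsets, data, sizeof(std::int32_t));
#else
  // Scalar fallback with the same intended result.
  for (std::size_t i = 0; i < n; ++i) {
    ptr[static_cast<std::ptrdiff_t>(i) * stride] = values[i];
  }
#endif
}
```

One trade-off of this variant: the offsets are 32-bit, so `15 * stride` has to fit in a signed 32-bit integer. Keeping 64-bit offsets with the i64 intrinsic would instead mean scattering the register as two 256-bit halves.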
Closing this issue because it is fixed here https://github.com/LLNL/RAJA/pull/1339.