avx512_int32.hpp uses _mm512_mask_i64scatter_epi32 incorrectly

Open jeffhammond opened this issue 2 years ago • 3 comments

You are passing a 512-bit register when a 256-bit register is required.

void _mm512_mask_i64scatter_epi32 (void* base_addr, 
                                   __mmask8 k,
                                   __m512i vindex,
                                   __m256i a,        // <- NOTICE THIS ONE
                                   int scale)

(from docs)

It is used incorrectly in policy/tensor/arch/avx512/avx512_int32.hpp here:

      /*!
       * @brief Store partial register to consecutive memory locations
       *
       */
      RAJA_INLINE
      self_type const &store_strided_n(element_type *ptr, camp::idx_t stride, camp::idx_t N) const{
        // AVX512F
        _mm512_mask_i64scatter_epi32(ptr,
                                     createMask(N),
                                     createStridedOffsets(stride),
                                     m_value,        // <- NOTICE THIS ONE
                                     sizeof(element_type));
        return *this;
      }

You declare the argument here:

    public:
      using register_type = __m512i;
    private:
      register_type m_value;

This bug was found by NVC++, which refuses to compile this code.
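The width mismatch follows from simple lane arithmetic: a scatter stores one data element per index, so the number of indices that fit in the 512-bit `vindex` register fixes the width of the data operand. The sketch below only models that arithmetic; the helper names `lanes` and `scatter_data_bits` are hypothetical and are not part of RAJA or the intrinsics headers.

```cpp
#include <cstddef>

// Hypothetical helper: how many elements of `element_bits` fit in a register
// of `register_bits`.
constexpr std::size_t lanes(std::size_t register_bits, std::size_t element_bits) {
    return register_bits / element_bits;
}

// Hypothetical helper: width in bits of the data operand a scatter needs,
// given one data element per index lane.
constexpr std::size_t scatter_data_bits(std::size_t index_register_bits,
                                        std::size_t index_bits,
                                        std::size_t element_bits) {
    return lanes(index_register_bits, index_bits) * element_bits;
}

// i64 indices in a 512-bit vindex: 8 lanes, so 8 x 32-bit values = 256 bits (__m256i).
static_assert(scatter_data_bits(512, 64, 32) == 256,
              "i64scatter_epi32 takes a 256-bit data operand");

// i32 indices in a 512-bit vindex: 16 lanes, so 16 x 32-bit values = 512 bits (__m512i).
static_assert(scatter_data_bits(512, 32, 32) == 512,
              "i32scatter_epi32 takes a 512-bit data operand");
```

Under this model, a full `__m512i` of 32-bit elements (16 lanes) cannot be paired with 64-bit indices in a single 512-bit index register.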

Jul 12 '22 06:07 jeffhammond

Also, please expand tabs in this file. Not all of us use tab = 2 spaces, and this file is unreadable with the defaults of some editors.

Jul 12 '22 06:07 jeffhammond

@jeffhammond Thanks for the bug report! We'll take care of both of those issues.

Jul 12 '22 15:07 ajkunen

Hi, if I understand correctly, the fix is to do what the gathers already do (they compile correctly): replace i64 with i32 (the size of the index elements), and compilation will then succeed. There are 4 occurrences: store_strided and store_strided_n in both avx512_float.hpp and avx512_int32.hpp. A git patch is attached; feel free to look at it: raja_AVX512_scatter_error_git_patch.txt
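As a rough illustration of the i32 variant, here is a minimal standalone sketch of a masked strided store of 16 int32 lanes. It is not the RAJA code or the attached patch: the free function `store_strided_n`, its parameters, and the scalar fallback are assumptions made for the example. Note that `_mm512_mask_i32scatter_epi32` takes a `__mmask16` and 32-bit indices, so whether RAJA's `createMask` and `createStridedOffsets` already produce those types is not shown in this thread.

```cpp
#include <cassert>
#include <cstdint>
#if defined(__AVX512F__)
#include <immintrin.h>
#endif

// Hypothetical helper (not RAJA's API): store the first n of 16 int32 values
// to ptr[0], ptr[stride], ptr[2 * stride], ...
inline void store_strided_n(std::int32_t* ptr,
                            const std::int32_t (&values)[16],
                            int stride, int n) {
    assert(n >= 0 && n <= 16);
#if defined(__AVX512F__)
    // 16 x 32-bit indices: lane i holds i * stride (in elements).
    const __m512i lane = _mm512_setr_epi32(0, 1, 2, 3, 4, 5, 6, 7,
                                           8, 9, 10, 11, 12, 13, 14, 15);
    const __m512i vindex = _mm512_mullo_epi32(lane, _mm512_set1_epi32(stride));
    // 16 x 32-bit data elements: a full 512-bit register, matching the i32 variant.
    const __m512i data = _mm512_loadu_si512(values);
    // One mask bit per lane; the low n bits enable the first n stores.
    const __mmask16 mask = static_cast<__mmask16>((1u << n) - 1u);
    // scale = 4 bytes, i.e. sizeof(int32_t); it must be a compile-time constant.
    _mm512_mask_i32scatter_epi32(ptr, mask, vindex, data, 4);
#else
    // Scalar fallback with the same semantics when AVX512F is not enabled.
    for (int i = 0; i < n; ++i) {
        ptr[i * stride] = values[i];
    }
#endif
}
```

The intrinsic path is compiled only when the compiler defines `__AVX512F__` (for example with `-mavx512f` on GCC or Clang); otherwise the scalar loop is used, so the sketch builds on any target. Indices are 32-bit here, so `15 * stride` has to fit in a signed 32-bit integer.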

Aug 10 '22 11:08 eoseret

Closing this issue because it is fixed here https://github.com/LLNL/RAJA/pull/1339.

May 16 '23 22:05 rchen20
