hipamd
Correctly set the index value for __shfl_up.
Please see https://www.khronos.org/registry/OpenCL/extensions/intel/cl_intel_subgroups.html for the details of the shuffles.
This was uncovered while writing libclc's Intel subgroup shuffles (https://github.com/intel/llvm/pull/4664/files), which use the same bpermute built-in
and were failing tests from llvm-test-suite (among others: https://github.com/intel/llvm-test-suite/blob/intel/SYCL/SubGroup/shuffle.hpp#L88).
@jchlanda, (self & ~(width-1)) is the lowest lane in the group of width lanes that includes self. If index, the source lane, is below that value, then the shuffle up operation for lane self is a no-op. I do not believe the proposed patch is correct.
Suppose width = 4, self = 2, and lane_delta = 5. Then (self & ~(width-1)) = 0. The first assignment of index gives 2 - 5 = -3, which is below 0. The proposed patch incorrectly keeps index at -3, whereas the current code replaces index with self, i.e. 2, which is correct.
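To make the arithmetic above easy to check, here is a minimal host-side sketch of the index selection as described in this comment. It is not the hipamd source: the helper names `group_base` and `shfl_up_source_lane` are hypothetical, `lane_delta` is taken as a signed int for simplicity, and `width` is assumed to be a power of two.

```cpp
#include <cassert>

// Hypothetical helper: lowest lane of the width-sized group containing `self`.
// Assumes `width` is a power of two.
constexpr int group_base(int self, int width) {
    return self & ~(width - 1);
}

// Hypothetical helper: source lane for a shuffle-up as described above.
// If the candidate lane (self - lane_delta) falls below the group's lowest
// lane, the lane reads from itself, so the shuffle is a no-op for that lane.
constexpr int shfl_up_source_lane(int self, int lane_delta, int width) {
    return (self - lane_delta < group_base(self, width)) ? self
                                                         : self - lane_delta;
}

// The example from the comment: width = 4, self = 2, lane_delta = 5.
static_assert(group_base(2, 4) == 0, "lowest lane of the group is 0");
static_assert(shfl_up_source_lane(2, 5, 4) == 2, "out of range: falls back to self");

// An in-range case in the second group of four lanes: lane 6 reads from lane 5.
static_assert(shfl_up_source_lane(6, 1, 4) == 5, "in range: reads from self - delta");
```

In the example from the comment, the candidate index -3 is below the group base 0, so the sketch returns `self` rather than a negative lane.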