alpaka
Alignment of CUDA/HIP dynamic shared memory?
`GetDynSharedMem<T, BlockSharedMemDynUniformCudaHipBuiltIn>::getMem(acc)` is defined as:
https://github.com/alpaka-group/alpaka/blob/9b15e664d103c581020aa5285171b67483eb5c59/include/alpaka/block/shared/dyn/BlockSharedMemDynUniformCudaHipBuiltIn.hpp#L38-L46
- If the concern is that the memory may not be aligned enough for `T`, why not declare it as `extern __shared__ T shMem[];`?
- If the concern is that the memory may not be aligned enough for any type, why not declare it as `extern __shared__ __attribute__((aligned(16))) char shMem[];`?
- Do you think it would help to declare this alignment explicitly, with `__device__ static auto getMem(BlockSharedMemDynUniformCudaHipBuiltIn const&) __attribute__((assume_aligned(16))) -> T*;`?
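As a rough host-side illustration of the last two suggestions (the `extern __shared__` storage itself only exists in device code), here is a minimal sketch. It uses standard `alignas(16)` in place of `__attribute__((aligned(16)))` on the byte buffer, and the GCC/Clang builtin `__builtin_assume_aligned` in place of the `assume_aligned(16)` function attribute. `g_sharedBuffer`, `getMemAligned` and `isAligned` are hypothetical names made up for this sketch and are not part of alpaka.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical host-side stand-in for the dynamic shared memory block.
// alignas(16) plays the role of __attribute__((aligned(16))) on an
// `extern __shared__ char shMem[]` declaration: the start of the buffer
// is guaranteed to be 16-byte aligned.
alignas(16) unsigned char g_sharedBuffer[256];

// Returns true if the address of p is a multiple of `alignment` bytes.
inline bool isAligned(void const* p, std::size_t alignment)
{
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

// Hypothetical analogue of getMem(): hands out the raw byte buffer as T*.
template<typename T>
T* getMemAligned()
{
    // A 16-byte aligned buffer is only suitable for types needing at most 16 bytes.
    static_assert(alignof(T) <= 16, "buffer only guarantees 16-byte alignment");
    // Check the assumption in debug builds before promising it to the optimizer.
    assert(isAligned(g_sharedBuffer, 16));
    // __builtin_assume_aligned (GCC/Clang) lets the compiler treat the result
    // as 16-byte aligned, similar in spirit to the assume_aligned(16) attribute.
    return static_cast<T*>(__builtin_assume_aligned(g_sharedBuffer, 16));
}
```

Calls such as `getMemAligned<double>()` or `getMemAligned<std::int32_t>()` return the start of the same buffer, which passes `isAligned(p, 16)`. A type with `alignof(T) > 16` is rejected at compile time by the `static_assert`, which reflects the limitation of any single fixed alignment chosen for the shared buffer.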