libflatarray

add short_vec implementation for CUDA

Open gentryx opened this issue 9 years ago • 0 comments

...to utilize float, float2, float4 (arity: WARP_SIZE * 4), double (arity: WARP_SIZE), double2 (arity: WARP_SIZE * 2), and the corresponding load/store operations. Needs benchmarks, obviously.
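One possible reading of these arities: a CUDA short_vec spans a whole warp, and each of the WARP_SIZE lanes holds one float, float2, or float4 (or double/double2), so the vector's arity is WARP_SIZE times the per-lane width. The sketch below assumes a warp size of 32 and uses a hypothetical `warp_float4_vec` type plus hypothetical `add_kernel`/`launch_add` helpers; none of these are libflatarray's actual API. Loads and stores are coalesced: lane `i` reads the 16-byte float4 at element offset `4 * i`, so one warp touches 512 contiguous bytes per access.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Assumption: 32 threads per warp (true on current NVIDIA GPUs).
constexpr int WARP_SIZE = 32;

// Hypothetical warp-wide short_vec: each lane of a warp holds one float4,
// so the vector as a whole covers WARP_SIZE * 4 floats.
struct warp_float4_vec {
    static constexpr int ARITY = WARP_SIZE * 4;

    float4 val;

    // Lane `lane` reads elements [4 * lane, 4 * lane + 3] with one 16-byte load.
    // `data` must be 16-byte aligned.
    __host__ __device__ void load(const float* data, int lane) {
        val = reinterpret_cast<const float4*>(data)[lane];
    }

    // Mirror of load(): lane `lane` writes its four elements back.
    __host__ __device__ void store(float* data, int lane) const {
        reinterpret_cast<float4*>(data)[lane] = val;
    }

    // Element-wise addition; no communication between lanes is needed.
    __host__ __device__ warp_float4_vec operator+(const warp_float4_vec& other) const {
        warp_float4_vec ret;
        ret.val = make_float4(val.x + other.val.x, val.y + other.val.y,
                              val.z + other.val.z, val.w + other.val.w);
        return ret;
    }
};

// c = a + b; each warp handles one ARITY-sized chunk.
// Assumes blockDim.x is a multiple of WARP_SIZE.
__global__ void add_kernel(const float* a, const float* b, float* c, int n) {
    int warp_id = (blockIdx.x * blockDim.x + threadIdx.x) / WARP_SIZE;
    int lane = threadIdx.x % WARP_SIZE;
    int offset = warp_id * warp_float4_vec::ARITY;
    if (offset >= n) {
        return;  // surplus warps in the last block do nothing
    }

    warp_float4_vec va, vb;
    va.load(a + offset, lane);
    vb.load(b + offset, lane);
    (va + vb).store(c + offset, lane);
}

// Hypothetical host launcher; n must be a multiple of ARITY.
void launch_add(const float* d_a, const float* d_b, float* d_c, int n) {
    assert(n % warp_float4_vec::ARITY == 0);
    if (n == 0) {
        return;  // a zero-sized grid would be an invalid launch configuration
    }
    const int threads = (n / warp_float4_vec::ARITY) * WARP_SIZE;
    const int block = 128;  // 4 warps per block
    const int grid = (threads + block - 1) / block;
    add_kernel<<<grid, block>>>(d_a, d_b, d_c, n);
}
```

Because `load`/`store` take the lane index explicitly and are `__host__ __device__`, the element-to-lane mapping can also be exercised on the host by looping over lanes 0..31, which may be handy for unit tests before benchmarking on a GPU. The float, float2, double, and double2 variants would follow the same pattern with a different per-lane type.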

Bonus points for using warp shuffle operations.
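One place warp shuffles could pay off is a horizontal operation such as summing all elements of a warp-wide vector: lanes exchange register values directly instead of going through shared memory. The sketch below is a hypothetical `warp_sum` for a vector in which each lane holds one float4 (arity WARP_SIZE * 4), using a butterfly reduction with `__shfl_xor_sync` (available since CUDA 9). It assumes a warp size of 32 and that all 32 lanes are active; `warp_sum` and `chunk_sums` are illustrative names, not part of libflatarray.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Assumption: 32 threads per warp and all lanes active when warp_sum() is called.
constexpr int WARP_SIZE = 32;
constexpr unsigned FULL_MASK = 0xffffffffu;

// Hypothetical horizontal sum over a warp-wide vector in which every lane
// holds one float4 (WARP_SIZE * 4 elements in total).
__device__ float warp_sum(float4 v) {
    // Step 1: reduce the lane's own four components in registers.
    float s = (v.x + v.y) + (v.z + v.w);
    // Step 2: butterfly reduction across lanes. After log2(WARP_SIZE) = 5
    // rounds every lane holds the full sum, without touching shared memory.
    for (int offset = WARP_SIZE / 2; offset > 0; offset /= 2) {
        s += __shfl_xor_sync(FULL_MASK, s, offset);
    }
    return s;
}

// Each warp sums one chunk of WARP_SIZE * 4 floats; lane 0 writes the result.
// data must be 16-byte aligned (e.g. memory from cudaMalloc/cudaMallocManaged).
__global__ void chunk_sums(const float* data, float* sums, int num_chunks) {
    // Whole warps must exit together, otherwise FULL_MASK would be wrong.
    assert(blockDim.x % WARP_SIZE == 0);
    int warp_id = (blockIdx.x * blockDim.x + threadIdx.x) / WARP_SIZE;
    int lane = threadIdx.x % WARP_SIZE;
    if (warp_id >= num_chunks) {
        return;
    }
    const float4* chunk =
        reinterpret_cast<const float4*>(data + warp_id * WARP_SIZE * 4);
    float s = warp_sum(chunk[lane]);
    if (lane == 0) {
        sums[warp_id] = s;
    }
}
```

The XOR butterfly leaves the result in every lane, which suits a `sum()` that returns a scalar to all threads; a `__shfl_down_sync` tree would also work but leaves the full sum only in lane 0. Whether either beats a shared-memory reduction here is exactly the kind of question the benchmarks mentioned above would need to answer.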

Moved from https://bitbucket.org/gentryx/libflatarray/issues/12/add-short_vec-implementation-for-cuda

gentryx, Jan 01 '16 15:01