Possible improvement for packed_store_active/packed_store_active2 on AVX2

Open dbabokin opened this issue 3 years ago • 0 comments

This StackOverflow article describes an algorithm that could be used for packed_store_active/packed_store_active2 on AVX2. The algorithm leaves the packed result in a vector register, so the result still has to be written out with a masked store, and the number of stored elements has to be counted separately. Even with that extra work, it may still be a better algorithm than the current one.
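To make the idea concrete, here is a minimal scalar sketch of the left-packing step, assuming the linked article is the commonly cited BMI2-based approach (this is an assumption; the issue does not spell the algorithm out). On real AVX2 hardware the index computation would presumably use `_pdep_u64`/`_pext_u64` and the shuffle would be a `_mm256_permutevar8x32_epi32`. The helpers `pext64_soft` and `pack_left_sketch` are hypothetical names used only for illustration and are not part of ispc.

```c
#include <stdint.h>

#define LANES 8

/* Software stand-in for the BMI2 pext instruction: collect the bits of
 * src selected by mask into the low bits of the result. */
static uint64_t pext64_soft(uint64_t src, uint64_t mask) {
    uint64_t out = 0;
    int k = 0;
    for (int i = 0; i < 64; i++) {
        if ((mask >> i) & 1) {
            out |= ((src >> i) & 1) << k;
            k++;
        }
    }
    return out;
}

/* Hypothetical helper: move the active lanes of src (one mask bit per lane)
 * to the front of packed, preserving order. Returns the number of active
 * lanes. Lanes at or beyond the returned count hold a copy of src[0] and
 * must not be stored. */
static int pack_left_sketch(const int32_t src[LANES], uint8_t mask,
                            int32_t packed[LANES]) {
    /* Expand every mask bit into a full byte (0x00 or 0xFF). */
    uint64_t byte_mask = 0;
    for (int i = 0; i < LANES; i++) {
        if ((mask >> i) & 1) {
            byte_mask |= 0xFFull << (8 * i);
        }
    }

    /* Compress the identity lane indices 0..7 (one byte each) so that the
     * indices of the active lanes end up in the low bytes. */
    uint64_t idx = pext64_soft(0x0706050403020100ull, byte_mask);

    /* Apply the permutation; this models the vector shuffle. */
    for (int i = 0; i < LANES; i++) {
        int lane = (int)((idx >> (8 * i)) & 0xFF);
        packed[i] = src[lane];
    }

    /* Count the stored elements separately (population count of the mask). */
    int count = 0;
    for (uint8_t m = mask; m != 0; m &= (uint8_t)(m - 1)) {
        count++;
    }
    return count;
}
```

With `src = {10, 11, 12, 13, 14, 15, 16, 17}` and `mask = 0xA5` (lanes 0, 2, 5 and 7 active), the first four lanes of `packed` receive 10, 12, 15, 17 and the function returns 4. Only those first four lanes would then be written to memory, which is where the masked store comes in.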

This needs experiments to understand the actual performance impact.

For AVX512 we use the vpcompressd instruction, so no improvement is needed there.

dbabokin, Apr 06 '22 01:04