simde 2intersect: implement AVX512F optimizations proposed by Guille Díez-Cañas arXiv:2112.06342 [cs.DS]

Open mr-c opened this issue 1 year ago • 3 comments

https://doi.org/10.48550/arXiv.2112.06342

Be careful to only use the paper as your reference; I'm told that the compressed source code at the end is not OSS.

[ ] (A) Add strict AVX512F optimized versions to the existing simde_mm{,256,512}_2intersect_epi{32,64} functions, as shown in Listing 10, page 4. Please first confirm that this is still faster than the fallback code on recent GCC/clang on an AVX512F system.
[ ] (B) new simde_x_mm{,256,512}_2intersect_epi{32,64}_mask functions with AVX512F and plain C implementations for returning only the first mask k1 (Listing 7, page 3)
[ ] (C) new simde_x_mm{,256,512}_2intersect_epi{32,64}_mask2 functions: versions of (B) for when the second set of integer vectors is in memory (not yet loaded into a __m512i register) (Listing 9, page 4) (some other name might be better)
[ ] (D) new simde_x_mm{,256,512}_*_epi16 versions of all of the above for 16-bit vectors with AVX512F and plain C implementations.
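
For anyone picking this up, here is a rough scalar sketch of the behaviour the plain C fallbacks in (A) and (B) would need to reproduce for the 512-bit, 32-bit-lane case. It assumes the usual 2intersect semantics (bit i of k1 is set when a[i] equals any element of b, and bit j of k2 is set when b[j] equals any element of a). The names ref_2intersect_epi32 and ref_2intersect_epi32_mask are hypothetical, plain arrays stand in for the vector types, and none of this is taken from the paper's listings or from existing SIMDe code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference helper, not part of SIMDe.
 * Assumed semantics: bit i of *k1 is set if a[i] matches any b[j];
 * bit j of *k2 is set if b[j] matches any a[i]. */
static void
ref_2intersect_epi32(const int32_t a[16], const int32_t b[16],
                     uint16_t *k1, uint16_t *k2) {
  uint16_t m1 = 0, m2 = 0;
  for (size_t i = 0; i < 16; i++) {
    for (size_t j = 0; j < 16; j++) {
      if (a[i] == b[j]) {
        m1 |= (uint16_t) (1u << i);
        m2 |= (uint16_t) (1u << j);
      }
    }
  }
  *k1 = m1;
  *k2 = m2;
}

/* Hypothetical variant in the spirit of (B): only the first mask is
 * computed, so the inner loop can stop at the first match. */
static uint16_t
ref_2intersect_epi32_mask(const int32_t a[16], const int32_t b[16]) {
  uint16_t m1 = 0;
  for (size_t i = 0; i < 16; i++) {
    for (size_t j = 0; j < 16; j++) {
      if (a[i] == b[j]) {
        m1 |= (uint16_t) (1u << i);
        break; /* one match is enough to set bit i */
      }
    }
  }
  return m1;
}
```

A scalar reference like this could also serve as the oracle when testing the AVX512F paths against random inputs, though the actual test layout is up to whoever implements it.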

May 31 '23 13:05 mr-c