simde
2intersect: implement AVX512F optimizations proposed by Guille Díez-Cañas arXiv:2112.06342 [cs.DS]
https://doi.org/10.48550/arXiv.2112.06342
Be careful to only use the paper as your reference; I'm told that the compressed source code at the end is not OSS.
- [ ] (A) Add strict AVX512F optimized versions to the existing `simde_mm{,256,512}_2intersect_epi{32,64}` functions as shown in Listing 10, page 4; but please confirm that this is still faster than the fallback code on recent GCC/clang using an AVX512F system.
- [ ] (B) New `simde_x_mm{,256,512}_2intersect_epi{32,64}_mask` functions with AVX512F and plain C implementations for returning only the first mask, `k1` (Listing 7, page 3).
- [ ] (C) New `simde_x_mm{,256,512}_2intersect_epi{32,64}_mask2` functions: versions of (B) for when the second set of integer vectors is in memory (but not loaded into a `__m512i` register) (Listing 9, page 4). Some other name might be better.
- [ ] (D) New `simde_x_mm{,256,512}_*_epi16` versions of all of the above for 16-bit vectors, with AVX512F and plain C implementations.
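
To make the intended semantics of (B) concrete, here is a rough plain C sketch of a first-mask-only intersection for 32-bit lanes. It assumes the usual VP2INTERSECT meaning of the first mask: bit `i` is set when element `i` of the first vector equals any element of the second. The name `sketch_2intersect_epi32_mask` and the pointer-plus-length signature are hypothetical, chosen only for illustration; this is not SIMDe API and not taken from the paper's listings.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of SIMDe): scalar reference for the
 * "first mask only" variant. Bit i of the result is set when a[i]
 * equals at least one element of b. n is the lane count (at most 16,
 * i.e. a 512-bit vector of 32-bit lanes). */
static uint16_t
sketch_2intersect_epi32_mask(const int32_t *a, const int32_t *b, size_t n) {
  assert(n <= 16); /* the mask only has 16 bits */

  uint16_t k1 = 0;
  for (size_t i = 0; i < n; i++) {
    for (size_t j = 0; j < n; j++) {
      if (a[i] == b[j]) {
        k1 = (uint16_t) (k1 | (1u << i));
        break; /* one match is enough to set bit i */
      }
    }
  }
  return k1;
}
```

For example, with `a = {1, 2, 3, 4}` and `b = {4, 9, 1, 1}`, lanes 0 and 3 of `a` have matches in `b`, so the sketch returns `0x9`. A real fallback would take SIMDe vector types rather than raw pointers, and would be the baseline to benchmark any AVX512F version against.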