xsimd
Support gather for different sizes of types on data and indices
We have recently started using xsimd to make our Velox query evaluation engine portable. One gap we found is that xsimd does not support gathers where the data and index element types differ in size. For example, we gather int64 data with int32 indices, which on AVX2 can be implemented with __m256i as the data register and __m128i as the index register. Is there a way to solve this? You can refer to our implementation for some ideas. If you agree with our approach, we can help integrate the implementation into xsimd.
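To make the requested semantics concrete, here is a minimal scalar sketch of a gather whose index lanes are narrower than its data lanes. The names `lanes` and `gather_ref` are hypothetical stand-ins for illustration only; they are not part of xsimd or Velox. On AVX2 the int64/int32 case corresponds to the `_mm256_i32gather_epi64` intrinsic, which takes a `__m128i` index vector and returns a `__m256i`.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a SIMD register: N lanes of element type T.
template <typename T, std::size_t N>
struct lanes {
    T v[N];
};

// Scalar reference for a mixed-width gather: lane i of the result is
// base[idx.v[i]]. The index type I may be narrower than the data type T.
template <typename T, typename I, std::size_t N>
constexpr lanes<T, N> gather_ref(const T* base, const lanes<I, N>& idx) {
    lanes<T, N> out{};
    for (std::size_t i = 0; i < N; ++i) {
        out.v[i] = base[idx.v[i]];
    }
    return out;
}

// Four int64 lanes occupy 256 bits, but their four int32 indices occupy
// only 128 bits. This is why the index operand is a half-width register.
static_assert(sizeof(lanes<std::int64_t, 4>) == 32, "data lanes: 256 bits");
static_assert(sizeof(lanes<std::int32_t, 4>) == 16, "index lanes: 128 bits");
```

The lane counts match while the register widths do not, which is the mismatch a batch-based API has to express somehow.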
In our project we implemented a HalfBatch type that resolves to __m128i on AVX2, and we use it for the index register. The details can be found here: https://github.com/facebookincubator/velox/blob/main/velox/common/base/SimdUtil.h#L76-L132
Our gather and maskGather implementation: https://github.com/facebookincubator/velox/blob/main/velox/common/base/SimdUtil.h#L134-L268
I am happy to answer any questions you have. Thank you for creating this library; it really helps us rewrite our SIMD code in a portable and readable manner.
I'm not sure about the HalfBatch, but if I were to implement it in xsimd, I would make it a type adaptor, something like xsimd::half_batch<B>::type, instead of introducing new batch types. It would map batch<float, avx2> to batch<float, sse4.2>.
But it bothers me that it doesn't have any specialization for SSE.
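A rough sketch of what such a type adaptor could look like. Everything in the `sketch` namespace is a hypothetical stand-in so the example compiles on its own; the real xsimd `batch` template and architecture tags are not used here, and `half_batch` is only the name proposed above, not an existing xsimd API.

```cpp
#include <type_traits>

namespace sketch {

// Stand-in architecture tags (assumed for illustration).
struct sse4_2 {};
struct avx2 {};

// Stand-in for a batch parameterized on element type and architecture.
template <typename T, typename Arch>
struct batch {};

// Primary template left undefined on purpose: an architecture with no
// narrower sibling has no half batch, so using it is a compile error.
template <typename B>
struct half_batch;

// A 256-bit AVX2 batch maps to the 128-bit SSE4.2 batch of the same type.
template <typename T>
struct half_batch<batch<T, avx2>> {
    using type = batch<T, sse4_2>;
};

// Convenience alias for the adaptor.
template <typename B>
using half_batch_t = typename half_batch<B>::type;

} // namespace sketch

static_assert(
    std::is_same<sketch::half_batch_t<sketch::batch<float, sketch::avx2>>,
                 sketch::batch<float, sketch::sse4_2>>::value,
    "avx2 float batch halves to sse4_2 float batch");
```

In this sketch there is no mapping for the SSE tag itself, so `half_batch<batch<T, sse4_2>>` simply does not compile. That is one way to surface the missing SSE case at compile time, although it does not resolve it.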
This is what I did to hand-optimize two cases we use at Krita:
https://github.com/xtensor-stack/xsimd/blob/c7567bbedebcfbf3ba95304ff1a6722b32a0d63f/include/xsimd/arch/xsimd_avx2.hpp#L350-L369
Instead of using separate batch types, I would suggest using SFINAE on the size of the index batch.
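The SFINAE suggestion could be sketched roughly as follows. This is a self-contained illustration, not the code linked above: `lanes`, `gather_path`, and `select_gather` are hypothetical names, and the overloads only report which path was chosen instead of running a real gather kernel.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical stand-in for a SIMD register: N lanes of element type T.
template <typename T, std::size_t N>
struct lanes {
    T v[N];
};

enum class gather_path { same_width, narrow_index };

// Chosen when index and data elements have the same size; a real kernel
// would use a full-width index register here.
template <typename T, typename I, std::size_t N,
          typename std::enable_if<sizeof(I) == sizeof(T), int>::type = 0>
constexpr gather_path select_gather(const T*, const lanes<I, N>&) {
    return gather_path::same_width;
}

// Chosen when index elements are narrower than data elements; a real
// kernel would use a half-width index register here.
template <typename T, typename I, std::size_t N,
          typename std::enable_if<(sizeof(I) < sizeof(T)), int>::type = 0>
constexpr gather_path select_gather(const T*, const lanes<I, N>&) {
    return gather_path::narrow_index;
}
```

Because the two conditions are mutually exclusive, exactly one overload participates for any supported pair of types, and a pair matching neither (for example indices wider than the data) fails to compile.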