Implementing intrinsics that were released alongside a wider register type
Some intrinsics for a register of size `N` are introduced in the same generation that introduces a register of size `2N`.
- _mm_srlv_epi32 (128 bits) is introduced in AVX2, along with the _mm256 variant.
- Plenty of 128-bit and 256-bit APIs are introduced in AVX512.
I'm wondering how to implement that in Xsimd.
My best understanding is that when compiling with AVX2, xsimd::make_sized_batch<uint8_t, 16> will resolve to a batch on the xsimd::sse4_2 architecture, so the dispatch mechanism cannot know from requires_arch that the AVX2 128-bit instruction is available.
One way to work around it is using if constexpr(supported_architectures::contains<avx2>()) but that seems to duplicate the dispatch mechanism.
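To make the workaround concrete, here is a minimal self-contained sketch. It does not use xsimd itself: `arch_list`, the tag structs, `srlv_128_path` and the `path` enum are toy stand-ins written for illustration, and the translation unit is simply assumed to be compiled with AVX2 enabled.

```cpp
#include <type_traits>

// Hypothetical stand-ins for architecture tags (not the real xsimd types).
struct sse4_2 {};
struct avx2 {};

// Toy architecture list with a compile-time membership query.
template <class... Archs>
struct arch_list
{
    template <class Arch>
    static constexpr bool contains() noexcept
    {
        return (std::is_same<Arch, Archs>::value || ...);
    }
};

// Assumption: this translation unit is built with AVX2 enabled.
using supported_architectures = arch_list<avx2, sse4_2>;

enum class path { per_lane_shift, emulated_shift };

// A 128-bit kernel that selects its implementation by hand,
// bypassing the usual tag-based dispatch.
constexpr path srlv_128_path() noexcept
{
    if constexpr (supported_architectures::contains<avx2>())
    {
        return path::per_lane_shift; // would call _mm_srlv_epi32 here
    }
    else
    {
        return path::emulated_shift; // SSE-only fallback
    }
}
```

Every kernel that benefits from a newer extension would need its own branch of this kind, which is the duplication of the dispatch mechanism mentioned above.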
Another possibility could be to decouple the architecture from the register type.
What do you think @JohanMabille @serge-sans-paille ?
I was thinking about the same problem. My guess for now would be to introduce sse_avx, sse_avx2 and sse_vl registers in the hierarchy.
Then make_sized_batch returns the appropriate one. In avx, avx2 and avx512 we need to override the forward-to-sse and forward-to-avx functions.
An alternative is to have an avx<sse> class, as is done with fma.
PS: this is related to #1009 so probably requires some more thoughts.
I think there are two orthogonal problems here. The first one is the way we represent the instruction set extensions in xsimd. This is a topic we've been discussing for quite a while with Serge, and so far the idea is to be able to add "flavors" to the instruction set tag, either with template parameters (as suggested in #1009 and by @DiamonDinoia), or with expressions like "avx & fma".
The second one is that the arch we pass to the implementation functions is that of the batch (see https://github.com/xtensor-stack/xsimd/blob/master/include/xsimd/types/xsimd_api.hpp#L60 for instance). That could be fixed with something like:
template <class T, class A>
XSIMD_INLINE batch<T, A> abs(batch<T, A> const& x) noexcept
{
    detail::static_check_supported_config<T, A>();
    return kernel::abs<A>(x, detected_arch{});
}
My naive thinking is to use XSIMD_DECLARE_SIMD_REGISTER_ALIAS to declare an sse_avx register that inherits from sse4_2 and override just the kernels that benefit from avx on sse. May I ask where this falls apart?
That's the easy part. You also want this type to derive from sse4_2, so that automatic fallback works as expected.
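The derivation and automatic fallback described here can be sketched with plain overload resolution. This is a toy model, not xsimd code: `sse_avx2` is the hypothetical new tag, `requires_arch` is redefined locally as a const reference alias, and `srlv`, `add` and the `impl` enum exist only for the illustration.

```cpp
// Hypothetical architecture tags. sse_avx2 stands for "128-bit registers
// with AVX2 instructions available" and is not an existing xsimd type.
struct sse2 {};
struct sse4_2 : sse2 {};
struct sse_avx2 : sse4_2 {};

// Toy requires_arch: kernels are selected through the derived-to-base
// conversion of the architecture tag.
template <class A>
using requires_arch = A const&;

enum class impl { sse2_generic, sse_avx2_native };

// Available for every SSE-family tag: emulated variable shift.
constexpr impl srlv(requires_arch<sse2>) noexcept { return impl::sse2_generic; }

// Overridden only where AVX2 adds a 128-bit instruction.
constexpr impl srlv(requires_arch<sse_avx2>) noexcept { return impl::sse_avx2_native; }

// Not overridden: every derived tag falls back to the sse2 kernel.
constexpr impl add(requires_arch<sse2>) noexcept { return impl::sse2_generic; }
```

Calling `srlv(sse_avx2{})` picks the overridden kernel because the exact match beats the base conversion, while `add(sse_avx2{})` silently falls back to the sse2 version, which is why the new tag needs to derive from sse4_2.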
Now for the difficult part: what would the naming scheme be? So far we've used template composition, e.g.
fma3<sse4_2> to specify sse4.2 with fma3 extension.
so in that spirit we would have
avx512f<sse4_2> to specify sse4.2 with avx512f extensions.
Unfortunately we already use avx512f as an architectural type. But maybe
ext::avx512f<sse4_2> would be good? That way we would also have
ext::avx512f
This may mean we'd use ext::fma3 instead of fma3, that's an API break but I'm fine with it.
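As a rough illustration of that naming scheme, the sketch below puts the extension tags in an `ext` namespace so they no longer clash with the existing architectural tag. All types here are toy definitions made up for the example, and `sse_with_avx512` is a hypothetical alias; the real xsimd tags carry more information.

```cpp
#include <type_traits>

struct sse4_2 {};  // toy 128-bit register architecture
struct avx512f {}; // toy version of the existing 512-bit architectural tag

namespace ext
{
    // Hypothetical extension tags: "registers of Base, plus this extension".
    // Deriving from Base keeps the fallback to Base kernels.
    template <class Base>
    struct fma3 : Base {};

    template <class Base>
    struct avx512f : Base {};
}

// sse4.2 registers with avx512f instructions available.
using sse_with_avx512 = ext::avx512f<sse4_2>;

static_assert(std::is_base_of<sse4_2, sse_with_avx512>::value,
              "kernels written for sse4_2 remain usable");
static_assert(!std::is_same<sse_with_avx512, avx512f>::value,
              "the extension tag is distinct from the 512-bit architecture");
```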
I like that idea and can implement it before the release, but it's a non-negligible feature change, so I think it's worth merging after the release, so that we can peacefully explore the consequences.
I agree that this change should be merged after the release. Regarding the scheme, we could keep backward compatibility by defining ext::fma3 as fma3 first, and then remove fma3 later when we decide to cut a major release.
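One possible shape for that transition, assuming the current tag is a class template taking the base architecture, is an alias template. The definitions below are toy versions written for the example, not the actual xsimd declarations.

```cpp
#include <type_traits>

struct sse4_2 {};

// Toy version of the current spelling, fma3<sse4_2>.
template <class Base>
struct fma3 : Base {};

namespace ext
{
    // Transitional alias: ext::fma3<A> names exactly the same type as fma3<A>,
    // so existing overloads and specializations keep matching.
    template <class Base>
    using fma3 = ::fma3<Base>;
}
```

Since both spellings denote one type, user code could migrate to `ext::fma3` gradually, and the old name would only disappear at the major release.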
Sure, I also agree that this is something for after the release! I just wanted to brainstorm since the discussion was open