threedee-simd
use gcc's __builtin_shuffle for more portability instead of __builtin_ia32_shufps
I notice you have macro'd vshuffle to be __builtin_shufflevector on clang and __builtin_ia32_shufps on gcc. But gcc has its own generic __builtin_shuffle (which I assume compiles down to the same shufps instruction on x86 systems that support it). An added benefit is that it would probably, in time, make it easier to use the library with NEON.
(This function has a different name from clang's __builtin_shufflevector but seems to operate in much the same way, except that it takes its indices as an integer mask vector rather than as constant arguments.)
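To make the suggestion concrete, here is a minimal sketch of what such a macro could look like. It is not the actual threedee-simd macro: the six-argument vshuffle signature, the vec4/vec4i typedefs and the two helper functions are assumptions made for illustration. It relies on the fact that both builtins number the lanes of the first vector 0-3 and the lanes of the second vector 4-7. Note that this is not the same convention as __builtin_ia32_shufps, which always takes its two low lanes from the first operand and its two high lanes from the second.

```c
#include <assert.h>

/* Four packed floats, plus a same-sized integer type for shuffle masks. */
typedef float vec4 __attribute__((vector_size(16)));
typedef int vec4i __attribute__((vector_size(16)));

/* Hypothetical vshuffle: indices 0-3 pick lanes of a, 4-7 pick lanes of b.
   clang defines __GNUC__ as well, so __clang__ has to be tested first. */
#if defined(__clang__)
#define vshuffle(a, b, i0, i1, i2, i3) \
    __builtin_shufflevector((a), (b), (i0), (i1), (i2), (i3))
#else
#define vshuffle(a, b, i0, i1, i2, i3) \
    __builtin_shuffle((a), (b), (vec4i){(i0), (i1), (i2), (i3)})
#endif

/* Reverse the lanes of v: {x, y, z, w} -> {w, z, y, x}. */
static inline vec4 vreverse(vec4 v)
{
    return vshuffle(v, v, 3, 2, 1, 0);
}

/* Low two lanes of a followed by the low two lanes of b. */
static inline vec4 vlow_pairs(vec4 a, vec4 b)
{
    return vshuffle(a, b, 0, 1, 4, 5);
}

/* Small self-check that exercises both helpers. */
static inline void vshuffle_selfcheck(void)
{
    vec4 a = {1.0f, 2.0f, 3.0f, 4.0f};
    vec4 b = {5.0f, 6.0f, 7.0f, 8.0f};
    vec4 r = vreverse(a);
    vec4 p = vlow_pairs(a, b);
    assert(r[0] == 4.0f && r[3] == 1.0f);
    assert(p[1] == 2.0f && p[2] == 5.0f);
}
```

Because neither branch names an x86 builtin, the same source should in principle also compile for other targets. What instructions the compiler picks there would still need to be checked in the assembly output.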
Another 3D math project in C, which uses a macro similar to the one in threedee-simd but alternates between __builtin_shufflevector and __builtin_shuffle: https://github.com/vecio/3DM/blob/master/include/3dm/3dm.h
GCC docs for x86, mentioning __builtin_ia32_shufps: http://gcc.gnu.org/onlinedocs/gcc/X86-Built_002din-Functions.html
GCC docs mentioning __builtin_shuffle: http://gcc.gnu.org/onlinedocs/gcc/Vector-Extensions.html#Vector-Extensions
I'll gladly accept a pull request for this. @Aktau, do you want to do the fix?
Do you happen to know if __builtin_shuffle is a recent addition to GCC? I tried searching for that function earlier but I couldn't find it in the docs.
threedee-simd code relies heavily on shuffles, but the shuffle instructions in NEON are very different from SSE.
If you feel like doing some more portability work, you could also try to get rid of the few _mm_ intrinsics that are used in some functions to negate elements of vectors.
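One possible way to negate selected elements without _mm_ intrinsics is to flip the sign bits with the generic vector extensions. This is only a sketch: the function name vnegate_yw and the choice of lanes are made up for illustration and are not taken from threedee-simd. It also assumes a 32-bit int and IEEE 754 floats.

```c
#include <assert.h>
#include <limits.h>

typedef float vec4 __attribute__((vector_size(16)));
typedef int vec4i __attribute__((vector_size(16)));

/* Assumption: int is 32 bits wide, so INT_MIN has only the top bit set,
   which lines up with the sign bit of an IEEE 754 single-precision float. */
#define SIGN_BIT INT_MIN

/* Hypothetical helper: negate lanes 1 and 3, i.e. {x, y, z, w} -> {x, -y, z, -w}.
   Casts between same-sized vector types reinterpret the bits, so XORing with
   the mask flips the sign of the selected lanes and leaves the others alone. */
static inline vec4 vnegate_yw(vec4 v)
{
    const vec4i mask = {0, SIGN_BIT, 0, SIGN_BIT};
    return (vec4)((vec4i)v ^ mask);
}

/* Small self-check: applying the helper twice restores the input. */
static inline void vnegate_selfcheck(void)
{
    vec4 v = {1.0f, 2.0f, 3.0f, 4.0f};
    vec4 r = vnegate_yw(vnegate_yw(v));
    assert(r[0] == 1.0f && r[1] == 2.0f && r[2] == 3.0f && r[3] == 4.0f);
}
```

Multiplying by a constant vector such as {1, -1, 1, -1} would be a simpler alternative, though it is not bit-for-bit identical for NaN inputs.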
Like most of the things (vector extensions) we're using, it has only been really usable from gcc 4.7 onwards. So yeah, I'd target that as a minimum.
About other portability things: I currently am not devving for ARM, which means I'm unable to test it, but when I see something that's easy to fix, I'll do it.
I am fine with not supporting old compilers.
You don't really need to test on ARM, but it would be nice if you could try compiling for ARM and check that the assembly output looks fine. You will need to build GNU Binutils and GCC for ARM (e.g. --target=arm-linux-eabi). If it compiles for ARM/NEON and works fine on x86, it's probably fine; we can trust that GCC works correctly.