Evan Nemerson
It's not as well tested, but yes, C++ is supported.
The cmake stuff is just for tests; you shouldn't use it. Usually you just drop them into your project; they should work in pretty much any C/C++ compiler without...
x86, and especially Linux, are pretty well tested. So is ARM; this issue is really aimed at more exotic architectures and compilers. uarch-bench looks very cool, though; could be useful...
Nice, thanks. Those would be great for `vminq_u16`, `vmaxq_u16`, and `vqsubq_s32` ☺. There are tons of these floating around the internet, and I'd like to try to get as many...
> return _mm_xor_si128(_mm_cmpgt_epi32(x, y),
>                      _mm_srai_epi32(_mm_xor_si128(x, y), 31));
https://godbolt.org/z/T73MbPEnh
I agree, the throughput isn't quite as good, but the latency on that mov is painful, plus the memory to store...
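For readers following the quoted expression: taken on its own, it yields an unsigned 32-bit greater-than mask, because a signed compare gives the wrong answer exactly when the two sign bits differ, and `_mm_srai_epi32(_mm_xor_si128(x, y), 31)` is all ones in exactly those lanes. The sketch below is a portable scalar model of a single lane, not the SSE2 code itself; the `lane_*` helper names are hypothetical, and how the expression is used in the thread is not shown in the excerpt.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar model of one lane of _mm_cmpgt_epi32: all ones if x > y as signed.
   Flipping the sign bit maps two's-complement order onto unsigned order. */
static uint32_t lane_cmpgt_s32(uint32_t x, uint32_t y) {
    return ((x ^ 0x80000000u) > (y ^ 0x80000000u)) ? 0xFFFFFFFFu : 0u;
}

/* Scalar model of one lane of _mm_srai_epi32(v, 31): broadcast the sign bit. */
static uint32_t lane_srai31(uint32_t v) {
    return (v >> 31) ? 0xFFFFFFFFu : 0u;
}

/* One lane of the quoted expression: signed compare, inverted whenever
   the sign bits of x and y differ. */
static uint32_t lane_cmpgt_u32(uint32_t x, uint32_t y) {
    return lane_cmpgt_s32(x, y) ^ lane_srai31(x ^ y);
}

/* Compare the model against a plain unsigned '>' on boundary values. */
static void check_lane_cmpgt_u32(void) {
    static const uint32_t samples[] = {
        0u, 1u, 2u, 0x7FFFFFFFu, 0x80000000u, 0x80000001u,
        0xFFFFFFFEu, 0xFFFFFFFFu
    };
    const size_t n = sizeof samples / sizeof samples[0];
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            uint32_t expected = (samples[i] > samples[j]) ? 0xFFFFFFFFu : 0u;
            assert(lane_cmpgt_u32(samples[i], samples[j]) == expected);
        }
    }
}
```

Calling `check_lane_cmpgt_u32()` exercises every pairing of the boundary values, including the mixed-sign cases where the XOR correction matters.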
It's a GCC bug that only manifests with optimization enabled. It's probably architecture-specific, though I haven't tested. Porting the failing tests to the new-style tests resolves the issue....
I have the fix completed; I'm just trying to get everything through CI. It should be done soon.
Is this with qemu, or on real hardware? I'm not seeing any failures on real hardware, but qemu is a different story…
I like the idea; I just can't think of a realistic way to do it. Functions often call other functions, so we would need some sort of way to calculate...
I'm not really familiar with LLVM's internals, so I can't say for sure, but it would probably be possible to write something which parsed the code, eliminated dead code, then...