Adam Stylinski
A possible AVX512F-only version of this might involve starting from the AVX2 code and then widening (the widening instructions do appear to be in the F set) after the multiply-add....
> Looks like MSVC broken.
>
> ```
> chunkset_tpl.h(190,50): error C2220: the following warning is treated as an error
> (compiling source file '../arch/generic/chunkset_c.c')
>
> chunkset_tpl.h(190,50): warning C4242:...
> ```
Will do but probably won't get to it for a little bit.
Looks like yet another alignment requirement issue, similar to what we saw on Android with aarch64 and the ld4 instruction.
I believe it needs to be 64.
@dead2 the fix would be the same as the one we did for Android, where we use the emulated ld4 implementation.
It's not just the taps variable, you'd need to enforce alignment of the buffer itself. At the moment we ensure 16-byte alignment when doing so doesn't completely bypass the...
The intent of the code was not to always ensure alignment but to ensure alignment when doing so doesn't bypass neon entirely. There's a check to make sure that doesn't...
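To make the "align only when it doesn't bypass NEON" idea concrete, here is a minimal sketch in plain C. It is not the actual code under discussion: `align_prologue_len` and `MIN_SIMD_LEN` are hypothetical names, and the 16-byte threshold is an assumption chosen for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical threshold: the fewest bytes worth handing to the vector loop. */
#define MIN_SIMD_LEN 16

/* Returns how many leading bytes to handle with scalar code so that the rest
 * of the buffer starts on a 16-byte boundary. Returns 0 when the buffer is
 * already aligned, or when aligning would leave fewer than MIN_SIMD_LEN bytes
 * for the vector loop (the caller would then fall back to unaligned loads
 * rather than skip the vector path entirely). */
static size_t align_prologue_len(const uint8_t *buf, size_t len) {
    size_t misalign = (size_t)((uintptr_t)buf & 15);
    size_t prologue = misalign ? 16 - misalign : 0;

    if (prologue == 0)
        return 0;               /* already aligned, nothing to peel off */
    if (len < prologue + MIN_SIMD_LEN)
        return 0;               /* aligning would starve the vector loop */
    return prologue;
}
```

The point of the second check is that alignment is treated as an optimization, not a guarantee, so any load in the vector path still has to tolerate an unaligned pointer.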
I believe this has been fixed, no?
Ok, so the compiler crashing here is _probably_ a compiler bug (and a very strange one). This should maybe be filed upstream.