Jan Wassenberg
Oh, good catch! That's likely it. We seem to have 512-bit VLEN. Note that SortTag uses LMUL=1/2. The problem is that the base case is meant to handle at least...
The sort itself does check for the problem, but TestAllPartition did not; it soon will. Unfortunately, our CI doesn't work with 1024-bit VLEN, so we are not entirely certain this fixes it.
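For context on why VLEN matters even with LMUL=1/2: an RVV register group holds LMUL * VLEN / SEW lanes, so the lane count doubles whenever VLEN doubles. The sketch below only evaluates that formula. `RvvLanes` is a hypothetical helper, and the 32-bit element size is an assumption for illustration because the key type is not stated here.

```cpp
#include <cstddef>

// VLMAX for RVV: LMUL * VLEN / SEW. LMUL is passed as lmul_num / lmul_den so
// that fractional values such as 1/2 are represented exactly.
// Hypothetical helper, for illustration only.
constexpr std::size_t RvvLanes(std::size_t vlen_bits, std::size_t sew_bits,
                               std::size_t lmul_num, std::size_t lmul_den) {
  return (vlen_bits * lmul_num) / (sew_bits * lmul_den);
}

// Assumed 32-bit keys with LMUL=1/2: the lane count tracks VLEN.
static_assert(RvvLanes(128, 32, 1, 2) == 2, "VLEN=128");
static_assert(RvvLanes(512, 32, 1, 2) == 8, "VLEN=512");
static_assert(RvvLanes(1024, 32, 1, 2) == 16, "VLEN=1024");
```

Under these assumptions, code that depends on the number of lanes per vector sees twice as many at VLEN=1024 as at 512, so behavior verified only at 512 is not automatically covered.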
Thanks for confirming!
It sounds reasonable to want access to this instruction. What is the purpose of the extra 2x multiplication in their definition? Because NEON differs from SVE in that it...
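For reference on the doubling: Arm describes SQDMLAL (e.g. `vqdmlal_s16`) as saturating the doubled product to 32 bits and then saturating its sum with the accumulator. Below is a scalar model of one lane; `QDMulAccLane` and `SatToI32` are hypothetical names, not part of any library. With Q15 inputs, the plain product is Q30 and the doubling turns it into Q31, which is the usual fixed-point motivation.

```cpp
#include <cstdint>
#include <limits>

// Clamp a 64-bit intermediate to the int32_t range.
constexpr int32_t SatToI32(int64_t x) {
  constexpr int64_t kMin = std::numeric_limits<int32_t>::min();
  constexpr int64_t kMax = std::numeric_limits<int32_t>::max();
  return static_cast<int32_t>(x < kMin ? kMin : (x > kMax ? kMax : x));
}

// One lane of a signed saturating doubling multiply-accumulate long
// (int16 x int16 -> int32), modeled on the Arm description of SQDMLAL:
// saturate 2*a*b to 32 bits, then saturate acc + product.
constexpr int32_t QDMulAccLane(int32_t acc, int16_t a, int16_t b) {
  const int32_t product = SatToI32(2 * int64_t{a} * int64_t{b});
  return SatToI32(int64_t{acc} + product);
}
```

For example, 0.5 in Q15 is 16384, and `QDMulAccLane(0, 16384, 16384)` yields `1 << 29`, which is 0.25 in Q31. The only int16 product whose doubling overflows is (-32768) * (-32768), which saturates to `INT32_MAX`.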
Makes sense. `ReorderWidenMulAccumulate` also returns a second value via an output param, so there is precedent. The proposed op would then call both `vqdmlal` and `vqdmlal_high` on NEON, and `svqdmlalb`...
Got it, thanks. In that case adding FixedPoint to the op name may be helpful.
Very nice! @Ryo-not-rio FYI John's solution defines the op in terms of NEON/x86, so for SVE we need two extra Zip ops. Does that work for you?
If I understand correctly, the issue is that we use `FixedTag`, which on SVE requires Load/Store etc. to do extra work to limit them to 128 bits. +1 to...
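A conceptual model of that extra work, assuming it amounts to restricting each access to the first 128 bits of a possibly wider SVE vector (for instance with a predicate that enables only those lanes). This is not Highway's actual implementation, and `Fixed128ActiveLanes` is a hypothetical name.

```cpp
#include <cstddef>
#include <cstdint>

// Returns a lane mask (bit i set = lane i active) for a 128-bit fixed view
// of a hardware vector that is vector_bits wide. Assumes 8 <= lane_bits <= 128,
// so at most 16 lanes are active and the shift cannot overflow.
constexpr uint64_t Fixed128ActiveLanes(std::size_t vector_bits,
                                       std::size_t lane_bits) {
  const std::size_t all_lanes = vector_bits / lane_bits;
  const std::size_t fixed_lanes = 128 / lane_bits;
  const std::size_t active = fixed_lanes < all_lanes ? fixed_lanes : all_lanes;
  return (uint64_t{1} << active) - 1;
}
```

On a 512-bit SVE machine with 32-bit lanes, a full-width op could use all 16 lanes, but the 128-bit view only covers lanes 0..3 (mask `0xF`); on a 128-bit machine the two coincide and no restriction is needed.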
hm, if the code is isolated and not alternating between SVE/NEON in the same function or source file, it is easy to compile one source file with SVE disabled (so...
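It can work like this:

```
template <class D /* constraint selecting the NEON case; not preserved in this excerpt */>
NeonType Func(D d) { return NeonType(); }

template <class D /* constraint selecting the SVE case; not preserved in this excerpt */>
SveType Func(D d) { return SveType(); }
```

and for functions not involving a `D=Simd`,...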
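The overload-by-tag idea in this comment can be modeled without Highway. In the sketch below, `FixedDesc128`, `ScalableDesc` and `IsFixed128` are hypothetical stand-ins for the descriptor types and for the constraint that did not survive in the snippet; `std::enable_if_t` on the return type makes exactly one of the two `Func` overloads viable for a given `D`.

```cpp
#include <type_traits>

// Hypothetical descriptor stand-ins: a fixed 128-bit tag (NEON-style use)
// and a scalable tag (SVE-style use). Not Highway's actual types.
template <typename T>
struct FixedDesc128 {};
template <typename T>
struct ScalableDesc {};

// Trait that distinguishes the two kinds of descriptor.
template <class D>
struct IsFixed128 : std::false_type {};
template <typename T>
struct IsFixed128<FixedDesc128<T>> : std::true_type {};

struct NeonType {};
struct SveType {};

// Same name and parameter list; SFINAE on the return type removes the
// non-matching overload, so each call resolves unambiguously.
template <class D>
std::enable_if_t<IsFixed128<D>::value, NeonType> Func(D /*d*/) {
  return NeonType();
}

template <class D>
std::enable_if_t<!IsFixed128<D>::value, SveType> Func(D /*d*/) {
  return SveType();
}
```

Calling `Func(FixedDesc128<float>())` returns `NeonType`, and `Func(ScalableDesc<float>())` returns `SveType`. Because the choice is made at compile time from the tag type, callers that pass a descriptor get the matching return type without any runtime branching.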