SIMD traces for mulhi usage
Action item: Intel folks to check in their traces how these instructions are used (variable or constant inputs).
As I recall, the question was whether to require one of the source operands of this instruction to be a constant:
__m128i _mm_mulhi_epi16 (__m128i a, __m128i b); (PMULHW)
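For reference, here is a minimal scalar sketch of what PMULHW computes per lane. `mulhi_epi16_ref` is a hypothetical helper, not an SSE2 intrinsic, and it assumes the compiler implements `>>` on negative values as an arithmetic shift (as mainstream compilers do). Nothing in the per-lane definition depends on `b` being a constant.

```c
#include <stdint.h>

/* Hypothetical scalar model of PMULHW / _mm_mulhi_epi16 over 8 lanes:
   each pair of signed 16-bit lanes is multiplied into an exact 32-bit
   product, and only the high 16 bits of that product are kept.
   Assumes >> on a negative int32_t is an arithmetic shift. */
static void mulhi_epi16_ref(const int16_t a[8], const int16_t b[8], int16_t out[8]) {
    for (int i = 0; i < 8; ++i) {
        int32_t prod = (int32_t)a[i] * (int32_t)b[i]; /* cannot overflow 32 bits */
        out[i] = (int16_t)(prod >> 16);                /* high half of the product */
    }
}
```

Either operand may vary per call here; a "constant operand" restriction would only be a property of how callers such as libwebp use the instruction.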
It would be helpful if someone (James @jzern ?) could point out where in the WebP benchmark this instruction is used.
EDIT: I did a search for mulhi in the https://github.com/webmproject/libwebp repo and got a bunch of hits in the dsp directory. Are those the right ones to look at?
On the portable-intrinsics branch there are examples for NEON, SSE2 and portable intrinsics; in all of them, the second operand of every call is a constant. The NEON half of the portable intrinsics could be refined along the lines of dec_neon.c; for convenience, the implementation currently uses the same constant values as SSE2.
https://chromium.googlesource.com/webm/libwebp/+/0af22e17d67e6b81fee6d42a53ce6f40aad416e1/src/dsp/dec_wasm.c#115
https://chromium.googlesource.com/webm/libwebp/+/0af22e17d67e6b81fee6d42a53ce6f40aad416e1/src/dsp/dec_neon.c#975
https://chromium.googlesource.com/webm/libwebp/+/0af22e17d67e6b81fee6d42a53ce6f40aad416e1/src/dsp/dec_sse2.c#88
Thanks @jzern !
I was looking at the ARM NEON instruction manual for the VQDMULH instruction and didn't see any requirement that one of the source operands be constant. If both SSE and NEON allow both operands to be non-constant, a potential WASM mulhi instruction might as well allow that too, right? Maybe I misread the NEON documentation. Here's the info I'm looking at:
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0489g/CJAJIIGG.html
(I couldn't find a permalink to the instruction itself, so you'll have to search for it.) :(
You're right, NEON doesn't. The intrinsics do offer a by-scalar variant, though. So two non-constant operands are an option; one thing that needs to be considered, however, is range. Because VQDMULH doubles the product before taking the high half, matching PMULHW means halving one operand, which effectively limits that vector to 15 bits.
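To make the range point concrete, here is a hedged scalar sketch of one 16-bit lane of each instruction. `vqdmulh_s16_ref`, `pmulhw_ref` and `mulhi_via_vqdmulh` are hypothetical helpers, not NEON or SSE2 intrinsics; the VQDMULH model follows the documented behavior (double the product, keep the high half, saturate) and assumes `>>` is an arithmetic shift.

```c
#include <stdint.h>

/* Hypothetical model of one lane of NEON VQDMULH.S16:
   saturate((2 * a * b) >> 16). Only a == b == INT16_MIN saturates. */
static int16_t vqdmulh_s16_ref(int16_t a, int16_t b) {
    int64_t doubled = 2 * (int64_t)a * (int64_t)b; /* no overflow in 64 bits */
    int64_t hi = doubled >> 16;                     /* keep the high half */
    if (hi > INT16_MAX) hi = INT16_MAX;             /* saturate the one overflow case */
    return (int16_t)hi;
}

/* Hypothetical model of one lane of PMULHW: (a * b) >> 16. */
static int16_t pmulhw_ref(int16_t a, int16_t b) {
    return (int16_t)(((int32_t)a * (int32_t)b) >> 16);
}

/* Emulate PMULHW on top of VQDMULH by pre-halving the second operand.
   Halving drops b's lowest bit, so the result is exact only when b is even:
   the halved operand carries just 15 bits of b. */
static int16_t mulhi_via_vqdmulh(int16_t a, int16_t b) {
    return vqdmulh_s16_ref(a, (int16_t)(b / 2));
}
```

With an even constant, `mulhi_via_vqdmulh` matches `pmulhw_ref` exactly, so a constant can simply be stored pre-halved. With an arbitrary variable operand the low bit is lost: for `a = 32767, b = 3`, PMULHW gives 1 while the emulation gives 0.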
SIMD proposal merged, closing as no longer relevant.