sse2neon

Add performance measurement of each intrinsic function

Open marktwtn opened this issue 5 years ago • 3 comments

The conversion of an intrinsic function may be rewritten from time to time. Performance measurements would make it possible to check whether a rewritten conversion actually performs better than the one it replaces.
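As a rough illustration of what such a measurement could look like, here is a minimal sketch of a timing harness. It is not taken from sse2neon: `vec4`, `add_loop`, `add_unrolled` and `bench_ns_per_call` are hypothetical names, and the plain struct stands in for `__m128` so the snippet builds without any SIMD header. It assumes `clock()` has enough resolution for the iteration count used.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical 4-lane float vector standing in for __m128. */
typedef struct {
    float v[4];
} vec4;

typedef vec4 (*binop_fn)(vec4, vec4);

/* Candidate A: the "current" conversion, written as a simple loop. */
static vec4 add_loop(vec4 a, vec4 b)
{
    vec4 r;
    for (int i = 0; i < 4; i++)
        r.v[i] = a.v[i] + b.v[i];
    return r;
}

/* Candidate B: the "rewritten" conversion, manually unrolled. */
static vec4 add_unrolled(vec4 a, vec4 b)
{
    vec4 r = {{a.v[0] + b.v[0], a.v[1] + b.v[1],
               a.v[2] + b.v[2], a.v[3] + b.v[3]}};
    return r;
}

/*
 * Calls fn `iters` times and returns the average CPU time per call in
 * nanoseconds. Each result feeds the next call and the final value is
 * written to *sink, so the compiler cannot discard the loop.
 */
static double bench_ns_per_call(binop_fn fn, size_t iters, vec4 *sink)
{
    vec4 acc = {{0.0f, 0.0f, 0.0f, 0.0f}};
    const vec4 step = {{1.0f, 2.0f, 3.0f, 4.0f}};

    if (iters == 0) {
        *sink = acc;
        return 0.0;
    }

    clock_t start = clock();
    for (size_t i = 0; i < iters; i++)
        acc = fn(acc, step);
    clock_t end = clock();

    *sink = acc;
    return (double)(end - start) * 1e9 / (double)CLOCKS_PER_SEC / (double)iters;
}
```

Calling `bench_ns_per_call(add_loop, n, &out)` and `bench_ns_per_call(add_unrolled, n, &out)` with a large `n`, repeating each run a few times and keeping the lowest figure, gives a first-order comparison. The dependent chain means this measures latency rather than throughput, which may or may not be what a given rewrite is meant to improve.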

marktwtn, Feb 04 '20 11:02

Somewhat related, but not exactly the same:

It would be nice to have a list of all the implemented intrinsics and how many NEON intrinsics are needed to implement each one. While this doesn't give you a 1:1 mapping to how fast something is, it still gives the user an idea of what might be good to avoid.

emoon, Jun 22 '22 19:06

In a similar vein, the emscripten SIMD page (https://emscripten.org/docs/porting/simd.html) has an emoji for each intrinsic, with the following legend:

✅ Wasm SIMD has a native opcode that matches the x86 SSE instruction, should yield native performance

💡 while the Wasm SIMD spec does not provide a proper performance guarantee, given a sufficiently smart compiler and a runtime VM path, this intrinsic should be able to generate the identical native SSE instruction.

🟡 there is some information missing (e.g. type or alignment information) for a Wasm VM to be guaranteed to be able to reconstruct the intended x86 SSE opcode. This might cause a penalty depending on the target CPU hardware family, especially on older CPU generations.

⚠️ the underlying x86 SSE instruction is not available, but it is emulated via at most a few other Wasm SIMD instructions, causing a small penalty.

❌ the underlying x86 SSE instruction is not exposed by the Wasm SIMD specification, so it must be emulated via a slow path, e.g. a sequence of several slower SIMD instructions, or a scalar implementation.

💣 the underlying x86 SSE opcode is not available in Wasm SIMD, and the implementation must resort to such a slow emulated path, that a workaround rethinking the algorithm at a higher level is advised.

💭 the given SSE intrinsic is available to let applications compile, but does nothing.

⚫ the given SSE intrinsic is not available. Referencing the intrinsic will cause a compiler error.
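A legend along these lines could be combined with the NEON intrinsic counts suggested earlier in the thread. The sketch below is only an illustration of that idea: `cost_tier`, `classify_cost` and `cost_label` are hypothetical names, and the thresholds (one NEON intrinsic counts as native, up to four as a small penalty) are assumptions rather than anything sse2neon defines.

```c
/* Hypothetical cost tiers, loosely modeled on the emscripten legend. */
typedef enum {
    COST_MISSING, /* intrinsic not provided at all */
    COST_NOOP,    /* compiles, but does nothing */
    COST_NATIVE,  /* maps to a single NEON intrinsic */
    COST_FEW,     /* emulated with a few NEON intrinsics */
    COST_SLOW     /* long sequence or scalar fallback */
} cost_tier;

/*
 * Maps the number of NEON intrinsics used by a conversion to a tier.
 * A negative count means "not implemented". The cut-off of 4 is an
 * arbitrary assumption chosen for this example.
 */
static cost_tier classify_cost(int neon_ops)
{
    if (neon_ops < 0)
        return COST_MISSING;
    if (neon_ops == 0)
        return COST_NOOP;
    if (neon_ops == 1)
        return COST_NATIVE;
    if (neon_ops <= 4)
        return COST_FEW;
    return COST_SLOW;
}

/* Short human-readable label for a tier, suitable for a generated table. */
static const char *cost_label(cost_tier tier)
{
    switch (tier) {
    case COST_MISSING: return "missing";
    case COST_NOOP:    return "no-op";
    case COST_NATIVE:  return "native";
    case COST_FEW:     return "small penalty";
    case COST_SLOW:    return "slow path";
    }
    return "unknown";
}
```

As noted above, an intrinsic count is only a proxy for speed, so a table generated this way would complement measured timings rather than replace them.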

Starbuck5, Mar 26 '23 23:03