simd

Branch of the spec repo scoped to discussion of SIMD in WebAssembly

Results 47 simd issues

``` #include #define SIMDPP_ARCH_X86_SSE4_1 1 #include "simdpp/simd.h" #include #include #include #include #include #include #include #include // SWIZZLE constants const static __u32x4 sw1 __attribute__((require_constant_initialization)) = {0xffffff00, 0xffffff01, 0xffffff02, 0xffffff03}; const static...

toolchain

Packed horizontal arithmetic is reasonably performant on SSE3+ and Neon. These would be useful for complex multiplications, and in the absence of the opcodes below, these would need to be...

post SIMD MVP
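As a rough illustration of why packed horizontal operations help with complex multiplication, here is a scalar sketch. It assumes each complex number is stored as `{re, im}`. The name `complex_mul` is hypothetical, and the actual opcodes proposed in the issue are not shown in the excerpt.

```c
#include <assert.h>

/* Multiply two complex numbers stored as {re, im}.
 * (a + bi)(c + di) = (ac - bd) + (ad + bc)i
 * The four products are laid out so that the real part is a pairwise
 * (horizontal) subtract and the imaginary part is a pairwise add. */
static void complex_mul(const float p[2], const float q[2], float out[2]) {
    assert(out != p && out != q);   /* output must not alias the inputs */
    const float prod[4] = {
        p[0] * q[0], p[1] * q[1],   /* ac, bd */
        p[0] * q[1], p[1] * q[0]    /* ad, bc */
    };
    out[0] = prod[0] - prod[1];     /* horizontal subtract: ac - bd */
    out[1] = prod[2] + prod[3];     /* horizontal add:      ad + bc */
}
```

Without pairwise operations, the two combining steps would each need extra shuffles to line up neighbouring lanes.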

`v128.const` and `v128.shuffle` instructions are always 18 bytes long, which is excessive in the majority of cases. For comparison, native shuffle instructions on x86-64 are at most five bytes long,...
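The 18-byte figure can be reproduced with simple arithmetic, assuming the encoding is a one-byte SIMD prefix, a one-byte opcode, and a 16-byte immediate. That layout is an assumption made for illustration, and the helper name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum { V128_IMMEDIATE_BYTES = 16 };  /* one byte per lane index or constant byte */

/* Encoded size of an instruction that carries a full 16-byte immediate.
 * prefix_bytes and opcode_bytes are parameters because the opcode is
 * variable-length in principle; 1 + 1 is the assumed common case. */
static size_t imm16_instruction_size(size_t prefix_bytes, size_t opcode_bytes) {
    assert(prefix_bytes >= 1 && opcode_bytes >= 1);
    return prefix_bytes + opcode_bytes + V128_IMMEDIATE_BYTES;
}
```

With one prefix byte and one opcode byte, `imm16_instruction_size(1, 1)` gives 18, so the immediate accounts for 16 of the 18 bytes.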

I've taken a stab at documenting the performance tradeoffs between various instructions and collected the information in this repository: https://github.com/zeux/wasm-simd Obviously this is far from a normative reference, and is...

perf documentation

Hi, add, sub, div, sqrt, mul, and float->integer conversions should all have a third optional argument to allow changing the rounding mode (nearest, toward plus infinity, toward minus infinity, to...

post SIMD MVP
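To make the rounding-mode request concrete, here is a scalar sketch of a float-to-integer conversion that takes the mode as an explicit argument. The enum, the function name, and the fourth mode (toward zero) are assumptions for illustration; the list of modes in the excerpt is cut off.

```c
#include <assert.h>
#include <stdint.h>

typedef enum {
    ROUND_NEAREST_EVEN,
    ROUND_TOWARD_POS_INF,
    ROUND_TOWARD_NEG_INF,
    ROUND_TOWARD_ZERO        /* assumed; not confirmed by the excerpt */
} rounding_mode;

/* Convert x to int32 under an explicit rounding mode. */
static int32_t f32_to_i32(float x, rounding_mode mode) {
    /* Precondition: x is not NaN and fits in int32. */
    assert(x >= -2147483648.0f && x < 2147483648.0f);
    int32_t t = (int32_t)x;          /* C casts truncate toward zero */
    float frac = x - (float)t;       /* exact; has the same sign as x */
    switch (mode) {
    case ROUND_TOWARD_POS_INF:
        if (frac > 0.0f) t += 1;
        break;
    case ROUND_TOWARD_NEG_INF:
        if (frac < 0.0f) t -= 1;
        break;
    case ROUND_NEAREST_EVEN:
        /* Ties go to the even neighbour. */
        if (frac > 0.5f || (frac == 0.5f && (t & 1))) t += 1;
        else if (frac < -0.5f || (frac == -0.5f && (t & 1))) t -= 1;
        break;
    case ROUND_TOWARD_ZERO:
        break;                       /* truncation already done */
    }
    return t;
}
```

For example, `f32_to_i32(-2.5f, ROUND_TOWARD_NEG_INF)` yields -3, while the same input with `ROUND_TOWARD_POS_INF` yields -2.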

This is open-ended. The problem is that many key use cases, such as matrix multiplication kernels, need to know the number of SIMD vector registers that they can count on...

post SIMD MVP

Suppose you have two vectors u and v, and you want to multiply all elements of the vector u by a single lane of the vector v, e.g. v[0]. This...
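The operation described here can be modelled in scalar code as a broadcast of one lane followed by a lane-wise multiply. This is only a sketch with four float lanes; `mul_by_lane` is a hypothetical name, not a proposed opcode.

```c
#include <assert.h>

enum { LANES = 4 };

/* out[i] = u[i] * v[lane] for every lane i. */
static void mul_by_lane(const float u[LANES], const float v[LANES],
                        int lane, float out[LANES]) {
    assert(lane >= 0 && lane < LANES);
    const float s = v[lane];          /* conceptually: splat v[lane] to all lanes */
    for (int i = 0; i < LANES; ++i) {
        out[i] = u[i] * s;
    }
}
```

On hardware without a multiply-by-lane instruction, the splat step is typically a separate shuffle before the multiply.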

In SIMD instruction sets where integer arithmetic is really first-class, support for different bit width in different operands is not just an ad-hoc addition for a few instructions. Instead, generically...

Unlike the float case, where the fused-vs-unfused issue creates complications (PR #79), in the integer case there is no downside to using single-instruction multiply-add. These are vital to getting above...
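A scalar sketch of why the integer case is unproblematic: with wrapping arithmetic, a combined multiply-add gives the same bits as a separate multiply followed by an add, since no intermediate rounding is involved. The function name and the choice of four 32-bit lanes are assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>

enum { I32_LANES = 4 };

/* out[i] = acc[i] + a[i] * b[i], wrapping modulo 2^32.
 * Unsigned arithmetic is used so overflow is well defined in C. */
static void i32x4_mul_add(const int32_t a[I32_LANES], const int32_t b[I32_LANES],
                          const int32_t acc[I32_LANES], int32_t out[I32_LANES]) {
    assert(a && b && acc && out);
    for (int i = 0; i < I32_LANES; ++i) {
        uint32_t wide = (uint32_t)acc[i] + (uint32_t)a[i] * (uint32_t)b[i];
        out[i] = (int32_t)wide;
    }
}
```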

I looked into single-precision Mersenne Twister after @ngzhian did the double-precision port, and its core function relies on "shift bytes" behavior, represented by `PSLLDQ`/`PSRLDQ` on x86 and `VEXT` on Arm: https://github.com/penzn/SFMT-wasm/blob/master/SFMT-sse2.h#L37...
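For readers unfamiliar with the "shift bytes" behavior, the following scalar model treats the 128-bit vector as 16 bytes in little-endian order and shifts whole bytes with zero fill. The function names are hypothetical. This is a sketch of the semantics, not the SFMT code itself.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { VEC_BYTES = 16 };

/* Byte i of the input moves to byte i + n; low bytes are zero-filled
 * (the PSLLDQ-style direction). */
static void shift_bytes_left(const uint8_t in[VEC_BYTES], unsigned n,
                             uint8_t out[VEC_BYTES]) {
    assert(in != out);                       /* no in-place operation */
    memset(out, 0, VEC_BYTES);
    if (n < VEC_BYTES) {
        memcpy(out + n, in, VEC_BYTES - n);
    }
}

/* Byte i + n of the input moves to byte i; high bytes are zero-filled
 * (the PSRLDQ-style direction). */
static void shift_bytes_right(const uint8_t in[VEC_BYTES], unsigned n,
                              uint8_t out[VEC_BYTES]) {
    assert(in != out);                       /* no in-place operation */
    memset(out, 0, VEC_BYTES);
    if (n < VEC_BYTES) {
        memcpy(out, in + n, VEC_BYTES - n);
    }
}
```

A shift of 16 or more bytes leaves the output all zeros in this model.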