simd issues

LLVM inlines constant generation for swizzle

1

```
#include
#define SIMDPP_ARCH_X86_SSE4_1 1
#include "simdpp/simd.h"
#include
#include
#include
#include
#include
#include
#include
#include
// SWIZZLE constants
const static __u32x4 sw1 __attribute__((require_constant_initialization)) = {0xffffff00, 0xffffff01, 0xffffff02, 0xffffff03};
const static...
```

omnisip

toolchain
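The snippet above is cut off, but the `sw1` constant can be read on its own. Assuming `__u32x4` is a four-lane vector of 32-bit integers laid out little-endian (as WebAssembly lanes are), its bytes are `00 ff ff ff 01 ff ff ff 02 ...`. Used as the index vector of a byte swizzle, where any index of 16 or more yields 0 (`i8x16.swizzle` semantics; `pshufb` likewise zeroes lanes whose index has the high bit set), it zero-extends bytes 0 to 3 of the source into four 32-bit lanes. The scalar C model below is a sketch of that reading, not code from the issue; both function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Scalar model of i8x16.swizzle: byte i of the result is a[idx[i]] when
 * idx[i] < 16, and 0 otherwise (out-of-range indices zero the lane). */
static void swizzle_i8x16(uint8_t out[16], const uint8_t a[16],
                          const uint8_t idx[16]) {
    for (int i = 0; i < 16; i++) {
        out[i] = idx[i] < 16 ? a[idx[i]] : 0;
    }
}

/* Zero-extend the low four bytes of `a` into four 32-bit lanes using the
 * sw1 constant from the issue. Assumes a little-endian host, matching
 * WebAssembly's lane layout. */
static void zext_low4_u8_to_u32(uint32_t out[4], const uint8_t a[16]) {
    static const uint32_t sw1[4] = {0xffffff00u, 0xffffff01u,
                                    0xffffff02u, 0xffffff03u};
    uint8_t idx[16], bytes[16];
    memcpy(idx, sw1, sizeof idx);      /* bytes: 00 ff ff ff 01 ff ff ff ... */
    swizzle_i8x16(bytes, a, idx);      /* 0xff indices produce zero bytes */
    memcpy(out, bytes, sizeof bytes);  /* reinterpret as four u32 lanes */
}
```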

Consider adding Horizontal Add

33

Packed horizontal arithmetic is reasonably performant on SSE3+ and Neon. These would be useful for complex multiplications, and in the absence of the opcodes below, these would need to be...

dtig

post SIMD MVP
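To make the proposed semantics concrete, here is a scalar C model of a pairwise horizontal add with the lane order of SSE3 `HADDPS` (`_mm_hadd_ps`) and AArch64 `FADDP` (`vpaddq_f32`), plus one way it could serve the complex-multiplication use case the issue mentions. The function names, the interleaved `{re, im}` input layout and the planar output order are assumptions made for this sketch, not the opcodes proposed in the issue.

```c
/* Pairwise horizontal add of two 4-lane float vectors:
 * {a0+a1, a2+a3, b0+b1, b2+b3}, the lane order of _mm_hadd_ps. */
static void hadd_f32x4(float out[4], const float a[4], const float b[4]) {
    out[0] = a[0] + a[1];
    out[1] = a[2] + a[3];
    out[2] = b[0] + b[1];
    out[3] = b[2] + b[3];
}

/* Multiply two pairs of complex numbers stored interleaved as
 * {re0, im0, re1, im1}. For (a+bi)(c+di): re = ac - bd, im = ad + bc.
 * Uses lane-wise multiplies plus one horizontal add.
 * Output is planar: {re0, re1, im0, im1}. */
static void cmul2_f32(float out[4], const float x[4], const float y[4]) {
    const float y_conj[4] = {y[0], -y[1], y[2], -y[3]};   /* {c, -d, ...} */
    const float y_swap[4] = {y[1], y[0], y[3], y[2]};     /* {d,  c, ...} */
    float t1[4], t2[4];
    for (int i = 0; i < 4; i++) {
        t1[i] = x[i] * y_conj[i];  /* {a*c, -b*d, ...} */
        t2[i] = x[i] * y_swap[i];  /* {a*d,  b*c, ...} */
    }
    hadd_f32x4(out, t1, t2);       /* {re0, re1, im0, im1} */
}
```

For example, with `x = {1, 2, 2, 1}` (1+2i and 2+i) and `y = {3, 4, 1, 1}` (3+4i and 1+i), `cmul2_f32` yields `{-5, 1, 10, 3}`, i.e. -5+10i and 1+3i.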

v128.const and v128.shuffle blow up code size

5

`v128.const` and `v128.shuffle` instructions are always 18 bytes long, which is excessive in the majority of cases. For comparison, native shuffle instructions on x86-64 are at most five bytes long,...

Maratyszcza
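The 18-byte figure follows from the encoding: the `0xFD` SIMD prefix, a one-byte opcode, and a 16-byte immediate (the constant itself, or the 16 lane indices of the shuffle). The sketch below assumes the opcode numbering of the finalized SIMD spec, in which the shuffle is spelled `i8x16.shuffle`; the numbering at the time of the issue may have differed, but the length does not depend on it as long as the opcode fits in one LEB128 byte. The helper name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Opcode values assumed from the finalized WebAssembly SIMD spec. */
enum { SIMD_PREFIX = 0xFD, OP_V128_CONST = 0x0C, OP_I8X16_SHUFFLE = 0x0D };

/* Encode a SIMD instruction that carries a 16-byte immediate and
 * return the number of bytes written. */
static size_t encode_with_imm16(uint8_t *buf, uint8_t op,
                                const uint8_t imm[16]) {
    size_t n = 0;
    buf[n++] = SIMD_PREFIX;
    buf[n++] = op;             /* op < 0x80, so its LEB128 form is one byte */
    memcpy(buf + n, imm, 16);  /* constant bytes or shuffle lane indices */
    n += 16;
    return n;                  /* 1 + 1 + 16 = 18, whatever the immediate */
}
```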

Documenting performance tradeoffs

6

I've taken a stab at documenting the performance tradeoffs between various instructions and collected the information in this repository: https://github.com/zeux/wasm-simd Obviously this is far from a normative reference, and is...

zeux

perf documentation

Explicit rounding control per operation

3

Hi, add, sub, div, sqrt, mul, and float->integer conversions should all have a third optional argument to allow changing the rounding mode (nearest, toward plus infinity, toward minus infinity, to...

baryluk

post SIMD MVP

Support register-tight use cases

1

This is open-ended. The problem is that many key use cases, such as matrix multiplication kernels, need to know the number of SIMD vector registers that they can count on...

bjacob

post SIMD MVP

Support multiplication of a vector against one lane (broadcasted) of another vector

Suppose you have two vectors u and v, and you want to multiply all elements of the vector u by a single lane of the vector v, e.g. v[0]. This...

bjacob
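A scalar C model of the requested operation, with the same lane semantics as AArch64 `vmulq_laneq_f32(u, v, lane)`. Without it, the same result needs a splat of `v[lane]` (for example `f32x4.extract_lane` followed by `f32x4.splat`, or a shuffle) before an ordinary `f32x4.mul`. The matrix-vector helper shows the kind of kernel loop where this pattern appears; both function names are hypothetical.

```c
/* Multiply every lane of u by lane `lane` of v (0 <= lane < 4). */
static void mul_by_lane_f32x4(float out[4], const float u[4],
                              const float v[4], int lane) {
    const float s = v[lane];          /* the broadcast scalar */
    for (int i = 0; i < 4; i++) {
        out[i] = u[i] * s;
    }
}

/* y = M * x for a column-major 4x4 matrix: accumulate column k
 * scaled by lane k of x. */
static void matvec4(float y[4], const float m_cols[4][4], const float x[4]) {
    float t[4];
    for (int i = 0; i < 4; i++) y[i] = 0.0f;
    for (int k = 0; k < 4; k++) {
        mul_by_lane_f32x4(t, m_cols[k], x, k);
        for (int i = 0; i < 4; i++) y[i] += t[i];
    }
}
```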

Consistently support widening/long variants of integer instructions

In SIMD instruction sets where integer arithmetic is really first-class, support for different bit widths in different operands is not just an ad-hoc addition for a few instructions. Instead, generically...

bjacob
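As an illustration of what a "long" variant means, here is a scalar C model of a widening multiply, 16-bit lanes in and 32-bit lanes out, matching the semantics of Arm's `vmull_s16`. The function name is hypothetical, and the sketch covers only one of the many operation/width combinations the issue asks to support generically.

```c
#include <stdint.h>

/* Widening multiply: four int16 lanes times four int16 lanes give four
 * int32 lanes. The largest magnitude, (-32768) * (-32768) = 2^30, fits
 * in int32, so no product can overflow. */
static void mul_widen_i16x4_to_i32x4(int32_t out[4], const int16_t a[4],
                                     const int16_t b[4]) {
    for (int i = 0; i < 4; i++) {
        out[i] = (int32_t)a[i] * (int32_t)b[i];
    }
}
```

With `a = {-32768, 300, -2, 1}` and `b = {-32768, 300, 7, 0}` the result is `{1073741824, 90000, -14, 0}`; a same-width 16-bit multiply would wrap 300 * 300 to 24464.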

Integer multiply-add instructions

Unlike the float case, where the fused-vs-unfused issue creates complications (PR #79), in the integer case there is no downside to using single-instruction multiply-add. These are vital to getting above...

bjacob
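A scalar C model of a lane-wise 32-bit integer multiply-add, `acc += a * b`, with the semantics of Arm's `vmlaq_s32`. Because wrapping integer arithmetic has no intermediate rounding step, the fused result is bit-identical to a separate multiply followed by an add, which is why the float fused-vs-unfused question does not arise here. The name `mla_i32x4` is hypothetical.

```c
#include <stdint.h>

/* acc[i] += a[i] * b[i], wrapping modulo 2^32 like SIMD integer lanes.
 * The math is done in uint32_t to avoid signed-overflow UB; converting
 * back to int32_t is implementation-defined in C and wraps on mainstream
 * compilers. */
static void mla_i32x4(int32_t acc[4], const int32_t a[4], const int32_t b[4]) {
    for (int i = 0; i < 4; i++) {
        uint32_t p = (uint32_t)a[i] * (uint32_t)b[i];
        acc[i] = (int32_t)((uint32_t)acc[i] + p);
    }
}
```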

Packed shift

2

I looked into single-precision Mersenne Twister after @ngzhian did the double-precision port, and its core function relies on "shift bytes" behavior, represented by `PSLLDQ`/`PSRLDQ` on x86 and `VEXT` on Arm: https://github.com/penzn/SFMT-wasm/blob/master/SFMT-sse2.h#L37...

penzn
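Scalar C models of the three byte-shift primitives named in the issue, with lane 0 as the least significant byte (as on x86 and in WebAssembly). The function names are hypothetical; the semantics follow `_mm_slli_si128`/`_mm_srli_si128` (`PSLLDQ`/`PSRLDQ`) and `vextq_u8` (`VEXT`). With a constant shift amount, the same results can also be expressed as an `i8x16.shuffle` of the input and a zero vector.

```c
#include <stdint.h>
#include <string.h>

/* Like PSLLDQ: bytes move toward higher lane indices, zeros fill the
 * bottom n bytes. n >= 16 yields all zeros. */
static void bslli_u8x16(uint8_t out[16], const uint8_t a[16], int n) {
    for (int i = 0; i < 16; i++)
        out[i] = (i >= n) ? a[i - n] : 0;
}

/* Like PSRLDQ: bytes move toward lower lane indices, zeros fill the
 * top n bytes. n >= 16 yields all zeros. */
static void bsrli_u8x16(uint8_t out[16], const uint8_t a[16], int n) {
    for (int i = 0; i < 16; i++)
        out[i] = (i + n < 16) ? a[i + n] : 0;
}

/* Like VEXT (vextq_u8(a, b, n), 0 <= n <= 15): the 16 bytes starting at
 * byte n of the 32-byte concatenation a:b. With b all zeros this equals
 * bsrli_u8x16(out, a, n). */
static void ext_u8x16(uint8_t out[16], const uint8_t a[16],
                      const uint8_t b[16], int n) {
    uint8_t cat[32];
    memcpy(cat, a, 16);
    memcpy(cat + 16, b, 16);
    memcpy(out, cat + n, 16);
}
```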