juj
Thanks for all the discussion here! I filled out the spreadsheet on the SSE1 support page to add a new column on how those SSE1 instructions map to NEON. If...
For reference, here is the commit mentioned above: https://github.com/kripken/emscripten/commit/8c8c7fd3ac716f20c21a8edee9e2010d672d76d5 . The `select(greaterThan(x, y), x, y)` set of instructions would directly map to ``` movaps mask, x cmpps mask, y,...
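To make the `select(greaterThan(x, y), x, y)` pattern concrete, here is a minimal scalar sketch in plain JavaScript. It models four float lanes with typed arrays: the compare produces an all-ones or all-zeros mask per lane, and the select combines the two inputs bitwise as `(a & mask) | (b & ~mask)`. The helper names `greaterThanMask` and `select` are hypothetical stand-ins, not a real SIMD API, and whether the elided instruction sequence above uses exactly this and/andnot/or combination is an assumption.

```javascript
// Scalar model of a 4-lane float compare-and-select.
// greaterThanMask and select are hypothetical helpers, not a real SIMD API.
function greaterThanMask(x, y) {
  const mask = new Uint32Array(4);
  for (let i = 0; i < 4; i++) {
    // All bits set where x > y, all bits clear otherwise.
    mask[i] = x[i] > y[i] ? 0xFFFFFFFF : 0;
  }
  return mask;
}

function select(mask, a, b) {
  // Reinterpret the float lanes as raw 32-bit patterns.
  const aBits = new Uint32Array(a.buffer, a.byteOffset, 4);
  const bBits = new Uint32Array(b.buffer, b.byteOffset, 4);
  const out = new Float32Array(4);
  const outBits = new Uint32Array(out.buffer);
  for (let i = 0; i < 4; i++) {
    // Take bits from a where the mask is set, from b where it is clear.
    outBits[i] = (aBits[i] & mask[i]) | (bBits[i] & ~mask[i]);
  }
  return out;
}

const x = new Float32Array([1, 5, -2, 8]);
const y = new Float32Array([3, 4, -1, 8]);
const maxXY = select(greaterThanMask(x, y), x, y);
console.log(Array.from(maxXY));
```

With these inputs each lane of `maxXY` holds the larger of the two corresponding lanes, which is why this pattern is a common way to spell a lanewise max.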
@andhow: `I think the important thing isn't so much % coverage of the instruction set but % coverage of real world use cases.` I find that statement a bit objectionable....
I've now produced real-world benchmarks to the extent that I think is useful at this point. First off, here are some places that were...
NaN canonicalization is probably what causes this issue too: https://github.com/kripken/emscripten/issues/2840
ARM Neon has a hardware negate instruction for the 64-bit types `int8x8_t`, `int16x4_t`, `int32x2_t` and `float32x2_t`, and the 128-bit types `int8x16_t`, `int16x8_t`, `int32x4_t` and `float32x4_t`. I'd favor keeping the negate...
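One reason a dedicated float negate is not trivially replaceable by `0 - x` is signed zero: under IEEE 754, negating `+0` gives `-0`, while `0 - 0` gives `+0`. The sketch below shows this on a plain `Float32Array`. The helper names `negate` and `zeroMinus` are hypothetical, and the example only models lanewise semantics in scalar code; it is not a claim about how any engine lowers the operation.

```javascript
// Lanewise negate modeled on a plain Float32Array.
// negate and zeroMinus are hypothetical helpers, not a real SIMD API.
function negate(v) {
  return v.map((lane) => -lane); // flips the sign bit, so +0 becomes -0
}

function zeroMinus(v) {
  return v.map((lane) => 0 - lane); // 0 - (+0) is +0, not -0
}

const v = new Float32Array([0, 1.5, -2, Infinity]);
const negated = negate(v);
const subtracted = zeroMinus(v);

// The two results agree on every lane except the zero lane.
const zeroLaneDiffers =
  Object.is(negated[0], -0) && Object.is(subtracted[0], 0);
console.log(zeroLaneDiffers);
```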
Talked with @bnjbvr really briefly at a conference recently, and given that a) we have a good guess what Float64x2 ops will look like, b) Firefox Nightly already implements it...
`One of my initial design goals for SIMD was to avoid requiring any optimization passes (pattern matching, auto vectorization) and make things as explicit as possible. I've had many people...
How about the following? ``` var mask = SIMD.Uint32x4.splat(0xFF); // Constant: create outside the hot loop. var input = SIMD.Uint8x16(0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15); input = SIMD.Uint32x4.fromUint8x16Bits(input); // No-op var r = SIMD.Uint32x4.and(input, mask);...
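For readers without a SIMD-capable engine, the reinterpret-then-mask step can be sketched with ordinary typed arrays. The sketch assumes a little-endian lane layout (made explicit through `DataView`), so the low byte of each 32-bit lane is the first of its four bytes; the variable names are illustrative only, and this is a scalar model rather than the proposed API.

```javascript
// Sixteen bytes 0..15, viewed as four 32-bit lanes.
const bytes = new Uint8Array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]);
const view = new DataView(bytes.buffer);

// Assumption: little-endian lane layout, requested explicitly below.
const r = new Uint32Array(4);
for (let lane = 0; lane < 4; lane++) {
  const word = view.getUint32(lane * 4, true); // reinterpret 4 bytes as one uint32
  r[lane] = (word & 0xFF) >>> 0;               // keep only the low byte of the lane
}
console.log(Array.from(r));
```

Under that layout assumption, the mask keeps bytes 0, 4, 8 and 12 of the input, one per 32-bit lane.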
In @PeterJensen's reverse operation, the swizzles assume that the inputs are in uint8 range, and all those swizzles of lane `1` assume that they will be receiving zeroes, or the...