juj comments

Results: 466 comments of juj

How will native code port on top of JS-SIMD?

Thanks for all the discussion here! I filled out the spreadsheet on the SSE1 support page to add a new column on how those SSE1 instructions map to NEON. If...

How will native code port on top of JS-SIMD?

For reference, here is the commit mentioned above: https://github.com/kripken/emscripten/commit/8c8c7fd3ac716f20c21a8edee9e2010d672d76d5 . The `select(greaterThan(x, y), x, y)` set of instructions would directly map to ``` movaps mask, x cmpps mask, y,...
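To make the `select(greaterThan(x, y), x, y)` pattern concrete, here is a minimal scalar sketch of it. The helpers `greaterThan`, `select` and `maxViaSelect` are hypothetical stand-ins that operate on plain 4-element arrays; they are not the SIMD.js API and say nothing about how an engine would lower the pattern.

```javascript
// Hypothetical lane-wise helpers on plain arrays (illustration only).

// Compare lanes: produces a boolean mask, true where a[i] > b[i].
function greaterThan(a, b) {
  return a.map((v, i) => v > b[i]);
}

// Pick t[i] where the mask is true, otherwise f[i].
function select(mask, t, f) {
  return mask.map((m, i) => (m ? t[i] : f[i]));
}

// The pattern quoted in the comment: a lane-wise maximum built from
// a comparison followed by a select.
function maxViaSelect(x, y) {
  return select(greaterThan(x, y), x, y);
}

const x = [1, 5, NaN, -2];
const y = [3, 2, 4, -7];
console.log(maxViaSelect(x, y)); // [3, 5, 4, -2]
```

Note that any comparison against NaN is false, so a lane with NaN in `x` yields the `y` lane here, whereas `Math.max` would return NaN for that lane.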

How will native code port on top of JS-SIMD?

@andhow: `I think the important thing isn't so much % coverage of the instruction set but % coverage of real world use cases.` I find that statement a bit objectionable....

How will native code port on top of JS-SIMD?

I've now worked on the quest to produce real world benchmarks to the extent that I think is useful at this point. First off, here are some places that were...

NaN canonicalization of float32x4 lane accesses

The NaN canonicalization probably is what causes this issue too: https://github.com/kripken/emscripten/issues/2840

Unsigned types don't have "neg"

ARM Neon has a hardware negate instruction for the 64-bit types `int8x8_t`, `int16x4_t`, `int32x2_t` and `float32x2_t`, and the 128-bit types `int8x16_t`, `int16x8_t`, `int32x4_t` and `float32x4_t`. I'd favor keeping the negate...

SSE2 Float64x2 support?

Talked with @bnjbvr really briefly at a conference recently, and given that a) we have a good guess what Float64x2 ops will look like, b) Firefox Nightly already implements it...

SIMD.int8x16.sumOfAbsoluteDifferences on ARM

`One of my initial design goals for SIMD was to avoid requiring any optimization passes (pattern matching, auto vectorization) and make things as explicit as possible. I've had many people...

SIMD uint8x16 to 4 of uint32x4 with transpose?

How about the following? ``` var mask = SIMD.Uint32x4.splat(0xFF); // Constant: create outside the hot loop. var input = SIMD.Uint8x16(0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15); input = SIMD.Uint32x4.fromUint8x16Bits(input); // No-op var r = SIMD.Uint32x4.and(input, mask);...
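The excerpt above is cut off after the first `and`, so the rest of the suggestion is not visible here. As a rough illustration of the general mask-and-shift idea only, the sketch below emulates it with typed arrays instead of the SIMD.js API. The function name `splitBytesToLanes` is hypothetical, and a little-endian lane layout is an assumption made explicit through `DataView`.

```javascript
// Sketch: reinterpret 16 bytes as four little-endian 32-bit lanes, then
// extract byte k of every lane with a shift and a 0xFF mask.
// This emulates the idea with typed arrays; it is not SIMD.js code.
function splitBytesToLanes(bytes) {
  if (bytes.length !== 16) {
    throw new RangeError("expected exactly 16 bytes");
  }
  const view = new DataView(Uint8Array.from(bytes).buffer);

  // Equivalent in spirit to a bit-cast from 16 x uint8 to 4 x uint32
  // (assumed little-endian, hence the `true` argument).
  const lanes = new Uint32Array(4);
  for (let i = 0; i < 4; i++) {
    lanes[i] = view.getUint32(i * 4, true);
  }

  // Result k holds byte k of each 32-bit lane, zero-extended.
  const out = [];
  for (let k = 0; k < 4; k++) {
    out.push(Array.from(lanes, (w) => (w >>> (8 * k)) & 0xff));
  }
  return out;
}

const input = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15];
console.log(splitBytesToLanes(input));
// [[0, 4, 8, 12], [1, 5, 9, 13], [2, 6, 10, 14], [3, 7, 11, 15]]
```

The four results are a transposed view of the input bytes: each one gathers every fourth byte, which is what the mask (and, for the later results, a preceding shift) produces on reinterpreted 32-bit lanes.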

SIMD uint8x16 to 4 of uint32x4 with transpose?

In @PeterJensen's reverse operation, the swizzles assume that the inputs are in uint8 range, and all those swizzles of lane `1` assume that they will be receiving zeroes, or the...