simd
Branch of the spec repo scoped to discussion of SIMD in WebAssembly
Hi everyone, @ngzhian has encouraged me to share the [discussion we're having on building an x64 constant pool and related optimizations in V8](https://groups.google.com/u/3/g/v8-dev/c/QJfpvc55Hfg/m/A7ZEuASOBQAJ) since it might be helpful to other...
[`bitselect`](https://github.com/WebAssembly/simd/blob/master/proposals/simd/SIMD.md#bitwise-select) is a 3-instruction lowering in [cranelift](https://github.com/bytecodealliance/cranelift/blob/48029b4a16264672ce24afbee1050b37e1e68020/cranelift-codegen/meta/src/isa/x86/legalize.rs#L447-L458) and a 4-instruction lowering in [v8](https://github.com/v8/v8/blob/19be4913881bb02c5d9b4f1c7547ee2d1273120b/src/compiler/backend/x64/code-generator-x64.cc#L3558-L3566).
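To make the instruction count concrete, here is a minimal scalar sketch of what `bitselect` computes, using a `u128` as a stand-in for a `v128`. The function names are hypothetical, and the AND / AND-NOT / OR shape is only one plausible three-operation form; I have not checked it against the linked cranelift or v8 code.

```rust
/// Scalar model of `v128.bitselect(v1, v2, c)`:
/// take each bit from `v1` where `c` is 1, otherwise from `v2`.
/// (Hypothetical helper; three bitwise operations.)
fn bitselect(v1: u128, v2: u128, c: u128) -> u128 {
    let from_v1 = v1 & c; // AND
    let from_v2 = v2 & !c; // AND-NOT
    from_v1 | from_v2 // OR
}

/// Equivalent XOR-based form, also three bitwise operations.
fn bitselect_xor(v1: u128, v2: u128, c: u128) -> u128 {
    ((v1 ^ v2) & c) ^ v2
}

fn main() {
    let (v1, v2, c) = (0xAAAA_u128, 0x5555_u128, 0xFF00_u128);
    println!("{:#x}", bitselect(v1, v2, c));
    println!("{:#x}", bitselect_xor(v1, v2, c));
}
```

Both forms need a temporary or clobber one input on a two-operand ISA, which may be where the extra move in a longer lowering comes from; that is a guess, not something the links confirm.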
[`splat`](https://github.com/WebAssembly/simd/blob/master/proposals/simd/SIMD.md#create-vector-with-identical-lanes) has 2- to 3-instruction lowerings in [cranelift](https://github.com/bytecodealliance/cranelift/blob/48029b4a16264672ce24afbee1050b37e1e68020/cranelift-codegen/meta/src/isa/x86/legalize.rs#L348-L406) and [v8](https://github.com/v8/v8/blob/19be4913881bb02c5d9b4f1c7547ee2d1273120b/src/compiler/backend/x64/code-generator-x64.cc#L3095-L3105). I believe the "splat all ones" and "splat all zeroes" cases are a single-instruction lowering in both platforms but it...
Certain [SIMD conversions](https://github.com/WebAssembly/simd/blob/master/proposals/simd/SIMD.md#conversions) seem to have inefficient lowerings in x64. `f32x4.convert_i32x4_u` is lowered to 8 instructions by [v8](https://github.com/v8/v8/blob/19be4913881bb02c5d9b4f1c7547ee2d1273120b/src/compiler/backend/x64/code-generator-x64.cc#L2448-L2464). The signed version, `f32x4.convert_i32x4_s`, on the other hand, is lowered to a...
[`all_true`](https://github.com/WebAssembly/simd/blob/master/proposals/simd/SIMD.md#all-lanes-true) checks if all lanes are (unsigned) greater than 0. This requires 4 instructions in [cranelift](https://github.com/bytecodealliance/cranelift/blob/48029b4a16264672ce24afbee1050b37e1e68020/cranelift-codegen/meta/src/isa/x86/legalize.rs#L471-L501) and 6 in [v8](https://github.com/v8/v8/blob/19be4913881bb02c5d9b4f1c7547ee2d1273120b/src/compiler/backend/x64/code-generator-x64.cc#L590-L602). Perhaps there is a more granular way to reduce lanes...
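As a rough illustration of why `all_true` takes several steps, here is a scalar sketch for four 32-bit lanes. It assumes a "compare each lane with zero, then test whether any lane matched, then invert" strategy; the function name is hypothetical and the real lowerings linked above may differ.

```rust
/// Scalar model of `i32x4.all_true`: true when every lane is non-zero.
/// (Hypothetical helper; mirrors a compare-with-zero + reduce + invert shape.)
fn all_true_i32x4(v: [u32; 4]) -> bool {
    // Step 1: lane-wise compare with zero, producing an all-ones mask per zero lane.
    let zero_mask: [u32; 4] = v.map(|lane| if lane == 0 { u32::MAX } else { 0 });
    // Step 2: reduce across lanes: did any lane compare equal to zero?
    let any_zero = zero_mask.iter().any(|&m| m != 0);
    // Step 3: invert the result to get "all lanes true".
    !any_zero
}

fn main() {
    println!("{}", all_true_i32x4([1, 2, 3, 4]));
    println!("{}", all_true_i32x4([1, 0, 3, 4]));
}
```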
In both cranelift and v8, unsigned integer comparisons are lowered to more than 1 instruction:
- unsigned greater/less-than takes 4 instructions; e.g. [cranelift](https://github.com/bytecodealliance/cranelift/blob/48029b4a16264672ce24afbee1050b37e1e68020/cranelift-codegen/meta/src/isa/x86/legalize.rs#L525-L532) and [v8](https://github.com/v8/v8/blob/19be4913881bb02c5d9b4f1c7547ee2d1273120b/src/compiler/backend/x64/code-generator-x64.cc#L3071-L3081)
- both unsigned and signed greater/less-than-or-equal...
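For readers wondering where the extra instructions come from, here is a scalar sketch of two common ways to build an unsigned greater-than out of operations that do exist as single vector instructions (unsigned max, compare-equal, signed greater-than, XOR). Function names are hypothetical; whether either matches the linked lowerings exactly is an assumption.

```rust
/// `a >u b` per 8-bit lane via: NOT(max_u(a, b) == b).
/// Models unsigned max, compare-equal, and a NOT (all-ones constant + XOR).
fn gt_u_via_max(a: [u8; 16], b: [u8; 16]) -> [u8; 16] {
    let mut out = [0u8; 16];
    for i in 0..16 {
        let m = a[i].max(b[i]); // unsigned max
        let eq: u8 = if m == b[i] { 0xFF } else { 0x00 }; // compare-equal mask
        out[i] = eq ^ 0xFF; // NOT via XOR with all ones
    }
    out
}

/// `a >u b` per 8-bit lane via: flip the sign bit of both inputs, then compare signed.
fn gt_u_via_bias(a: [u8; 16], b: [u8; 16]) -> [u8; 16] {
    let mut out = [0u8; 16];
    for i in 0..16 {
        let sa = (a[i] ^ 0x80) as i8; // bias into signed range
        let sb = (b[i] ^ 0x80) as i8;
        out[i] = if sa > sb { 0xFF } else { 0x00 }; // signed greater-than mask
    }
    out
}

fn main() {
    let a = [200u8; 16];
    let b = [100u8; 16];
    println!("{:?}", gt_u_via_max(a, b));
    println!("{:?}", gt_u_via_bias(a, b));
}
```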
In both [cranelift](https://github.com/bytecodealliance/cranelift/blob/48029b4a16264672ce24afbee1050b37e1e68020/cranelift-codegen/meta/src/isa/x86/legalize.rs#L575-L607) (ignore the bitcasts) and [v8](https://github.com/v8/v8/blob/19be4913881bb02c5d9b4f1c7547ee2d1273120b/src/compiler/backend/x64/code-generator-x64.cc#L2465-L2491), floating-point absolute value and floating-point negation are 3-instruction lowerings. I don't believe there is any better lowering than these (is there?) for...
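A scalar sketch of the usual mask-based approach may help here: materialize all ones, shift to form a sign-bit mask, then AND or XOR. That is three steps, which would be consistent with the count above, though I am assuming rather than confirming that this is what the linked code does. Function names are hypothetical.

```rust
/// Absolute value of an f32 by clearing the sign bit.
fn fabs_bits(x: f32) -> f32 {
    let all_ones = u32::MAX; // step 1: all ones
    let mask = all_ones >> 1; // step 2: 0x7FFF_FFFF
    f32::from_bits(x.to_bits() & mask) // step 3: AND clears the sign bit
}

/// Negation of an f32 by flipping the sign bit.
fn fneg_bits(x: f32) -> f32 {
    let all_ones = u32::MAX; // step 1: all ones
    let mask = all_ones << 31; // step 2: 0x8000_0000
    f32::from_bits(x.to_bits() ^ mask) // step 3: XOR flips the sign bit
}

fn main() {
    println!("{}", fabs_bits(-1.5));
    println!("{}", fneg_bits(2.0));
}
```

If the mask came from a constant pool instead, the first two steps would collapse into a single load, which seems to be the kind of optimization the constant-pool discussion is about.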
While attempting to lower `shl` and `shr` (https://github.com/WebAssembly/simd/blob/master/proposals/simd/SIMD.md#bit-shifts) in cranelift, I observed that the following instructions would involve a non-optimal lowering to x86:
- `i8x16.shl`
- `i8x16.shr_s`
- `i8x16.shr_u`
- `i64x2.shr_s`...
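To illustrate the 8-bit case, here is a scalar sketch of one well-known workaround for the lack of byte-granular shifts: shift 16-bit lanes, then mask off the bits that leaked across the byte boundary. The function name is hypothetical, and this is not necessarily the lowering proposed in the issue.

```rust
/// Scalar model of `i8x16.shl` emulated with 16-bit shifts plus a mask.
fn i8x16_shl(v: [u8; 16], amount: u32) -> [u8; 16] {
    let n = amount % 8; // the shift count is taken modulo the lane width
    let mask = (0xFFu16 << n) as u8; // bits that legitimately survive in each byte
    let mut out = [0u8; 16];
    for i in (0..16).step_by(2) {
        // Treat two adjacent bytes as one 16-bit lane.
        let wide = u16::from_le_bytes([v[i], v[i + 1]]);
        let shifted = wide << n; // 16-bit shift; low-byte bits leak into the high byte
        let [lo, hi] = shifted.to_le_bytes();
        out[i] = lo & mask; // mask clears the leaked bits
        out[i + 1] = hi & mask;
    }
    out
}

fn main() {
    let v = [0xFFu8; 16];
    println!("{:02x?}", i8x16_shl(v, 3));
}
```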
Suggestion from https://github.com/WebAssembly/simd/pull/455#issuecomment-775787091. We can try doing this after adding all the instructions to syntax.
Sign-replication is an often-used operation that replicates the sign bit of a SIMD lane into all bits of the lane. There are two reasons why we need to pay attention...
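As a small scalar sketch of the operation being described (function names are hypothetical), sign-replication on a 32-bit lane can be written either as an arithmetic shift right by the lane width minus one, or as a signed compare against zero:

```rust
/// Replicate the sign bit into every bit of the lane using an arithmetic shift.
fn sign_replicate_shift(x: i32) -> i32 {
    x >> 31 // `>>` on a signed integer is an arithmetic shift in Rust
}

/// Same result using a signed "0 > x" comparison that yields an all-ones mask.
fn sign_replicate_cmp(x: i32) -> i32 {
    if 0 > x { -1 } else { 0 }
}

fn main() {
    println!("{:#x}", sign_replicate_shift(-5));
    println!("{:#x}", sign_replicate_cmp(7));
}
```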