simd issues

Constant Pool Optimization on X64 and Related Benchmarks

4

Hi everyone, @ngzhian has encouraged me to share the [discussion we're having on building an x64 constant pool and related optimizations in V8](https://groups.google.com/u/3/g/v8-dev/c/QJfpvc55Hfg/m/A7ZEuASOBQAJ) since it might be helpful to other...

omnisip

Inefficient x64 codegen for bitselect

25

[`bitselect`](https://github.com/WebAssembly/simd/blob/master/proposals/simd/SIMD.md#bitwise-select) is a 3-instruction lowering in [cranelift](https://github.com/bytecodealliance/cranelift/blob/48029b4a16264672ce24afbee1050b37e1e68020/cranelift-codegen/meta/src/isa/x86/legalize.rs#L447-L458) and a 4-instruction lowering in [v8](https://github.com/v8/v8/blob/19be4913881bb02c5d9b4f1c7547ee2d1273120b/src/compiler/backend/x64/code-generator-x64.cc#L3558-L3566).

abrown

perf documentation
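
To make the instruction counts in the bitselect issue concrete, here is a minimal sketch of two algebraically equivalent formulations. It is an illustration only: a `v128` is modelled as a Python integer, the function names are hypothetical, and the mapping of each formulation to a specific x64 instruction sequence is an assumption, not something taken from the linked cranelift or v8 sources.

```python
MASK128 = (1 << 128) - 1  # a v128 modelled as an unsigned 128-bit integer


def bitselect_and_or(v1, v2, c):
    """Take bits from v1 where c is 1 and from v2 where c is 0.

    Shape: AND, AND-NOT, OR (three bitwise operations).
    """
    return ((v1 & c) | (v2 & ~c)) & MASK128


def bitselect_xor(v1, v2, c):
    """Same result via the XOR identity v2 ^ ((v1 ^ v2) & c).

    Shape: XOR, AND, XOR, plus a register copy if the inputs must be
    preserved (assumption: this is where a fourth instruction can come from).
    """
    return (v2 ^ ((v1 ^ v2) & c)) & MASK128
```

Both functions agree for every input, so the difference between lowerings is a matter of register pressure and instruction count rather than semantics.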

Inefficient x64 codegen for splat

7

[`splat`](https://github.com/WebAssembly/simd/blob/master/proposals/simd/SIMD.md#create-vector-with-identical-lanes) has 2- to 3-instruction lowerings in [cranelift](https://github.com/bytecodealliance/cranelift/blob/48029b4a16264672ce24afbee1050b37e1e68020/cranelift-codegen/meta/src/isa/x86/legalize.rs#L348-L406) and [v8](https://github.com/v8/v8/blob/19be4913881bb02c5d9b4f1c7547ee2d1273120b/src/compiler/backend/x64/code-generator-x64.cc#L3095-L3105). I believe the "splat all ones" and "splat all zeroes" cases are a single-instruction lowering on both platforms but it...

abrown

perf documentation

Inefficient x64 codegen for conversion instructions

13

Certain [SIMD conversions](https://github.com/WebAssembly/simd/blob/master/proposals/simd/SIMD.md#conversions) seem to have inefficient lowerings on x64. `f32x4.convert_i32x4_u` is lowered to 8 instructions by [v8](https://github.com/v8/v8/blob/19be4913881bb02c5d9b4f1c7547ee2d1273120b/src/compiler/backend/x64/code-generator-x64.cc#L2448-L2464). The signed version, `f32x4.convert_i32x4_s`, on the other hand, is lowered to a...

abrown

perf documentation

Inefficient x64 codegen for all_true/any_true

12

[`all_true`](https://github.com/WebAssembly/simd/blob/master/proposals/simd/SIMD.md#all-lanes-true) checks if all lanes are (unsigned) greater than 0. This requires 4 instructions in [cranelift](https://github.com/bytecodealliance/cranelift/blob/48029b4a16264672ce24afbee1050b37e1e68020/cranelift-codegen/meta/src/isa/x86/legalize.rs#L471-L501) and 6 in [v8](https://github.com/v8/v8/blob/19be4913881bb02c5d9b4f1c7547ee2d1273120b/src/compiler/backend/x64/code-generator-x64.cc#L590-L602). Perhaps there is a more granular way to reduce lanes...

abrown

perf documentation
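
As a rough illustration of why the all_true/any_true reductions cost several steps, the sketch below models a vector as a list of unsigned lane values. The two-phase shape (a lane-wise compare against zero, then a whole-vector test) is an assumption about how such lowerings are typically structured; it is not copied from the linked code, and the function names are hypothetical.

```python
def all_true(lanes):
    """Return 1 if every lane is non-zero, else 0."""
    # Phase 1: lane-wise equality against a zero vector.
    eq_zero = [lane == 0 for lane in lanes]
    # Phase 2: reduce the comparison result to a single scalar flag.
    return 0 if any(eq_zero) else 1


def any_true(lanes):
    """Return 1 if at least one lane is non-zero, else 0."""
    # No per-lane compare is needed: testing the raw bits is enough.
    return 1 if any(lane != 0 for lane in lanes) else 0
```

`any_true` only needs to know whether any bit is set, while `all_true` must first collapse each lane to a boolean, which is where the extra work comes from.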

Inefficient x64 codegen for integer comparisons

2

In both cranelift and v8, unsigned integer comparisons are lowered to more than one instruction:
- unsigned greater/less-than takes 4 instructions; e.g. [cranelift](https://github.com/bytecodealliance/cranelift/blob/48029b4a16264672ce24afbee1050b37e1e68020/cranelift-codegen/meta/src/isa/x86/legalize.rs#L525-L532) and [v8](https://github.com/v8/v8/blob/19be4913881bb02c5d9b4f1c7547ee2d1273120b/src/compiler/backend/x64/code-generator-x64.cc#L3071-L3081)
- both unsigned and signed greater/less-than-or-equal...

abrown

perf documentation
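
One way to see where a multi-step unsigned comparison can come from is the identity "a > b (unsigned) exactly when max(a, b) != b". The sketch below applies it to 32-bit lanes. It is a hedged illustration: the step breakdown in the comments is an assumption about a plausible lowering, and `u_gt` and `LANE_ONES` are hypothetical names.

```python
LANE_ONES = 0xFFFFFFFF  # "true" mask for one 32-bit lane


def u_gt(a, b):
    """Lane-wise unsigned a > b, producing all-ones or all-zero lanes."""
    out = []
    for x, y in zip(a, b):
        m = max(x, y)                    # step 1: unsigned maximum
        eq = LANE_ONES if m == y else 0  # step 2: equality compare (a <= b)
        ones = LANE_ONES                 # step 3: materialise all-ones
        out.append(eq ^ ones)            # step 4: invert to get a > b
    return out
```

For example, `u_gt([0xFFFFFFFF], [1])` yields a true lane, whereas a signed compare would treat `0xFFFFFFFF` as -1 and report false.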

Inefficient x64 codegen for fabs/fneg

5

In both [cranelift](https://github.com/bytecodealliance/cranelift/blob/48029b4a16264672ce24afbee1050b37e1e68020/cranelift-codegen/meta/src/isa/x86/legalize.rs#L575-L607) (ignore the bitcasts) and [v8](https://github.com/v8/v8/blob/19be4913881bb02c5d9b4f1c7547ee2d1273120b/src/compiler/backend/x64/code-generator-x64.cc#L2465-L2491), floating-point absolute value and floating-point negation are 3-instruction lowerings. I don't believe there is any better lowering than these (is there?) for...

abrown

perf documentation
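
The fabs/fneg lowerings boil down to a sign-bit mask applied to the raw float bits. The sketch below shows that for a single f32 lane. The three-step split (all-ones, shift to form the mask, bitwise apply) is an assumption about how the mask is built when it is not loaded from memory; the helper names are hypothetical.

```python
import struct


def f32_bits(x):
    """Reinterpret a float as its 32-bit IEEE 754 pattern."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_f32(b):
    """Reinterpret a 32-bit pattern as a float."""
    return struct.unpack("<f", struct.pack("<I", b))[0]


def fabs_lane(x):
    ones = 0xFFFFFFFF                    # step 1: all-ones
    mask = ones >> 1                     # step 2: 0x7FFFFFFF (logical shift)
    return bits_f32(f32_bits(x) & mask)  # step 3: clear the sign bit


def fneg_lane(x):
    ones = 0xFFFFFFFF                    # step 1: all-ones
    mask = (ones << 31) & 0xFFFFFFFF     # step 2: 0x80000000
    return bits_f32(f32_bits(x) ^ mask)  # step 3: flip the sign bit
```

If the mask were available as a preloaded constant, only the final bitwise step would remain, which is the connection to the constant pool discussion listed at the top.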

Inefficient x64 codegen for 8x16 shifts

27

While attempting to lower `shl` and `shr` (https://github.com/WebAssembly/simd/blob/master/proposals/simd/SIMD.md#bit-shifts) in cranelift, I observed that the following instructions would involve a non-optimal lowering to x86:
- `i8x16.shl`
- `i8x16.shr_s`
- `i8x16.shr_u`
- `i64x2.shr_s`...

abrown

perf documentation
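
For the `i8x16.shl` case, a commonly used emulation is to shift 16-bit words and then mask off the bits that leaked across the byte boundary. The sketch below models that idea. It is an assumption about one possible lowering, not a description of what cranelift or v8 emit, and `i8x16_shl` is a hypothetical name.

```python
def i8x16_shl(lanes, n):
    """Shift each of 16 unsigned byte lanes left by n using 16-bit shifts."""
    n %= 8                     # the shift count is taken modulo the lane width
    mask = (0xFF << n) & 0xFF  # clears the low n bits of every byte
    out = []
    for i in range(0, 16, 2):
        word = lanes[i] | (lanes[i + 1] << 8)  # little-endian byte pair
        word = (word << n) & 0xFFFF            # 16-bit left shift
        out.append(word & 0xFF & mask)         # low byte
        out.append((word >> 8) & mask)         # high byte, leaked bits removed
    return out
```

The extra mask step is what makes the 8-bit variant more expensive than the 16-, 32-, and 64-bit shifts, which need no such cleanup.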

Consider listing exceptions to grammar to deal with irregularity in SIMD instructions

Suggestion from https://github.com/WebAssembly/simd/pull/455#issuecomment-775787091. We can try doing this after adding all the instructions to syntax.

ngzhian

spectext

Canonical sign-replication operation

6

Sign-replication is an often-used operation that replicates the sign bit of a SIMD lane into all bits of the lane. There are two reasons why we need to pay attention...

Maratyszcza

perf documentation
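
Sign-replication as described in the issue above can be expressed in at least two equivalent ways, sketched here for a single signed 32-bit lane. Which form is "canonical" is exactly what the issue asks, so this is only an illustration; the function names are hypothetical.

```python
def sign_replicate_shift(x):
    """Arithmetic shift right by (lane width - 1).

    x is a signed 32-bit value; Python's >> is arithmetic for negative ints.
    """
    return x >> 31


def sign_replicate_compare(x):
    """Signed comparison 0 > x, yielding all-ones (-1) or all-zero (0)."""
    return -1 if 0 > x else 0
```

Both return -1 (all bits set) for negative lanes and 0 otherwise, so an engine is free to pick whichever form is cheaper on the target.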
