[x64] Coalesce loads/stores when paired with an insert/extract lane

Open abrown opened this issue 4 years ago • 4 comments

The new Wasm SIMD instructions load[8|16|32|64]_lane and store[8|16|32|64]_lane were designed specifically to lower to a single machine instruction in Wasm runtimes. In the Cranelift x64 backend, we pattern match to perform the following conversions:

  • load + insertlane becomes a single PINSR*
  • extractlane + store becomes a single PEXTR*
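
To make the two conversions above concrete, here is a minimal, portable Rust sketch of the semantics involved for the 32-bit lane case. It is not Cranelift code and does not emit PINSRD or PEXTRD; the `V128` alias and all four function names are hypothetical, chosen only for illustration. It models why the two-step forms and the fused forms are interchangeable: both touch exactly one lane and exactly four bytes of memory.

```rust
/// A 128-bit vector modeled as 16 little-endian bytes (hypothetical type).
type V128 = [u8; 16];

/// Unfused form: a scalar 32-bit load followed by a separate lane insert.
fn load_then_insertlane(mem: &[u8], addr: usize, v: V128, lane: usize) -> V128 {
    // Scalar load, done byte-wise so no alignment of `addr` is assumed.
    let mut bytes = [0u8; 4];
    bytes.copy_from_slice(&mem[addr..addr + 4]);
    let scalar = u32::from_le_bytes(bytes);
    // insertlane: overwrite only the selected 32-bit lane.
    let mut out = v;
    out[lane * 4..lane * 4 + 4].copy_from_slice(&scalar.to_le_bytes());
    out
}

/// Fused form: a model of what a single insert with a memory operand computes.
fn load32_lane(mem: &[u8], addr: usize, v: V128, lane: usize) -> V128 {
    let mut out = v;
    out[lane * 4..lane * 4 + 4].copy_from_slice(&mem[addr..addr + 4]);
    out
}

/// Unfused form: extract a 32-bit lane into a scalar, then store the scalar.
fn extractlane_then_store(mem: &mut [u8], addr: usize, v: V128, lane: usize) {
    let mut bytes = [0u8; 4];
    bytes.copy_from_slice(&v[lane * 4..lane * 4 + 4]);
    let scalar = u32::from_le_bytes(bytes);
    mem[addr..addr + 4].copy_from_slice(&scalar.to_le_bytes());
}

/// Fused form: a model of what a single extract with a memory operand computes.
fn store32_lane(mem: &mut [u8], addr: usize, v: V128, lane: usize) {
    mem[addr..addr + 4].copy_from_slice(&v[lane * 4..lane * 4 + 4]);
}
```

Calling `load_then_insertlane` and `load32_lane` with the same arguments yields the same vector, and likewise the two store variants leave memory in the same state. The sketch uses plain byte slices so that it runs on any target; a real lowering would additionally have to confirm that the loaded value has no other uses before merging the load into the insert.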

This change adds CLIF tests that should pass once the necessary pattern-matching issues are fixed.

abrown, Mar 09 '21 23:03

Now that the x64 backend is migrated to ISLE, is it time to re-visit this optimization? cc: @elliottt

jameysharp, Sep 01 '22 01:09

This is only allowed for aligned pointers, right? Can you check the aligned memflag?

bjorn3, Sep 01 '22 07:09

@abrown are you interested in pursuing this further? (Going through old PRs and cleaning up...) I agree with @bjorn3 that the alignment issue is the critical question here, and so I suspect there won't be major opportunity coming from Wasm-SIMD (given that loads/stores only have alignment hints, not hard-enforced requirements), but we can still think about it further if there's some other aspect where it could help...

cfallin, Feb 09 '23 00:02

I had to refresh my mental cache for this issue quite a bit (it's been a while for this issue!). I don't know why I didn't originally respond, but as I dug into this, I didn't immediately find any requirement for these instructions to use aligned addresses. [searches more...] In fact, I do see the following in section 12.10.7 of the Intel manuals:

SSE4.1 adds 7 instructions (corresponding to 9 assembly instruction mnemonics) that simplify data insertion and extraction between general-purpose register (GPR) and XMM registers (EXTRACTPS, INSERTPS, PINSRB, PINSRD, PINSRQ, PEXTRB, PEXTRW, PEXTRD, and PEXTRQ). When accessing memory, no alignment is required for any of these instructions (unless alignment checking is enabled).

I think we could proceed with adding these tests?

abrown, Feb 10 '23 18:02