wasmtime
[x64] Coalesce loads/stores when paired with an insert/extract lane
The new Wasm SIMD instructions load[8|16|32|64]_lane and store[8|16|32|64]_lane were designed specifically for lowering to a single instruction in the Wasm runtimes. In the Cranelift backend, we pattern match to perform the following conversions:
- load + insertlane becomes a single PINSR*
- extractlane + store becomes a single PEXTR*
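As a rough illustration of what the fused forms have to compute, here is a small scalar model of the 32-bit lane case. It is only a sketch: `I32x4`, `load32_lane`, and `store32_lane` are hypothetical names chosen for this example and are not Cranelift or Wasmtime APIs. It assumes little-endian memory (as in Wasm linear memory) and places no alignment requirement on the byte address.

```rust
/// Hypothetical model of a 128-bit vector as four 32-bit lanes.
type I32x4 = [u32; 4];

/// Models `load.i32` followed by `insertlane`: read 4 bytes at an arbitrary
/// (possibly unaligned) byte address and replace one lane of `v`.
/// This is the effect a single PINSRD with a memory operand would need to have.
fn load32_lane(mem: &[u8], addr: usize, v: I32x4, lane: usize) -> I32x4 {
    let mut bytes = [0u8; 4];
    bytes.copy_from_slice(&mem[addr..addr + 4]);
    let mut out = v;
    out[lane] = u32::from_le_bytes(bytes); // little-endian, as in Wasm memory
    out
}

/// Models `extractlane` followed by `store.i32`: write one lane of `v`
/// to an arbitrary byte address.
/// This is the effect a single PEXTRD with a memory operand would need to have.
fn store32_lane(mem: &mut [u8], addr: usize, v: I32x4, lane: usize) {
    mem[addr..addr + 4].copy_from_slice(&v[lane].to_le_bytes());
}
```

For example, loading the bytes `78 56 34 12` from byte offset 1 into lane 2 of an all-zero vector leaves `0x12345678` in that lane and zeros elsewhere. Storing that lane back at offset 9 writes the same four bytes.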
This change adds CLIF tests that should pass once the necessary pattern-matching issues are fixed.
Now that the x64 backend is migrated to ISLE, is it time to re-visit this optimization? cc: @elliottt
This is only allowed for aligned pointers, right? Can you check the aligned memflag?
@abrown are you interested in pursuing this further? (Going through old PRs and cleaning up...) I agree with @bjorn3 that the alignment issue is the critical question here, and so I suspect there won't be major opportunity coming from Wasm-SIMD (given that loads/stores only have alignment hints, not hard-enforced requirements), but we can still think about it further if there's some other aspect where it could help...
I had to refresh my mental cache for this issue quite a bit (it's been a while for this issue!). I don't know why I didn't originally respond, but as I dug into this, I didn't immediately find any requirement for these instructions to use aligned addresses. [searches more...] In fact, I do see the following in section 12.10.7 of the Intel manuals:
SSE4.1 adds 7 instructions (corresponding to 9 assembly instruction mnemonics) that simplify data insertion and extraction between general-purpose register (GPR) and XMM registers (EXTRACTPS, INSERTPS, PINSRB, PINSRD, PINSRQ, PEXTRB, PEXTRW, PEXTRD, and PEXTRQ). When accessing memory, no alignment is required for any of these instructions (unless alignment checking is enabled).
I think we could proceed with adding these tests?