Dan Weber

59 comments by Dan Weber

Is this a labeling issue? I've updated this to use the word 'move' instead of 'load'. I didn't see any other terminology in there that matched this specific case. If...

@penzn This isn't the same as replace_lane. Replace lane replaces one value and returns the updated vector. This initializes a vector from a scalar or another vector zeroing the upper...
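To make the distinction concrete, here is a minimal sketch in plain C that models a 128-bit vector as four 32-bit lanes. The type `v128_model` and the functions `replace_lane0` and `move32_zero` are hypothetical names chosen for illustration; they are not the proposal's instruction names or real intrinsics.

```c
#include <stdint.h>

/* Hypothetical model of a 128-bit vector as four 32-bit lanes. */
typedef struct { uint32_t lane[4]; } v128_model;

/* replace_lane style: overwrite lane 0 and keep every other lane of v. */
static v128_model replace_lane0(v128_model v, uint32_t x) {
    v.lane[0] = x;
    return v;
}

/* move-and-zero style: build a fresh vector with x in lane 0
   and all remaining lanes set to zero. */
static v128_model move32_zero(uint32_t x) {
    v128_model r = { { x, 0, 0, 0 } };
    return r;
}
```

With an input vector of `{1, 2, 3, 4}` and a scalar `9`, the first function would give `{9, 2, 3, 4}` while the second gives `{9, 0, 0, 0}` and never reads a prior vector.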

@Maratyszcza Thanks for the feedback. If anyone can help with the ARMv7 with Neon intrinsics I'll generate the PR today. @penzn Your question has two parts and is very interesting....

https://www.felixcloutier.com/x86/movq

On Tue, Oct 6, 2020, 23:28 Marat Dukhan wrote:
> VMOVD xmm, xmm and VMOVQ xmm, xmm forms don't exist. You could use [V]MOVSS and [V]MOVSD, but they...

@Maratyszcza Check these two out:

https://uops.info/table.html?search=Movq&cb_lat=on&cb_tp=on&cb_uops=on&cb_ports=on&cb_SKL=on&cb_measurements=on&cb_iaca30=on&cb_doc=on&cb_base=on&cb_avx=on

https://uops.info/table.html?search=Pand&cb_lat=on&cb_tp=on&cb_uops=on&cb_ports=on&cb_SKL=on&cb_measurements=on&cb_iaca30=on&cb_doc=on&cb_base=on&cb_avx=on

See how vmovq and pand both use port 015? Think they're synonyms for the same op?

This proposal was originally put together for completeness along with the load32/load64_zero. Its functionality is equivalent to `and 0xffffffffffffffff` for the lower 64 bits with movq and `and 0xffffffff` for...
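The mask equivalence can be checked with a small scalar model. This is only a sketch: the `v128_bits` type and the function names are assumptions made up for the example, and the 128-bit value is represented as two 64-bit halves. Keeping the lower 64 bits corresponds to a mask whose low half is `0xffffffffffffffff` and whose high half is zero; keeping the lower 32 bits corresponds to a low half of `0xffffffff`.

```c
#include <stdint.h>

/* Hypothetical model of a 128-bit value as two 64-bit halves. */
typedef struct { uint64_t lo, hi; } v128_bits;

/* Bitwise AND of two 128-bit values. */
static v128_bits and128(v128_bits a, v128_bits b) {
    v128_bits r = { a.lo & b.lo, a.hi & b.hi };
    return r;
}

/* Keep the low 64 bits, zero the upper 64 bits. */
static v128_bits move64_zero(v128_bits v) {
    v128_bits r = { v.lo, 0 };
    return r;
}

/* Keep the low 32 bits, zero the upper 96 bits. */
static v128_bits move32_zero(v128_bits v) {
    v128_bits r = { (uint64_t)(uint32_t)v.lo, 0 };
    return r;
}

/* Masks that select the low 64 and low 32 bits respectively. */
static const v128_bits MASK_LOW64 = { UINT64_C(0xffffffffffffffff), 0 };
static const v128_bits MASK_LOW32 = { UINT64_C(0x00000000ffffffff), 0 };
```

In this model, `move64_zero(v)` and `and128(v, MASK_LOW64)` produce the same bits for any `v`, and likewise for the 32-bit pair.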

As a general rule, I'm in favor of real CSE optimization. Real CSE optimization would allow us to perform accurate cost models on both WSOs as well as the underlying...

@lemaitre This could be a really good use case. Do you have a pre-built benchmark we can use? I'd love to see the wasm code generation and the native assembly...

Possibly. The goal here is to find cases where there is a performance penalty for not being able to reuse constants. In my case, I have one where there is...

> SSSE3 lowering mismatch others. `PMADDUBSW` does unsigned by signed multiplication.

You're totally right. Nice catch.
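For reference, the signedness point can be illustrated with a scalar model of one 16-bit output lane of `PMADDUBSW`: each byte of the first operand is treated as unsigned, each byte of the second as signed, and the two adjacent products are added with signed saturation. The helper name `maddubs_pair` is hypothetical and exists only for this sketch.

```c
#include <stdint.h>

/* Scalar model of one 16-bit result lane of PMADDUBSW:
   a0, a1 are unsigned bytes; b0, b1 are signed bytes.
   The sum of the two products is saturated to the int16_t range. */
static int16_t maddubs_pair(uint8_t a0, int8_t b0, uint8_t a1, int8_t b1) {
    int32_t sum = (int32_t)a0 * (int32_t)b0 + (int32_t)a1 * (int32_t)b1;
    if (sum > INT16_MAX) sum = INT16_MAX;
    if (sum < INT16_MIN) sum = INT16_MIN;
    return (int16_t)sum;
}
```

With the byte `0xFF` in the first operand and `-1` in the second for both pairs, this model yields `-510`, because `0xFF` is read as 255. A lowering that treated both operands as signed would read `0xFF` as `-1` and produce `2` instead.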