
Smaller encoding for v128 scalar constants

Open kg opened this issue 2 years ago • 4 comments

From what I can see, the only way to create a v128 zero vector (for example, to do unrolled memsets) is a full v128.const, which weighs in at around 18 bytes. That's painful since my JIT is limited to 4 KB. i64.const + splat would be a smaller encoding, but it doesn't look like V8 optimizes that, and it's probably unreasonable to expect it to. It might be good to add a v128_const_zero or v128_const_splat equivalent in any future iteration of SIMD; I can imagine this getting even worse when wider vector types are added. Maybe this only needs to happen in the binary format, but changing the existing opcode wouldn't be backwards compatible.
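For anyone who wants to check the size difference, here is a rough sketch that builds both encodings by hand. The helper names are made up for illustration. The opcode values (0xFD SIMD prefix, 0x0C for v128.const, 0x12 for i64x2.splat, 0x42 for i64.const) are taken from the WebAssembly binary format as I understand it, so double-check them against the spec before relying on this.

```python
def uleb128(value: int) -> bytes:
    """Encode a non-negative integer as unsigned LEB128."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def sleb128(value: int) -> bytes:
    """Encode a signed integer as signed LEB128 (used by i64.const)."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7  # Python's >> is an arithmetic shift for negative ints
        sign_bit = byte & 0x40
        done = (value == 0 and not sign_bit) or (value == -1 and sign_bit)
        out.append(byte if done else byte | 0x80)
        if done:
            return bytes(out)


SIMD_PREFIX = 0xFD   # prefix byte for all SIMD instructions
V128_CONST = 0x0C    # v128.const, followed by 16 immediate bytes
I64X2_SPLAT = 0x12   # i64x2.splat
I64_CONST = 0x42     # i64.const, followed by a signed LEB128 immediate


def v128_const(lanes: bytes) -> bytes:
    """Hypothetical helper: encode v128.const with a 16-byte immediate."""
    assert len(lanes) == 16
    return bytes([SIMD_PREFIX]) + uleb128(V128_CONST) + lanes


def i64_const_splat(value: int) -> bytes:
    """Hypothetical helper: encode i64.const <value> followed by i64x2.splat."""
    return (bytes([I64_CONST]) + sleb128(value)
            + bytes([SIMD_PREFIX]) + uleb128(I64X2_SPLAT))


zero_const = v128_const(bytes(16))
zero_splat = i64_const_splat(0)
ones_splat = i64_const_splat(-1)  # all bits set in both lanes
print(len(zero_const), len(zero_splat), len(ones_splat))
```

Under those assumptions, the full constant is 18 bytes while the splat form is 4 bytes for both all-zeros and all-ones. The size win only matters if the engine folds the splat back into a constant.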

kg avatar May 19 '23 16:05 kg

Another trick would be to declare a v128 local that will be implicitly initialized to zero, then use a local.get to retrieve the value. I don't know how the V8 codegen for that compares to the other options, though.
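As a rough size estimate for the local approach, the sketch below compares the bytes spent on a zero-initialized v128 local plus local.get against repeating a full v128.const. The function names are hypothetical. The values 0x20 for local.get and 0x7B for the v128 value type are my reading of the binary format, and the 18-byte figure for v128.const is the one quoted earlier in the thread. It assumes the local is never written, so it stays zero.

```python
def uleb128(value: int) -> bytes:
    """Encode a non-negative integer as unsigned LEB128."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


LOCAL_GET = 0x20        # local.get, followed by a LEB128 local index
V128_TYPE = 0x7B        # value type byte for v128
V128_CONST_SIZE = 18    # prefix + opcode + 16 immediate bytes


def local_decl(count: int, valtype: int) -> bytes:
    """One entry in a function body's locals vector: (count, type)."""
    return uleb128(count) + bytes([valtype])


def local_get(index: int) -> bytes:
    """Encode local.get <index>."""
    return bytes([LOCAL_GET]) + uleb128(index)


def cost_via_local(uses: int, local_index: int) -> int:
    """Bytes for one v128 local declaration plus `uses` local.get reads."""
    return len(local_decl(1, V128_TYPE)) + uses * len(local_get(local_index))


def cost_via_const(uses: int) -> int:
    """Bytes for `uses` separate v128.const instructions."""
    return uses * V128_CONST_SIZE


# Example: an unrolled memset that needs the zero vector 8 times.
print(cost_via_local(8, local_index=3), cost_via_const(8))
```

With a local index below 128, each read costs 2 bytes, so 8 uses come to 18 bytes in total versus 144 bytes for 8 full constants. Whether the generated machine code is as good depends on the engine tier.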

tlively avatar May 19 '23 16:05 tlively

Yeah, I was thinking of experimenting with a local, but from looking at the code I'm not sure V8, SpiderMonkey, etc. will realize that it's constant. Turning the consts into memory loads would probably be pretty bad. I got measurable speedups by switching my i64.const 0 + splat to v128.const 0; it just means I can't JIT as much code now due to the size bloat. I'll definitely test it at some point.

kg avatar May 19 '23 17:05 kg

For Ion in SpiderMonkey, we'll generate equivalent IR for v128.const 0 and for a local.get of a local that is zero-initialized only once (by default or explicitly). Baseline will use a memory load from the stack for the local.get, though.

eqrion avatar May 19 '23 17:05 eqrion

I think we should be able to generate better code for i64.const 0 + splat in V8, because we should be able to constant-match the input to zero and generate a pxor. IIRC, this is what we currently generate for V128Const 0. I've filed this tracking bug to optimize this better. Orthogonally, I'm not opposed to adding a V128Const for all zeros/ones.

dtig avatar May 19 '23 17:05 dtig