simd Inefficient x64 codegen for integer comparisons

In both cranelift and v8, unsigned integer comparisons are lowered to more than one instruction:

unsigned greater/less-than takes 4 instructions, in both cranelift and v8
both unsigned and signed greater/less-than-or-equal take 2 instructions, in both cranelift and v8

These seem like high-use instructions and I wonder if there is any good way to get around this inefficiency.

Feb 06 '20 21:02 abrown

For the unsigned greater/less-than in V8, we have an extra pcmpeqd to synthesize all ones, which we could get rid of with future optimizations because pxor can take a memory operand. For the greater/less-than-or-equal cases, given that there is no single instruction, I think the two-instruction sequence is possibly the best option. I doubt this is actionable, but I will leave it open for now to see if others have opinions about this.
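To make the instruction counts concrete, here is a portable scalar model in C of one plausible lowering. It is only a sketch: the `v128` type and the helper functions are hypothetical stand-ins named after the x64 instructions they imitate, and the exact sequences that V8 or cranelift emit may differ. It assumes unsigned `a > b` is computed as `not(max(a, b) == b)` and unsigned `a >= b` as `min(a, b) == b`.

```c
#include <assert.h>
#include <stdint.h>

#define LANES 4

/* Hypothetical scalar stand-in for a 128-bit register of four u32 lanes. */
typedef struct { uint32_t lane[LANES]; } v128;

static v128 splat(uint32_t x) {
    v128 r;
    for (int i = 0; i < LANES; i++) r.lane[i] = x;
    return r;
}

/* Models pmaxud: lane-wise unsigned maximum. */
static v128 pmaxud(v128 a, v128 b) {
    v128 r;
    for (int i = 0; i < LANES; i++) r.lane[i] = a.lane[i] > b.lane[i] ? a.lane[i] : b.lane[i];
    return r;
}

/* Models pminud: lane-wise unsigned minimum. */
static v128 pminud(v128 a, v128 b) {
    v128 r;
    for (int i = 0; i < LANES; i++) r.lane[i] = a.lane[i] < b.lane[i] ? a.lane[i] : b.lane[i];
    return r;
}

/* Models pcmpeqd: all ones in lanes that compare equal, zero elsewhere. */
static v128 pcmpeqd(v128 a, v128 b) {
    v128 r;
    for (int i = 0; i < LANES; i++) r.lane[i] = a.lane[i] == b.lane[i] ? 0xffffffffu : 0u;
    return r;
}

/* Models pxor: lane-wise bitwise exclusive or. */
static v128 pxor(v128 a, v128 b) {
    v128 r;
    for (int i = 0; i < LANES; i++) r.lane[i] = a.lane[i] ^ b.lane[i];
    return r;
}

/* Unsigned a > b in four modeled instructions. */
static v128 gt_u(v128 a, v128 b) {
    v128 t = pmaxud(a, b);      /* 1: t = max(a, b)                     */
    t = pcmpeqd(t, b);          /* 2: all ones where a <= b             */
    v128 ones = pcmpeqd(a, a);  /* 3: synthesize an all-ones mask       */
    return pxor(t, ones);       /* 4: invert, giving all ones where a > b */
}

/* Unsigned a >= b in two modeled instructions. */
static v128 ge_u(v128 a, v128 b) {
    v128 t = pminud(a, b);      /* 1: t = min(a, b)                     */
    return pcmpeqd(t, b);       /* 2: all ones where b <= a             */
}

/* Compares both sequences against plain scalar comparisons. */
static void self_check(void) {
    const uint32_t s[] = {0u, 1u, 0x7fffffffu, 0x80000000u, 0xffffffffu};
    const int n = (int)(sizeof s / sizeof s[0]);
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            v128 g = gt_u(splat(s[i]), splat(s[j]));
            v128 e = ge_u(splat(s[i]), splat(s[j]));
            for (int k = 0; k < LANES; k++) {
                assert(g.lane[k] == (s[i] > s[j] ? 0xffffffffu : 0u));
                assert(e.lane[k] == (s[i] >= s[j] ? 0xffffffffu : 0u));
            }
        }
    }
}
```

In this model, steps 3 and 4 of `gt_u` are the "not". If the all-ones constant lived in memory, step 3 would disappear and the pxor would read it directly, which is the saving the comment describes.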

Feb 18 '20 22:02 dtig

To add to the list, compare for inequality takes 3 instructions, but compare for equality takes 1.

I've run into this in the context of LLVM strength-reducing one to the other: it will replace i32x4.gt(value, 0) with i32x4.ne(value, 0) if it knows value is non-negative, which has a slight penalty on the codegen.
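The rewrite and its cost can be illustrated with a small scalar model in C. This is a sketch under assumptions: the `i32x4` struct and the helpers are hypothetical stand-ins named after the x64 instructions they imitate, and `ne` is assumed to be lowered as an equality compare followed by a "not" built from an all-ones mask and a pxor.

```c
#include <assert.h>
#include <stdint.h>

#define LANES 4

/* Hypothetical scalar stand-in for a register of four signed 32-bit lanes. */
typedef struct { int32_t lane[LANES]; } i32x4;

static i32x4 splat(int32_t x) {
    i32x4 r;
    for (int i = 0; i < LANES; i++) r.lane[i] = x;
    return r;
}

/* Models pcmpeqd: -1 (all ones) in lanes that compare equal, 0 elsewhere. */
static i32x4 pcmpeqd(i32x4 a, i32x4 b) {
    i32x4 r;
    for (int i = 0; i < LANES; i++) r.lane[i] = a.lane[i] == b.lane[i] ? -1 : 0;
    return r;
}

/* Models pcmpgtd: -1 in lanes where a > b as signed integers, 0 elsewhere. */
static i32x4 pcmpgtd(i32x4 a, i32x4 b) {
    i32x4 r;
    for (int i = 0; i < LANES; i++) r.lane[i] = a.lane[i] > b.lane[i] ? -1 : 0;
    return r;
}

/* Models pxor: lane-wise bitwise exclusive or. */
static i32x4 pxor(i32x4 a, i32x4 b) {
    i32x4 r;
    for (int i = 0; i < LANES; i++) r.lane[i] = a.lane[i] ^ b.lane[i];
    return r;
}

/* Signed greater-than: one modeled instruction. */
static i32x4 gt_s(i32x4 a, i32x4 b) {
    return pcmpgtd(a, b);
}

/* Inequality: three modeled instructions. */
static i32x4 ne(i32x4 a, i32x4 b) {
    i32x4 t = pcmpeqd(a, b);     /* 1: all ones where a == b        */
    i32x4 ones = pcmpeqd(a, a);  /* 2: synthesize an all-ones mask  */
    return pxor(t, ones);        /* 3: invert the equality mask     */
}

/* For non-negative inputs, gt(value, 0) and ne(value, 0) agree lane by lane. */
static void self_check(void) {
    const int32_t non_negative[] = {0, 1, 2, 1000, INT32_MAX};
    const int n = (int)(sizeof non_negative / sizeof non_negative[0]);
    i32x4 zero = splat(0);
    for (int i = 0; i < n; i++) {
        i32x4 g = gt_s(splat(non_negative[i]), zero);
        i32x4 e = ne(splat(non_negative[i]), zero);
        for (int k = 0; k < LANES; k++) assert(g.lane[k] == e.lane[k]);
    }
}
```

The two forms only agree because value is known to be non-negative; for a lane holding -1, `gt_s` yields 0 while `ne` yields all ones. So the rewrite is valid, but in this model it trades one instruction for three.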

I agree that, short of a slightly more efficient "not" using a memory operand (which removes one instruction in both cases), there doesn't seem to be anything else that could be done here.

Feb 23 '20 17:02 zeux