Inefficient x64 codegen for integer comparisons
In both cranelift and v8, unsigned integer comparisons are lowered to more than 1 instruction:
- unsigned greater/less-than takes 4 instructions; e.g. cranelift and v8
- both unsigned and signed greater/less-than-or-equal take 2 instructions; e.g. cranelift and v8
These seem like high-use instructions and I wonder if there is any good way to get around this inefficiency.
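To make the instruction counts above concrete, here is a lane-by-lane model in Python. It is only a sketch: it assumes the lowering is built from max/min followed by compare-equal, with an all-ones "not" for the strict comparison. The function names mirror SSE mnemonics for readability; they are illustrative helpers, not code from cranelift or v8.

```python
MASK32 = 0xFFFFFFFF  # all-ones value of one 32-bit lane

def pmaxud(a, b):
    """Lane-wise unsigned maximum."""
    return [max(x, y) for x, y in zip(a, b)]

def pminud(a, b):
    """Lane-wise unsigned minimum."""
    return [min(x, y) for x, y in zip(a, b)]

def pcmpeqd(a, b):
    """Lane-wise equality: all ones where equal, zero otherwise."""
    return [MASK32 if x == y else 0 for x, y in zip(a, b)]

def pxor(a, b):
    """Lane-wise bitwise xor."""
    return [x ^ y for x, y in zip(a, b)]

def i32x4_gt_u(a, b):
    """Assumed 4-instruction sequence for unsigned a > b."""
    t = pmaxud(a, b)       # 1: t = max(a, b)
    t = pcmpeqd(t, b)      # 2: all ones where a <= b
    ones = pcmpeqd(b, b)   # 3: synthesize all ones
    return pxor(t, ones)   # 4: invert, all ones where a > b

def i32x4_ge_u(a, b):
    """Assumed 2-instruction sequence for unsigned a >= b."""
    t = pminud(a, b)       # 1: t = min(a, b)
    return pcmpeqd(t, b)   # 2: min(a, b) == b exactly when a >= b
```

The signed greater/less-than-or-equal case would follow the same two-step shape with a signed min/max in place of the unsigned one.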
For the unsigned greater/less-than in V8, we have an extra pcmpeqd to synthesize all ones, which we could get rid of with future optimizations because pxor can take a memory operand. For the greater/less-than-or-equal cases, given that there is no single instruction, I think the two-instruction sequence is possibly the best option. I doubt this is actionable, but I will leave it open for now to see if others have opinions about this.
To add to the list, compare for inequality takes 3 instructions, but compare for equality takes 1.
I've run into this in the context of LLVM strength-reducing one to the other: it will replace i32x4.gt(value, 0) with i32x4.ne(value, 0) if it knows value is non-negative, which has a slight penalty on the codegen.
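A small Python model of why this rewrite is valid but costs more. This is a sketch under two assumptions: that i32x4.gt here means the signed comparison lowered to a single pcmpgtd, and that inequality is lowered as compare-equal followed by an all-ones "not". The helper names are illustrative only.

```python
MASK32 = 0xFFFFFFFF  # all-ones value of one 32-bit lane

def to_signed(x):
    """Reinterpret a 32-bit lane as a signed integer."""
    return x - (1 << 32) if x & 0x80000000 else x

def pcmpeqd(a, b):
    """Lane-wise equality: all ones where equal, zero otherwise."""
    return [MASK32 if x == y else 0 for x, y in zip(a, b)]

def pcmpgtd(a, b):
    """Lane-wise signed greater-than."""
    return [MASK32 if to_signed(x) > to_signed(y) else 0
            for x, y in zip(a, b)]

def i32x4_gt_s(a, b):
    """Assumed 1-instruction lowering."""
    return pcmpgtd(a, b)                       # 1

def i32x4_ne(a, b):
    """Assumed 3-instruction lowering: not(a == b)."""
    t = pcmpeqd(a, b)                          # 1: all ones where equal
    ones = pcmpeqd(a, a)                       # 2: synthesize all ones
    return [x ^ y for x, y in zip(t, ones)]    # 3: pxor inverts the mask

zero = [0, 0, 0, 0]
non_negative = [0, 1, 7, 0x7FFFFFFF]
# Same result for non-negative lanes, but 3 modelled instructions instead of 1.
same = i32x4_gt_s(non_negative, zero) == i32x4_ne(non_negative, zero)
```

With a negative lane (for example 0xFFFFFFFF, which is -1 when signed) the two results differ, which is why the rewrite depends on value being known non-negative.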
I agree that short of a slightly more efficient "not" by using a memory operand (which removes 1 instruction in both cases) there doesn't seem to be anything else that could be done here.