wasmtime
wasmtime copied to clipboard
cranelift: Sign extend immediates in instructions that embed them.
👋 Hey,
This PR alters the behaviour of *_imm
instructions to sign extend their immediate argument when the control type is i128
.
This comes from a fuzzer issue where the interpreter was sign extending the immediates but the legalizations were not.
Fixes: #4568 Fixes: #4641
As I stated in the linked issue, IMHO the CLIF instruction definition should be updated.
As I stated in the linked issue, IMHO the CLIF instruction definition should be updated.
You are right, I didn't read your comment properly (sorry!), ill mark this as draft until we make that decision.
@akirilov-arm I've updated the documentation, is this all you meant, or did I miss something else?
Also, cc @cfallin since I can't request reviews.
I also think zero extending the unsigned operations is probably the better choice.
However, what if we instead altered the type of the instruction to accept the full Imm128
?
I don't think this would grow the size of InstructionData
since we already have Shuffle
which also stores a Imm128
. However Shuffle
stores the immediate in the DataFlowGraph
, we could do the same, and in this legalization pass emit the appropriate iconst
/ iconcat
/ sextend
depending on what the constant is.
Would storing the Imm in the DFG for these operations cause performance issues?
Actually that would be the ideal design: from first principles one would expect "with immediate" variants of 128-bit ops to carry 128-bit (16-byte) immediates.
However we can't carry that inline in the instruction, as you're hinting, because InstructionData
is 16 bytes total (and we want to keep it that way).
I suspect that indirecting for all constants would have some nontrivial overhead, though it might be a bit of work to measure (a lot of refactoring).
I had another thought, though, that may be more "honest" about the costs: what if we limited (i) op_imm
instructions and (ii) the iconst
instruction to work for up to 64 bits only? Producers that want to emit full 128-bit values already produce two iconst
s and iconcat
them. In some sense, trying to allow iconst.i128 -1
or the op_imm
folded variant is lulling producers into a false sense of what's supported. As soon as one writes iconst.i128 -0x8000_0000_0000_0001
one finds out that there are only half as many bits in the immediate field as the type otherwise implies.
Thoughts on that?
I suspect that indirecting for all constants would have some nontrivial overhead, though it might be a bit of work to measure (a lot of refactoring).
I was proposing that only for *_imm
variants. The way I see it, these instructions are already just sort of "helpers" in the sense that they just construct a pattern that we recognize (i.e. iadd
+ iconst
), they could just emit more advanced patterns to support i128 (i.e. iadd+iconst+sext
or iadd+iconst+iconcat
all in 64bits).
Having more operations only work up to i64
to me feels like i128
is a second class citizen, which is sort of true for the current backends, but I don't really like it at the lang level.
I don't know, I'm a bit undecided on this, I'd like to hear what other people think.
Generally I agree at an IR-design level: it would be nice to have orthogonality here, and allow all types up through i128
in all places where integers are allowed.
This compromise feels similar to "small immediates" in RISC ISAs though: we have a relatively small struct size for instructions, and its size is critical for performance, and its entire width would be taken by just the i128
immediate (hence it would need to grow). So in that sense I don't think we can have an inline Imm128
.
I would be curious to see the overhead of a "every immediate is indirect" approach, if you're up for prototyping that. If it's actually negligible, then I think it's a reasonable change to make.
I would be curious to see the overhead of a "every immediate is indirect" approach, if you're up for prototyping that. If it's actually negligible, then I think it's a reasonable change to make.
Me too, and I think we could mitigate some of the cost (if it is non-negligible) by having two const InstructionData
variants, one inline up to 64bits and one in the DFG for larger values, and choose between them when the user emits a iconst
(can we do this?).
Although right now I don't have a lot of time to try this out.
I'm planning on addressing the feedback above and getting this merged, is that okay, or would you prefer restricting the types that we accept?