wasmtime icon indicating copy to clipboard operation
wasmtime copied to clipboard

Remove binary_imm64 and int_compare_imm instructions

Open jameysharp opened this issue 3 years ago • 12 comments

I have no idea yet whether this is a good change, and it definitely should not be merged as-is. The main issues are that there are a bunch of tests I haven't fixed up, and some optimizations I've removed. Also I have no performance measurements yet to see what effect this has on either compile time or generated code quality.

I just wanted to understand what the *_imm instructions were used for, and trying to grep for them didn't work out well. So I removed them to let the compiler tell me where they're used instead.

To avoid having to make as many changes, I introduced a InstImmBuilder trait. It provides methods like iadd_imm which emit the pair of instructions that legalization would have decomposed that instruction into eventually anyway.

I have several questions that this may help with answering:

  • Is there significant savings, in IR code size and/or compile time, from inlining one immediate operand into some instructions at the frontend? Or is the widespread frontend use of these instructions just for convenience?

  • Can simple_preopt cleanly identify arithmetic identities involving constants without rewriting to use these combined forms?

  • Is there a compile-time cost to simple_preopt rewriting into combined forms, only for legalization to rewrite them away again soon after?

jameysharp avatar Aug 16 '22 17:08 jameysharp

cc https://github.com/bytecodealliance/wasmtime/issues/3250

One of my arguments against this is that it makes the clif ir harder to read and harder to generate.

bjorn3 avatar Aug 16 '22 17:08 bjorn3

Hah, I guess I accidentally implemented what @cfallin proposed just by being confused. Thanks for digging that link up! It's helpful context.

I noticed a couple things that I think are existing bugs while exploring this:

  • The backends all panic if they see these opcodes during lowering... Except that x64 has rules in lower.isle to match iadd_imm. Can those rules ever fire?

  • I see simple_preopt was intended to rewrite "(iadd (iconst x) (iadd (iconst y) z))" to fold the constants together, and it uses the *_imm opcodes as an intermediate step. But it looks like that only actually happens if the constant is the first operand, not the second, because only one case has a recursive call to simplify.

Independent of whether we keep the *_imm instructions, I'm also wondering: Can simple_preopt be replaced with ISLE and match on the full tree structure, instead of rewriting to this intermediate form to avoid having to look three layers down in the tree of operators? I wonder if it would reduce compile time by not modifying instructions that it can't actually improve. Is that something that should happen after the e-graph work lands?

jameysharp avatar Aug 16 '22 17:08 jameysharp

Wow, there is a lot more stuff attached to _imm than i expected!

Is it possible to print a comment with the const value if one of the sides of the iadd/isub/etc.. is a const? That way we could recover some of the readability.

For example:

v0 = iconst.i16 0xABCD
v1 = ...
v2 = ...
v3 = iadd.i16 v2, v0   ; v0 = 0xABCD

It would also help in cases other than *_imm. But probably does not cover all cases.

afonso360 avatar Aug 16 '22 17:08 afonso360

Independent of whether we keep the *_imm instructions, I'm also wondering: Can simple_preopt be replaced with ISLE and match on the full tree structure

I believe that is exactly what will happen with the egraph PR if it gets merged. https://github.com/bytecodealliance/wasmtime/pull/4249/files#diff-a596f1a407284b0b3591c17053fe0eff7f0229b70a2fcd9282fc2aedf9cd38a4

bjorn3 avatar Aug 16 '22 17:08 bjorn3

Is it possible to print a comment with the const value if one of the sides of the iadd/isub/etc.. is a const? That way we could recover some of the readability.

I like that! It would resolve my readability concerns. It doesn't resolve my clif ir generation concern though.

bjorn3 avatar Aug 16 '22 17:08 bjorn3

It doesn't resolve my clif ir generation concern though.

Reading the PR it looks like we keep the _imm builder functions, but they now emit the op + iconst, so there should no difference to users of cranelift. It that what you were referring to or did I miss something?

Although I do think we should note in the docs that those helpers are not actual instructions by themselves.

afonso360 avatar Aug 16 '22 17:08 afonso360

Reading the PR it looks like we keep the _imm builder functions, but they now emit the op + iconst, so there should no difference to users of cranelift.

Right, missed that.

bjorn3 avatar Aug 16 '22 17:08 bjorn3

@jameysharp thanks for exploring here. As others have pointed out, there's a lot of prior thought on this, and overall I think we do want to remove these variants in due time.

The backends all panic if they see these opcodes during lowering... Except that x64 has rules in lower.isle to match iadd_imm. Can those rules ever fire?

Currently the legalizer rewrites them from _imm form to op-with-iconst before the lowering backends see the CLIF; this is why we don't need to handle them in each backend.

I see simple_preopt was intended to rewrite "(iadd (iconst x) (iadd (iconst y) z))" to fold the constants together, and it uses the *_imm opcodes as an intermediate step. But it looks like that only actually happens if the constant is the first operand, not the second, because only one case has a recursive call to simplify.

Independent of whether we keep the *_imm instructions, I'm also wondering: Can simple_preopt be replaced with ISLE and match on the full tree structure, instead of rewriting to this intermediate form to avoid having to look three layers down in the tree of operators? I wonder if it would reduce compile time by not modifying instructions that it can't actually improve. Is that something that should happen after the e-graph work lands?

Indeed, that's one of the main effects of my mid-end optimizer work; it will replace simple_preopt just as you are suggesting.

Now, regarding iadd vs iadd_imm and friends:

I think that breaking the ops into smaller pieces, and keeping iadd separate from its imm, makes the most sense from an optimization-rules perspective: we otherwise need rules to match both. (For example, multiply-add needs to also match multiply-add-imm.) Likewise it's useful to keep iconst separate because we want to do special things with it sometimes, like rematerialization and/or use of constant pools.

Part of the reason for the original _imm forms, aside from the code-size advantage that @sunfishcode mentions in #3250, was that old Cranelift rewrote CLIF until each op had a direct machine-instruction equivalent; so iadd_imm was useful to represent the reg-immediate form of the x86 add instruction. We no longer have that constraint, though.

So I think the right way forward is in spirit similar to what you've done here, but (i) do it after the mid-end optimizer lands (pending RFC approval!) because there are some immediate followup thoughts in how to make legalization ride on top of it; and (ii) generate the _imm helpers on the InstBuilder in a programmatic way, rather than manually writing them.

cfallin avatar Aug 16 '22 19:08 cfallin

Is it possible to print a comment with the const value if one of the sides of the iadd/isub/etc.. is a const? That way we could recover some of the readability. ...

v3 = iadd.i16 v2, v0   ; v0 = 0xABCD

This turned out to be really easy to do. I've opened #4725 for that.

The backends all panic if they see these opcodes during lowering... Except that x64 has rules in lower.isle to match iadd_imm. Can those rules ever fire?

Currently the legalizer rewrites them from _imm form to op-with-iconst before the lowering backends see the CLIF; this is why we don't need to handle them in each backend.

Right. So these rules in x64's lower.isle can never match, yeah?

So I think the right way forward is in spirit similar to what you've done here, but (i) do it after the mid-end optimizer lands (pending RFC approval!) because there are some immediate followup thoughts in how to make legalization ride on top of it; and (ii) generate the _imm helpers on the InstBuilder in a programmatic way, rather than manually writing them.

(i) makes sense. (ii) isn't immediately (!) obvious to me: is it that you want to make this pattern available as a convenience on more instructions, or you're worried about the correctness of the hand-written wrappers, or...?

jameysharp avatar Aug 16 '22 20:08 jameysharp

(i) makes sense. (ii) isn't immediately (!) obvious to me: is it that you want to make this pattern available as a convenience on more instructions, or you're worried about the correctness of the hand-written wrappers, or...?

Basically, generating the code is less error-prone and allows us to extend the pattern to arbitrary instructions as we choose fit; it's a lower-entropy encoding of the system design. We already generate .iadd() so generating .iadd_imm() given a special flag isn't a huge stretch. In contrast, mixing generated code and handwritten code (this PR) feels asymmetric and is difficult to maintain and extend in the future.

cfallin avatar Aug 16 '22 20:08 cfallin

Right. So these rules in x64's lower.isle can never match, yeah?

Apparently so! We can go ahead and delete those; I suspect fuzz coverage, if we were to get it in terms of ISLE source, would show this as well.

cfallin avatar Aug 16 '22 20:08 cfallin

Right. So these rules in x64's lower.isle can never match, yeah?

Apparently so! We can go ahead and delete those; I suspect fuzz coverage, if we were to get it in terms of ISLE source, would show this as well.

Great, I've opened #4726 for deleting those.

Basically, generating the code is less error-prone and allows us to extend the pattern to arbitrary instructions as we choose fit; it's a lower-entropy encoding of the system design. We already generate .iadd() so generating .iadd_imm() given a special flag isn't a huge stretch. In contrast, mixing generated code and handwritten code (this PR) feels asymmetric and is difficult to maintain and extend in the future.

Okay, I can see that. It's not obvious to me what code that should generate, though, because the current InstBuilder traits assume each method will build exactly one instruction, consuming the builder. These _imm variants, being a kind of macro for multiple instructions, are fundamentally different.

But we can sort that out when it's time to re-visit this after your e-graph work lands.

jameysharp avatar Aug 16 '22 21:08 jameysharp