binaryen icon indicating copy to clipboard operation
binaryen copied to clipboard

Fix binary emitting of br_if with a refined value by emitting a cast

Open kripken opened this issue 10 months ago • 15 comments

Alternative to #6390: That PR makes br_if with a value have the same typing rules as the wasm spec, i.e., the potentially less refined type of the break target rather than the value itself. This PR, instead, works around that difference: we keep using the more refined type in Binaryen IR, and when we emit a binary we make sure to fix it up so it validates.

The rationale for using a somewhat hackish approach like that is that we hope the wasm spec will improve here, and so this is will only be temporary in that case. Even if not, this is useful because by using the most refined type in the IR we optimize in the best way possible, and only suffer when we emit fixups in the binary, but in practice those cases are very rare: br_if is almost always dropped rather than used, in real-world code (except for fuzz cases and exploits).

In more detail, the fixup we perform is to add a cast when we see that the br_if's value is used (not dropped) and it is not refined enough. The cast forces it to be the same type as the br_if's value, basically.

The binary reader will now skip redundant casts, that is, if it sees something cast from type T to type T then it ignores the cast. This has the benefit of undoing the extra casts that we add here, because in Binaryen IR the type is the type of the value, so those casts are in fact trivially unneeded. (This does make roundtripping less precise, but in the direction of better code, at least.) This is perhaps a minor matter, but it seems worth doing as without it repeated roundtrips will keep adding code, which can be very annoying.

This PR disallows DWARF + GC, for now. We already disallow DWARF + multivalue, because in DWARF mode we want to keep the locals in the original order, and GC is now another reason that may add new locals (we need locals in the case of a br_if with a tuple value). Given that the multivalue limitation here has never been an issue, hopefully this will not be either.

kripken avatar Apr 17 '24 23:04 kripken

The change to the binary reader will become obsolete if/when we start using IRBuilder in the binary parser. But we wouldn't want to make a similar change in IRBuilder itself, because then tests that purposefully create redundant casts would not parse as intended. Is there a way we can move the skipping of the redundant casts to the binary writer instead?

tlively avatar Apr 18 '24 00:04 tlively

This PR disallows DWARF + GC, for now. We already disallow DWARF + multivalue, because in DWARF mode we want to keep the locals in the original order, and GC is now another reason that may add new locals (we need locals in the case of a br_if with a tuple value). Given that the multivalue limitation here has never been an issue, hopefully this will not be either.

I'm considering emitting DWARF code for debugging, so that may become an issue for me at some point. I'm not using tuples in the generated code, so I guess one could relax the restriction at some point. Or I can skip wasm-opt in debugging mode.

vouillon avatar Apr 18 '24 11:04 vouillon

@tlively I was thinking we could have a mode in the IRBuilder that handles this, but I guess it would add complexity, and the benefit would be small (we'd be emitting the same binaries, in particular). Yes, I think it's not too hard to do this entirely on the writer side. I pushed that now, but see the downsides in test/lit/basic/reference-types.wast.

@vouillon Skipping wasm-opt in your debug mode might be fastest, yeah. But we could support GC+DWARF if there was a need for it.

Does DWARF give you any benefits over source maps, though? (Source maps are better supported, in general, both in Binaryen and elsewhere, and DWARF mainly has benefits for non-GC languages I believe.)

kripken avatar Apr 18 '24 17:04 kripken

I'm not really worried about that. I just wanted you to know about it. I think it is more urgent that binaryen generates valid code.

DWARF gives some control about where the execution can stop. I had a quick look at source maps, and single-stepping could be very confusing, sometimes jumping around in surprising ways in the source code. For instance, a pair (e1, e2) is compiled into an array:

;; pair
(array.new $block
   (ref.i31 (i32.const 0))
   ;; first element
   (...)
   ;; second element
   (...)
)

Then, when single-stepping, we first stop on the pair because of the tag (ref.i31 (i32.const 0)), then on each elements, then back on the pair with instruction array.new $block.With DWARF, one can express that it is no interesting to stop on expression (ref.i31 (i32.const 0)). I suppose disabling the simplify-locals passes would help (to simplify code generation, we actually first store the values of expressions e1 and e2 into intermediate local variables), but then we would keep a large number of variables which do not exist in the source code, which might be confusing as well.

DWARF also makes it possible to map variables in the source program to their values. I'm trying to get the Chrome C++ debugging extension: to support displaying JavaScript and Wasm GC values for that. This is not going to be enough to show variables whose value are stored in a Wasm GC structure (variables defined in an outer function for instance), but that should be good enough for local variables. And I could also push for a DWARF extension to access the fields of an WASM GC value.

vouillon avatar Apr 18 '24 20:04 vouillon

@vouillon

With DWARF, one can express that it is no interesting to stop on expression (ref.i31 (i32.const 0))

Can you do the same with source maps by not marking any source location for that ref.i31?

But yeah, for values in locals source maps aren't enough. One idea is to make the local names self-descriptive as a workaround. The same can be done with GC field names etc. That is, I hope people can get pretty far using source maps + the extended name section stuff.

kripken avatar Apr 18 '24 22:04 kripken

With DWARF, one can express that it is no interesting to stop on expression (ref.i31 (i32.const 0))

Can you do the same with source maps by not marking any source location for that ref.i31?

Wouldn't that be tricky because of how we automatically propagate debuginfo in the parsers?

tlively avatar Apr 19 '24 03:04 tlively

Wouldn't that be tricky because of how we automatically propagate debuginfo in the parsers?

Atm it might be an issue in the text parsers, but we fixed that in the binary parsers in https://github.com/WebAssembly/binaryen/pull/5906.

For inline source map annotations, perhaps we need a way to annotate an instruction as not having debug info, perhaps //@ with nothing else (compared to //@ file:line:col).

kripken avatar Apr 19 '24 03:04 kripken

I considered running a fixup pass here, it's a reasonable idea, but it is tricky for several reasons, in particular that we end up in weird situations with StackIR. In the binary writer we pick the BinaryIR or StackIR writer and then run that, but if the StackIR needs to rewrite the function with a pass then it must throw out StackIR. I experimented with that (somewhere in the long list of interim commits here :smile: I can find the specific ones if you want) but it became longer and messier than the current code here (and I didn't even get all the way to a full working solution).

kripken avatar Apr 23 '24 19:04 kripken

Ahh, that makes sense. Too bad. If we had replaced stack IR with Poppy IR, that wouldn't have been a problem.

tlively avatar Apr 23 '24 19:04 tlively

Hmm, even with PoppyIR we'd have had to make sure to only run passes that it supports. StackIR is similar in that it supports only certain operations. That makes me think that we could actually do it in reverse here, that is, if we scan and find a problem then we can convert to StackIR and have the fixup process be entirely in StackIR (that is, rather than falling back to BinaryenIR and fixing that up, we could promote to StackIR if not already, and fix up there). But even that would be more complex than just adding scratch locals...

kripken avatar Apr 23 '24 20:04 kripken

Sorry for dragging my feet on this. I keep hoping that I'll come up with a better solution. One problem is that this solution depends on the property that we can always cast to a subtype. I would like for that property to hold, but Andreas keeps insisting that we don't need it, so there's some risk that it may not hold forever. Another issue is that this solution needs to add scratch locals, but if we're adding scratch locals anyway, we can actually get away without the casts. The solution without casts would be to tee the sent values before the br_if, drop the values after the br_if, then local.get the values back out of the scratch locals after the drop. I think it's probably worth trying that out to see if it ends up being simpler than the cast solution.

tlively avatar Apr 26 '24 17:04 tlively

To make sure I follow you, when you suggest we experiment with local.tee instead, are you saying the StackIR path I suggested in my last comment?

(Otherwise, I did experiment with the BinaryenIR rewriting approach - I mentioned that here but re-reading it now I see I didn't explicitly mention local.tee so maybe I wasn't clear enough, sorry if so.)

kripken avatar Apr 29 '24 17:04 kripken

No, I mean a different code generation pattern that avoids emitting new casts entirely. Whether it's implemented in StackIR or not is a separate concern.

tlively avatar Apr 29 '24 17:04 tlively

As an example, the solution I have in mind would be translate this:

block $a (result anyref)
  i32.const 0
  ref.i31
  i32.const 1
  br_if $a
  ...

Into this:

block $a (result anyref)
  i32.const 0
  ref.i31
  local.tee $scratch-i31 ;; injected
  i32.const 1
  br_if $a
  drop                   ;; injected
  local.get $scratch-i31 ;; injected
  ...

tlively avatar Apr 29 '24 19:04 tlively

I have that implemented in br_if.alternative.2, but it requires more work and depends on a decision regarding #6568

kripken avatar May 07 '24 20:05 kripken

The preparation work in StackIR is done but meanwhile @tlively landed a nice refactoring of scratch locals, and I realized I can use that here :smile: Basically this PR was a bit complex as it added a specific new mechanism for scratch locals, while #6583 adds a general mechanism for multiple scratch locals of a particular type. As evidence of the nice design there, it was easy to use here.

So my inclination now is that this PR is the way to do. Separately, we can consider whether we want to refactor more binary writing logic into StackIR, but that is no longer needed or urgent.

Top comment has been updated.

kripken avatar May 14 '24 22:05 kripken

Did we definitely decide to go with the cast approach rather than the tee/drop/get approach?

tee/drop/get requires StackIR, I think - hard to do it otherwise. I do have a prototype of that working, but it requires a lot of other infrastructure changes to support it. So I think the cost of a cast here is not that bad - this is a pretty rare situation.

kripken avatar May 15 '24 18:05 kripken

I don't think it should require StackIR since it's just another drop-in replacement instruction sequence that only requires scratch locals. We should have all the necessary infrastructure in place. Besides the cost of the cast, I'm concerned that sooner or later we're going to end up with types that cannot be cast because rossberg keeps insisting that this would be reasonable and desirable. Would it be helpful if I put together a PR using that approach that we could compare against?

tlively avatar May 15 '24 18:05 tlively

Hmm, thanks, but let me try myself, should take just a few minutes.

I remember that one concern I had with doing it that way was that it could not integrate with the tuple.extract optimizations. But doing it in StackIR, which is earlier, allowed that. But maybe that's not important enough. Let me try it out here and see.

kripken avatar May 15 '24 18:05 kripken

Ok, I started to write the code and then remembered more: Avoiding casts means we need an extra scratch local for the condition. We can avoid constant increases in roundtrips by detecting the case of a local.get there, but we'd need to be careful of a local.get of a scratch local (edit: and StackIR stuff as in the other PR). Overall it is more complicated.

I think if we end up with types that cannot be cast in the future then we'd have no choice but to make that change, but for now it seems best not to, to me.

kripken avatar May 15 '24 18:05 kripken