binaryen
binaryen copied to clipboard
Fix binary emitting of br_if with a refined value by emitting a cast
Alternative to #6390: That PR makes br_if
with a value have the same typing rules
as the wasm spec, i.e., the potentially less refined type of the break target rather than
the value itself. This PR, instead, works around that difference: we keep using the more
refined type in Binaryen IR, and when we emit a binary we make sure to fix it up so it
validates.
The rationale for using a somewhat hackish approach like that is that we hope the wasm
spec will improve here, and so this is will only be temporary in that case. Even if not,
this is useful because by using the most refined type in the IR we optimize in the best
way possible, and only suffer when we emit fixups in the binary, but in practice those
cases are very rare: br_if
is almost always dropped rather than used, in real-world
code (except for fuzz cases and exploits).
In more detail, the fixup we perform is to add a cast when we see that the br_if
's value
is used (not dropped) and it is not refined enough. The cast forces it to be the same
type as the br_if
's value, basically.
The binary reader will now skip redundant casts, that is, if it sees something cast from type T to type T then it ignores the cast. This has the benefit of undoing the extra casts that we add here, because in Binaryen IR the type is the type of the value, so those casts are in fact trivially unneeded. (This does make roundtripping less precise, but in the direction of better code, at least.) This is perhaps a minor matter, but it seems worth doing as without it repeated roundtrips will keep adding code, which can be very annoying.
This PR disallows DWARF + GC, for now. We already disallow DWARF + multivalue, because in DWARF mode we want to keep the locals in the original order, and GC is now another reason that may add new locals (we need locals in the case of a br_if with a tuple value). Given that the multivalue limitation here has never been an issue, hopefully this will not be either.
The change to the binary reader will become obsolete if/when we start using IRBuilder in the binary parser. But we wouldn't want to make a similar change in IRBuilder itself, because then tests that purposefully create redundant casts would not parse as intended. Is there a way we can move the skipping of the redundant casts to the binary writer instead?
This PR disallows DWARF + GC, for now. We already disallow DWARF + multivalue, because in DWARF mode we want to keep the locals in the original order, and GC is now another reason that may add new locals (we need locals in the case of a br_if with a tuple value). Given that the multivalue limitation here has never been an issue, hopefully this will not be either.
I'm considering emitting DWARF code for debugging, so that may become an issue for me at some point. I'm not using tuples in the generated code, so I guess one could relax the restriction at some point. Or I can skip wasm-opt
in debugging mode.
@tlively I was thinking we could have a mode in the IRBuilder that handles this, but I guess it would add complexity, and the benefit would be small (we'd be emitting the same binaries, in particular). Yes, I think it's not too hard to do this entirely on the writer side. I pushed that now, but see the downsides in test/lit/basic/reference-types.wast
.
@vouillon Skipping wasm-opt
in your debug mode might be fastest, yeah. But we could support GC+DWARF if there was a need for it.
Does DWARF give you any benefits over source maps, though? (Source maps are better supported, in general, both in Binaryen and elsewhere, and DWARF mainly has benefits for non-GC languages I believe.)
I'm not really worried about that. I just wanted you to know about it. I think it is more urgent that binaryen generates valid code.
DWARF gives some control about where the execution can stop. I had a quick look at source maps, and single-stepping could be very confusing, sometimes jumping around in surprising ways in the source code. For instance, a pair (e1, e2)
is compiled into an array:
;; pair
(array.new $block
(ref.i31 (i32.const 0))
;; first element
(...)
;; second element
(...)
)
Then, when single-stepping, we first stop on the pair because of the tag (ref.i31 (i32.const 0))
, then on each elements, then back on the pair with instruction array.new $block
.With DWARF, one can express that it is no interesting to stop on expression (ref.i31 (i32.const 0))
. I suppose disabling the simplify-locals
passes would help (to simplify code generation, we actually first store the values of expressions e1
and e2
into intermediate local variables), but then we would keep a large number of variables which do not exist in the source code, which might be confusing as well.
DWARF also makes it possible to map variables in the source program to their values. I'm trying to get the Chrome C++ debugging extension: to support displaying JavaScript and Wasm GC values for that. This is not going to be enough to show variables whose value are stored in a Wasm GC structure (variables defined in an outer function for instance), but that should be good enough for local variables. And I could also push for a DWARF extension to access the fields of an WASM GC value.
@vouillon
With DWARF, one can express that it is no interesting to stop on expression
(ref.i31 (i32.const 0))
Can you do the same with source maps by not marking any source location for that ref.i31
?
But yeah, for values in locals source maps aren't enough. One idea is to make the local names self-descriptive as a workaround. The same can be done with GC field names etc. That is, I hope people can get pretty far using source maps + the extended name section stuff.
With DWARF, one can express that it is no interesting to stop on expression
(ref.i31 (i32.const 0))
Can you do the same with source maps by not marking any source location for that
ref.i31
?
Wouldn't that be tricky because of how we automatically propagate debuginfo in the parsers?
Wouldn't that be tricky because of how we automatically propagate debuginfo in the parsers?
Atm it might be an issue in the text parsers, but we fixed that in the binary parsers in https://github.com/WebAssembly/binaryen/pull/5906.
For inline source map annotations, perhaps we need a way to annotate an instruction as not having debug info, perhaps //@
with nothing else (compared to //@ file:line:col
).
I considered running a fixup pass here, it's a reasonable idea, but it is tricky for several reasons, in particular that we end up in weird situations with StackIR. In the binary writer we pick the BinaryIR or StackIR writer and then run that, but if the StackIR needs to rewrite the function with a pass then it must throw out StackIR. I experimented with that (somewhere in the long list of interim commits here :smile: I can find the specific ones if you want) but it became longer and messier than the current code here (and I didn't even get all the way to a full working solution).
Ahh, that makes sense. Too bad. If we had replaced stack IR with Poppy IR, that wouldn't have been a problem.
Hmm, even with PoppyIR we'd have had to make sure to only run passes that it supports. StackIR is similar in that it supports only certain operations. That makes me think that we could actually do it in reverse here, that is, if we scan and find a problem then we can convert to StackIR and have the fixup process be entirely in StackIR (that is, rather than falling back to BinaryenIR and fixing that up, we could promote to StackIR if not already, and fix up there). But even that would be more complex than just adding scratch locals...
Sorry for dragging my feet on this. I keep hoping that I'll come up with a better solution. One problem is that this solution depends on the property that we can always cast to a subtype. I would like for that property to hold, but Andreas keeps insisting that we don't need it, so there's some risk that it may not hold forever. Another issue is that this solution needs to add scratch locals, but if we're adding scratch locals anyway, we can actually get away without the casts. The solution without casts would be to tee the sent values before the br_if
, drop the values after the br_if
, then local.get
the values back out of the scratch locals after the drop. I think it's probably worth trying that out to see if it ends up being simpler than the cast solution.
To make sure I follow you, when you suggest we experiment with local.tee
instead, are you saying the StackIR path I suggested in my last comment?
(Otherwise, I did experiment with the BinaryenIR rewriting approach - I mentioned that here but re-reading it now I see I didn't explicitly mention local.tee
so maybe I wasn't clear enough, sorry if so.)
No, I mean a different code generation pattern that avoids emitting new casts entirely. Whether it's implemented in StackIR or not is a separate concern.
As an example, the solution I have in mind would be translate this:
block $a (result anyref)
i32.const 0
ref.i31
i32.const 1
br_if $a
...
Into this:
block $a (result anyref)
i32.const 0
ref.i31
local.tee $scratch-i31 ;; injected
i32.const 1
br_if $a
drop ;; injected
local.get $scratch-i31 ;; injected
...
I have that implemented in br_if.alternative.2
, but it requires more work and depends on a decision regarding #6568
The preparation work in StackIR is done but meanwhile @tlively landed a nice refactoring of scratch locals, and I realized I can use that here :smile: Basically this PR was a bit complex as it added a specific new mechanism for scratch locals, while #6583 adds a general mechanism for multiple scratch locals of a particular type. As evidence of the nice design there, it was easy to use here.
So my inclination now is that this PR is the way to do. Separately, we can consider whether we want to refactor more binary writing logic into StackIR, but that is no longer needed or urgent.
Top comment has been updated.
Did we definitely decide to go with the cast approach rather than the tee/drop/get approach?
tee/drop/get
requires StackIR, I think - hard to do it otherwise. I do have a prototype of that working, but it requires a lot of other infrastructure changes to support it. So I think the cost of a cast here is not that bad - this is a pretty rare situation.
I don't think it should require StackIR since it's just another drop-in replacement instruction sequence that only requires scratch locals. We should have all the necessary infrastructure in place. Besides the cost of the cast, I'm concerned that sooner or later we're going to end up with types that cannot be cast because rossberg keeps insisting that this would be reasonable and desirable. Would it be helpful if I put together a PR using that approach that we could compare against?
Hmm, thanks, but let me try myself, should take just a few minutes.
I remember that one concern I had with doing it that way was that it could not integrate with the tuple.extract
optimizations. But doing it in StackIR, which is earlier, allowed that. But maybe that's not important enough. Let me try it out here and see.
Ok, I started to write the code and then remembered more: Avoiding casts means we need an extra scratch local for the condition. We can avoid constant increases in roundtrips by detecting the case of a local.get
there, but we'd need to be careful of a local.get
of a scratch local (edit: and StackIR stuff as in the other PR). Overall it is more complicated.
I think if we end up with types that cannot be cast in the future then we'd have no choice but to make that change, but for now it seems best not to, to me.