exception-handling
exception-handling copied to clipboard
Proposed instruction format for reintroducing exnref
It’s been a while since we posted #280 and did a presentation in the 8/1 CG meeting. I think this is time we kick off more concrete discussions on what to do for the next steps.
CG Meeting Schedule
We reserved two slots in September CG meeting: On 9/12, we are going to make a brief announcement that we have @titzer as a ch-champion of the EH proposal, and let people know that discussions are taking place in Github. On 9/26, we will have a 15-min slot for the recap of our proposed option and some possible discussions. I hoped we could do it all on 9/12 but the slot had already been fully reserved. And we plan to do a vote on the actual switch at the October hybrid CG meeting (remote option is available).
Proposed Instruction Format
And we’d like to propose below as our endorsed option to reintroduce exnref.
try blocktype
catch i li
catch j lj
...
catch_all l
instruction*
end
In the folded format, this can be also written as
(try (result ...) (catch $i $li) (catch $j $lj) ... (catch_all $l)
...
)
When an exception is thrown within instruction*, it is checked against each of the tags in order in the list of catches, and if it is caught by one of the catches, the control flow branches to the corresponding label with extracted values and the exception’s exnref. So if a tag contains a single i32, its target label’s result type will be a multivalue of (i32, exnref). The catch_all label’s return type, if present, is just an exnref.
So an example of a try with two tags and a catch_all would be:
block $li (result …, exnref)
block $lj (result …, exnref)
block $l (result exnref)
try blocktype (catch i li) (catch j lj) (catch_all l)
instruction*
end
end
handler code for catch_all
end
handler code for catch j
end
handler code for catch i
This is a variant of Option B we presented on 8/1. The difference is we bundle the catch labels with try, preceding the instructions that they guard, for ease of decoding. Because catches are with the try, we end the block with a normal end, as in blocks and loops.
This option doesn’t have the many-partitioned blocks of try-catch-...-catch_all anymore; instead now try is a block that additionally has attached catch and catch_all clauses. Note that we don’t need the (do) syntax in the text format anymore. Also in binary format, the handler list can be just a vector of immediates as in br_table, so we don’t need actual opcodes for catch and catch_all.
(Our presentation on 8/1 incorrectly included br_on_exn for Option B, which is not necessary because catch pushes extracted values along with an exnref. The slides have been fixed.)
rethrow will take an explicit exnref, not the label. The current rethrow instruction, which takes a label, creates semantic dependency on the lexical context which has been a challenge to some compiler code transformations.
throw instruction will remain the same.
try-delegate will be removed from the new spec because we can achieve the same semantics using exnref and branches.
We chose this option because several stakeholders we talked to expressed preference for it. The reasons include:
- More in line with other Wasm regular control flow instructions, e.g., blocks and branches
- The number of required opcodes are about half of the current proposal (no
try-delegate, no need forcatchandcatch_allopcodes) - Simpler decoding from the VM side
I was able to talk with several VM and toolchain authors and they said the plan seemed doable.
@rossberg prototyped this option in https://github.com/WebAssembly/exception-handling/tree/exnref-1b branch. You can also see the diff in https://github.com/WebAssembly/exception-handling/compare/main...exnref-1b#diff-667145a59d9b08f9350bb76322feec7bbe56dda640a84dd5a9f0a65097ec452e.
Migration Plan
If we choose to adopt this option, the migration will progress cautiously given the circumstance that the current EH instructions have been already shipped to the web. The VMs will be supporting both current and the new instructions for the foreseeable future so apps already deployed won’t be disrupted. Emscripten + LLVM toolchain will support the new option under a flag first, and switch to it after it is stabilized. We also plan to provide a converter tool from the old binary to the new format to facilitate the use of the new instructions without the need for recompilation. We have a tentative plan, or more of a hope, for the deprecation of the current instructions, but this will only happen when we can be sure that the usage of the old instructions is low enough, which may turn out infeasible. Before any deprecation, we will maintain the documentation of the current instructions in the supplemental Phase 3 documentation.
@aheejin This looks good to me and matches my understanding.
Two small text format questions:
- Can we give the new
trya different name? This would make it easier to know which opcode to emit in text parsers. We could guess based on whether thetryis immediately followed by catch's or instructions, but this wouldn't allow the text format to use the new opcode for empty try's. For my prototyping, I have used thetry_tableopcode as this instruction feels similar to abr_tableinstruction. But open to other options. - Instead of
rethrowcould we name the new instructionthrow_ref? That would make the instruction a bit more future proof if we added instructions to allocate exceptions without throwing them. In that case, therethrowwould really be the firstthrow.
Yes, throw_ref makes sense. Not sure I like try_table, however — since the intent is to deprecate the old instructions, I'd rather give those cumbersome (re)names?
Yes,
throw_refmakes sense. Not sure I liketry_table, however — since the intent is to deprecate the old instructions, I'd rather give those cumbersome (re)names?
Wouldn't renaming the old opcodes mean that old text-format wasm no longer compiles and produces confusing errors? Vs retiring the old opcodes with their names intact, and using new names for the new ones that allows straightforward 'x is deprecated, use y' error messages. I am not sure how I feel about try_table as a name but the relationship between the new try and br_table does seem pretty clear to me.
Changing rethrow to throw_ref <exnref> vs the old label-based one seems nice.
Yes,
throw_refmakes sense. Not sure I liketry_table, however — since the intent is to deprecate the old instructions, I'd rather give those cumbersome (re)names?
I agree that it would be nice to rename the existing instructions. Practically speaking though, that breaks a lot of text format tests that we have right now, and is very disruptive to other users of the text format. I think we could consider it in the future, but for now I'd prefer to have a unique name for the 'new thing'.
IIUC, the only ambiguity is the case of a try with no handlers. But in that case, both interpretations also happen to be semantically equivalent. Would it not work to stick to try for both and reinterpret that one case to be the new form?
Yeah, that is the only grammatical ambiguity. It would be difficult to do though in the parser I work with (bytecode-alliance/wasm-tools), where no expression AST is built up for while parsing. There we can only see if the next tokens is an 'immediate handler' (catch $tag $label) when deciding which kind of try opcode to emit. We could not defer that decision until after we've seen the whole block to see if there were legacy catch delimiters.
So we could interpret an empty try as the old opcode easily, but not as the new one.
But aside from the implementation details, I would feel uncomfortable diverging the text format from the binary encoding details here. Even if we want to remove these instructions eventually, in the meantime we need to be able to precisely emit both of them for tests (even if semantically equivalent). And also see exactly which one a binary is using.
I suppose another option could be to disallow an empty try for the new instructions. It'd be trivial for tools to rewrite those to blocks.
I added #282 for some details on the exnref type. Feedback appreciated.
It would be difficult to do though in the parser
I see.
I suppose another option could be to disallow an empty
tryfor the new instructions. It'd be trivial for tools to rewrite those to blocks.
Let's not introduce special cases. I'd rather go with try_table in that case. :)
I have a working implementation in SpiderMonkey that I hope to land soon. I had to fill in some binary details and some small execution semantics. I tried to summarize them here. Let me know if I got anything wrong.
Types
exnref is encoded as SLEB128(-0x17) also known as 0x69
exnref is a new type hierarchy, no subtyping with other reference types.
Exception handling with GC proposal should define a bottom ref type of nullexnref and bottom heap type of noexn.
ToJSValue(v, 'exnref'):
throw TypeError(); // as v128 does today
ToWebAssemblyValue(v, 'exnref')
throw TypeError(); // as v128 does today
Instructions
Text format
throw_ref
try_table blocktype (catch | catch_ref $tag $label)* (catch_all | catch_all_ref $catch_all_label)?
instr*
end
Binary format
throw_ref ::= 0x0A
catch_kind ::=
0x0 => catch
0x1 => catch_ref
catch ::=
$kind:catch_kind $tag:u32 $label:u32 => ($kind $tag $label)
catch_all ::=
0x0 => nil
0x1 $label:u32 => (catch_all $label)
0x2 $label:u32 => (catch_all_ref $label)
try_table ::=
0x1F $ty: blocktype $c: vec(catch) $ca: catch_all
instr*
end
=>
try_table $ty $c $ca
instr*
end
Validation
throw_ref is valid for type:
[exnref] -> [t*]
for any t* (stack polymorphic, like throw and unreachable)
try_table $blocktype ($catch_kind $tag $label)* ($catch_all_kind $catch_all_label)?
* validates as (block $blocktype) would
* every ($catch_kind $tag $label) must refer to a valid tag index and label index
* for kind == catch, the label must match a target type of [tag-params...]
* for kind == catch_ref, the label must match a target type of [tag-params..., exnref]
* any (catch_all $catch_all_label) must refer to a valid label index
* for kind == catch_all, the label must match a target type of []
* for kind == catch_all_ref, the label must match a target type of [exnref]
Execution
throw_ref will throw the received exnref value, trap if it is null.
try_table will execute as (block $blocktype) would if there are no thrown
exceptions. If an exception is thrown, its tag is checked against the list
of catches in order to find a match. If there is a match, then execution
continues in that label with the exception unpacked. If there is no match
and a catch_all, execution continues in that label. Otherwise, the exception
is propagated upward in the control stack.
Should we also consider a case that unpacks and drops the exnref, such as catch_drop? I've argued elsewhere that that would allow engines to elide the package allocation for such catching paths and thus avoid significant overhead.
That mostly seems to match what I defined here and implemented in the reference interpreter, except for the choice of opcodes. I don't feel strongly about those, but if you made different choices you probably won't be able to run the test suite binaries generated by the interpreter for now.
Should we also consider a case that unpacks and drops the exnref, such as
catch_drop? I've argued elsewhere that that would allow engines to elide the package allocation for such catching paths and thus avoid significant overhead.
fwiw, we would use this in our JIT and (if llvm exposed it) I think we might use it in our AOT compiler too. We use exceptions more like signals ('something happened, please unwind') than values containing information
That mostly seems to match what I defined here and implemented in the reference interpreter, except for the choice of opcodes. I don't feel strongly about those, but if you made different choices you probably won't be able to run the test suite binaries generated by the interpreter for now.
@rossberg
Oh, I completely forgot you implemented this already. I adopted your opcodes for throw_ref and try_table. It looks like your exnref type code overlaps with GC though, so I am keeping it at -0x17. I'll edit the above post too to be up to date.
Should we also consider a case that unpacks and drops the exnref, such as catch_drop? I've argued elsewhere that that would allow engines to elide the package allocation for such catching paths and thus avoid significant overhead.
@titzer
That would be fine with me. I think the simplest way to do that would be to add a flag in the binary for each of the catches for whether it's a catch_ref (which has the exnref) or plain catch (which does not). And the existing binary flag for catch_all could be a three-way for 'no catch_all' 'catch_all_ref' and 'catch_all'.
WDYT?
@eqrion A flag for each catch sounds fine, and the catch_all that drops sounds fine as well.
Extra flag byte sounds fine to me and is extensible. (If we wanted to save the extra byte we could also use a hack and represent the labelidx (or the tagidx) as a signed LEB, and abuse the sign as the flag. But I don't think that's worth it.)
Oops, yes, you are right about the type opcode. I changed it to -0x17 as well, thanks.
Agree that flags byte is probably the best bet.
About the catch_drop optimization, would it be effective when the whole program uses it, or it's worth it if only a part of the program uses it? In Emscripten (+ LLVM)'s C++ exception support, I'm not sure how many trys can use the optimized version.
try {
...
} catch (int i) {
}
This kind of code needs exnref because if the exception is not an int we should rethrow. Also all destructor calls, which consists of I guess more than 95% of try usage in most C++ programs, needs exnref because we need to rethrow the exception after running the destructors.
Which is to say, not sure how effectively we can use the catch_drop version in C++ support. It may be useful for other toolchains though.
About the
catch_dropoptimization, would it be effective when the whole program uses it, or it's worth it if only a part of the program uses it? In Emscripten (+ LLVM)'s C++ exception support, I'm not sure how manytrys can use the optimized version.
My understanding is this would only help in cases where the nearest enclosing try for a throw is a catch_drop. I agree, it seems not useful for C++, but possibly for others. If the binary size of the extra flag byte becomes an issue, we could maybe use Andreas suggestion (or some other idea) to minimize the cost for C++.
I updated the previous comment that documents what SM is targeting. I chose [catch/catch_ref] and [catch_all/catch_all_ref] instead of [_drop] terminology, as I think that matches the rest of the language design better. But it's not that important.
I'm fine with or without dropping versions. I don't think the extra byte is an issue, since catch clauses shouldn't exactly be frequent, and their size is certainly dominated by the block structure needed for the jump targets. Even if we decided not to have the dropping versions now, it'd probably be wise to reserve the extra byte.
The exnref-1b branch now has the latest instruction format (Option B').
I also updated the spec document for Option B', see #283. That should check all the spec and interpreter boxes for moving Option B' to phase 4. Now it's up to implementations and toolchains. :)
From one runtime-implementer perspective, it would be helpful if we could
- merge this to main so the testsuite repo can pick these changes up
- update binary.wast to test the changes made to the binary format (e.g. ExnRefType/try_table/throw_ref and the encoding of catches)
- have the "legacy" tests also visible in the testsuite repo somehow (for implementations that want to preserve support)
@keithw, happy to merge this to main once the code review is done.
For the test suite, see #284.
Agreed regarding more binary tests, but somebody else should perhaps volunteer for that one. :) In fact, it'd be good to have more tests for this proposal in general.
In fact, it'd be good to have more tests for this proposal in general.
Agreed -- I hope we can find a way to merge #270 and #277 now (after appropriate review).
This situation is deteriorating rapidly. Your repeated changes have significantly disrupted the implementation of EH in wavm. My efforts to incorporate exception handling into a wasm virtual machine are being hampered by your changes. It's disappointing and frustrating to see the obstacles you're creating.
Also, you do not change llvm itself, how could i supposed to support this if you do not support LLVM?
@trcrsired if you have concerns about the state of the EH proposal that go beyond the text/binary encoding, can we discuss them in this issue?