finite-wasm
spec: Export indirection
When functions are exported, the only way to account for these functions participating in host-to-VM calls is by introducing a trampoline indirection and placing some calls to instrumentation callbacks in said trampoline.
Taking one step back: do we actually need to specify export indirection?
I cannot think of a way export indirection can affect the behavior of the wasm virtual machine. The only case I can think of is if we wanted to specify exactly at which amd64 instruction execution should stop when hitting a limit, but I feel like we should avoid speccing that and just say "if any limit is hit by the limitless execution trace, execution could stop at any point in the program, but the output must be LimitExceeded with the amount of used gas specced".
That said, export-indirection.mkd feels like nice implementation documentation, and the tests you added feel like good implementation tests, but to me they don't feel like something we should spec out.
Export indirection is an implementation choice, yes. I felt it is worthwhile to start working from here since it is very independent of everything else here, is easy to both describe and implement and is entirely optional. The alternatives are pretty well understood too, I feel, and it is pretty clear to me that indirection is the mechanism we want to continue using for the time being at least.
I guess I could just move the document away from spec to docs, although IME it is perfectly fine as a spec appendix.
I pushed up a commit that sketches out most of the implementation for the indirection transform. There are a couple of learnings here that I think are worth bringing up – the first is that we’ll need to contribute to wasm_encoder in order to support everything wasmparser might spit out (atomics and the global.get const-expr initializer in table elements both come to mind). There are also portions, such as the component proposal implementation, that wasmparser kinda wants to force us to implement.
I can say for sure though that splitting things out into analyze and transform phases seems to work quite okay, at least for this transformation. I do definitely see us wanting a nicer framework for walking through a module and modifying only the interesting parts of it. Otherwise we end up with 700-LoC modules for each analysis/transform that are mostly just traversing the structure and converting from one representation to the other. A rustc-like Visit trait could be a reasonably good mechanism here and is also something I’ve seen used in wasm-mutate.
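To make the Visit idea concrete, here is a minimal sketch of what such a trait could look like. It assumes a toy `Item` enum in place of the real wasmparser payloads; `Item`, `Visit`, `walk_item`, `NeedsTrampoline` and `analyze` are all hypothetical names, not anything from the PR or from wasm-mutate. The point is only that an analysis overrides the one hook it cares about and inherits the traversal.

```rust
use std::collections::BTreeSet;

/// Toy stand-in for the module sections an analysis walks over (hypothetical;
/// the real code would consume wasmparser payloads instead).
#[derive(Debug, Clone, PartialEq)]
pub enum Item {
    Export { name: String, func: u32 },
    Start { func: u32 },
    Element { funcs: Vec<u32> },
    Other,
}

/// rustc-style visitor: every hook has a default, so an analysis only
/// overrides the parts of the module it is interested in.
pub trait Visit {
    fn visit_func_index(&mut self, _index: u32) {}

    fn visit_item(&mut self, item: &Item) {
        walk_item(self, item);
    }
}

/// Default traversal shared by all visitors.
pub fn walk_item<V: Visit + ?Sized>(visitor: &mut V, item: &Item) {
    match item {
        Item::Export { func, .. } | Item::Start { func } => visitor.visit_func_index(*func),
        Item::Element { funcs } => {
            for func in funcs {
                visitor.visit_func_index(*func);
            }
        }
        Item::Other => {}
    }
}

/// Analysis phase: collect every function reachable through an export,
/// the start function, or a table element.
#[derive(Default)]
pub struct NeedsTrampoline(pub BTreeSet<u32>);

impl Visit for NeedsTrampoline {
    fn visit_func_index(&mut self, index: u32) {
        self.0.insert(index);
    }
}

/// One pass over the items, so the analysis is linear in the module size
/// (up to the logarithmic cost of the set insertions).
pub fn analyze(items: &[Item]) -> BTreeSet<u32> {
    let mut analysis = NeedsTrampoline::default();
    for item in items {
        analysis.visit_item(item);
    }
    analysis.0
}
```

A transform phase would then take the resulting set and emit one trampoline per collected index.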
I have re-evaluated use of crates such as walrus over the past few days. So far my take on walrus and such is that while it is a very nice crate indeed, this crate might be too complex for us to really be able to tell if the crate is linear in its operation. Naturally, as we add thousands of lines of our code, it might become harder here as well, but I’ve been trying to structure the code in a way such that linearity is at least somewhat obvious. This isn’t the case with walrus, I fear.
I have also taken the opportunity to adjust the approach in our test suite a little. I went ahead and integrated insta. The integration is somewhat messy (unfortunately so), but it does seem like a better approach than doing all that work manually in wast files, at least?
> since it is very independent of everything else here
I am curious if there are dependencies between exports and call_indirect? As far as I understand it, both exports and indirect calls face the same fundamental problem: we can't really instrument the call-site. Could/should we use the same mechanism to handle both?
> I am curious if there are dependencies between exports and call_indirect?
Well, as far as the implementation in nearcore goes, my proposal is that exports, the start function and table elements (i.e. call_indirect) use the same indirection mechanism (as specified and implemented in this PR).
That said, there is a problem that I still need to think about/figure out with regards to call_indirect specifically. If you look at one of the functions from the instrumented snapshot:
(func $trampoline::f2 (;1;) (type 1) (param i32)
  local.get 0
  call $f2
)
you see that each trampoline will contain local.gets for each function parameter. Once this is instrumented with gas measurement, it will start charging the fees to set up the operand stack as necessary to call the function. For exports and start function this is okay and might actually be desirable. The latter can’t have any arguments in the first place, and for the former the host doesn’t really charge for this kind of operation, so it is actually nice that we can do it here. For call_indirect, though, this is going to end up charging for the operand stack setup twice, once in the call site and again in the trampoline.
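The double charge can be illustrated with a small sketch. It assumes a flat fee of one unit per instruction and that each argument is produced by exactly one instruction at the call site; both are simplifications made here, not the cost model of the spec. The function names are hypothetical.

```rust
/// Render a trampoline for function `name` in the text format, mirroring the
/// shape of the snapshot above: one `local.get` per parameter, then a `call`.
pub fn trampoline_wat(name: &str, type_index: u32, params: &[&str]) -> String {
    let mut out = format!("(func $trampoline::{} (type {})", name, type_index);
    for param in params {
        out.push_str(&format!(" (param {})", param));
    }
    out.push('\n');
    for index in 0..params.len() {
        out.push_str(&format!("  local.get {}\n", index));
    }
    out.push_str(&format!("  call ${}\n)", name));
    out
}

/// Units charged for setting up the operand stack of one call.
/// ASSUMPTION: flat fee of 1 per instruction, one instruction per argument.
pub fn operand_setup_charges(param_count: u64, via_call_indirect: bool) -> u64 {
    // A host-to-VM call (export, start) has no instrumented call site,
    // whereas `call_indirect` sits in wasm code that is itself instrumented.
    let call_site = if via_call_indirect { param_count } else { 0 };
    // The trampoline always re-pushes every parameter with `local.get`.
    let trampoline = param_count;
    call_site + trampoline
}
```

Under these assumptions a three-argument function costs 3 units of setup when entered through an export but 6 when entered through `call_indirect`, which is the duplication described above.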
Ah, so this indeed covers the call_indirect case! Perhaps the prose can be a bit more explicit: "table elements" doesn't really connect in my brain with indirect calls. And I would see call_indirect as the primary thing to worry about here: it's much easier to write a loop which does call_indirect than to write a loop which calls exports.
> though, this is going to end up charging for the operand stack setup twice, once in the call site and again in the trampoline.
:thinking: this makes me like "worse is better" solution of forgoing indirects and just charging on function entry ("too late", in some sense) more :)
> :thinking: this makes me like "worse is better" solution of forgoing indirects and just charging on function entry ("too late", in some sense) more :)
Hm, but then we must make sure the stack is over-provisioned sufficiently to fit all the locals a function would want (hard problem as seen with our secondary stack check), and also have the number of per-function-locals limited to low-enough-numbers that their initialization isn’t taking a huge amount of time before we can charge gas for that.
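The locals concern comes from the binary format, where a function body declares locals as run-length groups of (count, type), so a few bytes can request billions of locals. A minimal sketch of a validation-time cap, assuming a hypothetical limit chosen by the embedder (the function names and any particular limit value are illustrative only):

```rust
/// Sum the run-length encoded local declarations of one function body.
/// Returns `None` if the total does not fit in a `u64`.
pub fn total_locals(groups: &[(u32, &str)]) -> Option<u64> {
    groups
        .iter()
        .try_fold(0u64, |sum, (count, _ty)| sum.checked_add(u64::from(*count)))
}

/// Reject functions whose locals would take too long to initialize before
/// the first gas charge can happen. `max_locals` is a HYPOTHETICAL limit.
pub fn within_locals_limit(groups: &[(u32, &str)], max_locals: u64) -> bool {
    matches!(total_locals(groups), Some(total) if total <= max_locals)
}
```

With a cap like this, the work done before a function-entry charge is bounded by a constant, at the price of rejecting some otherwise valid modules.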
If supporting multiple runtimes (without an ability to modify some of them) was not a concern, I would really just create a separate custom section in the wasm module and have our VM implementation charge the relevant fees during function’s prologue.
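For illustration, a custom section carrying per-function fees could be encoded roughly as follows. The section framing (id byte 0, LEB128 size, length-prefixed name) is the standard custom-section layout; the payload layout (a count followed by one LEB128 fee per function) and the section name are assumptions made up for this sketch.

```rust
/// Append `value` in unsigned LEB128 encoding.
fn leb128_u64(mut value: u64, out: &mut Vec<u8>) {
    loop {
        let byte = (value & 0x7f) as u8;
        value >>= 7;
        if value == 0 {
            out.push(byte);
            break;
        }
        out.push(byte | 0x80);
    }
}

/// Encode a custom section named `name` whose payload is a vector of fees,
/// one per function. HYPOTHETICAL payload format, not part of any spec.
/// Assumes the section body is smaller than 4 GiB.
pub fn fee_section(name: &str, fees: &[u64]) -> Vec<u8> {
    let mut body = Vec::new();
    leb128_u64(name.len() as u64, &mut body); // name length
    body.extend_from_slice(name.as_bytes()); // name bytes (UTF-8)
    leb128_u64(fees.len() as u64, &mut body); // number of entries
    for fee in fees {
        leb128_u64(*fee, &mut body);
    }

    let mut section = vec![0x00]; // section id 0 = custom section
    leb128_u64(body.len() as u64, &mut section);
    section.extend_from_slice(&body);
    section
}
```

A runtime that understands the section would read the fee for a function in its prologue; runtimes that do not understand it are required by the core spec to ignore custom sections, which is exactly why this only works when every runtime in use can be modified.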
Unfortunately I don’t see good alternative approaches at the moment. I will split the supporting code out into a separate PR, so that this PR doesn’t block any parallel work.
> If supporting multiple runtimes (without an ability to modify some of them) was not a concern, I would really just create a separate custom section in the wasm module and have our VM implementation charge the relevant fees during function’s prologue.
I'm not sure we actually have this concern? AFAICT we're only using wasmer2 in production, and other runtimes are only ever used in order to run over the archive, develop or run tests. These three other use cases don't actually need performance, so I think we could live with:
- a pwasm-utils replacement "spec" that has the behavior we want, but with bad performance and non-crashing characteristics (i.e. charging just after locals initialization), that we'd use for wasmtime (I don't think we even need it to support wasmer0, as wasmer0 wouldn't be supported by newer protocol versions anyway and isn't used for development, so it'd stay on old pwasm-utils)
- a patched wasmer2 that follows the spec of 1, charges exactly as much gas, but also avoids the delay in cost application
This actually matches our initial idea of "write a spec gas instrumentation then make it production-ready", though it also means that there may be more work on this front than anticipated.
Even if we don't share the same implementation for gas/stack counting between the runtimes (thus having to ensure that the multiple implementations are all 100% spec compliant), the spec must be implementable either way.
I believe the current spec allows charging gas and accounting for stack using either approach, but it isn’t 100% clear to me that it will remain so indefinitely. If the base WebAssembly spec needed to add some trapping runtime validation during execution of a function call, then there’d be a possibility of non-deterministic execution.
Ah, so I was looking at a wrong place, I was expecting call_indirect to do some runtime checks, but I wasn’t finding any references to runtime checks and traps. Turns out I just missed the relevant section.
So even today we must specify whether these traps must occur before our charges or after. As spec is written right now, these trapping conditions must be evaluated before any stack operations or gas charges occur as part of the function call. This in practice prohibits use of these indirection trampolines for indirect calls, I… think?
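The ordering question can be sketched as a toy model of `call_indirect` in which all trapping checks run before any gas is charged. The `Func`, `Outcome` and `entry_fee` names are hypothetical, and the single up-front fee is a simplification; only the order of the checks is the point.

```rust
#[derive(Debug, PartialEq)]
pub enum Outcome {
    Called { gas_left: u64 },
    Trap(&'static str),
    LimitExceeded,
}

/// Toy table entry: the function's type and a HYPOTHETICAL flat entry fee.
pub struct Func {
    pub type_index: u32,
    pub entry_fee: u64,
}

/// Model of `call_indirect` where every trapping condition is evaluated
/// before any gas charge, as the discussion above reads the spec.
pub fn call_indirect(
    table: &[Option<Func>],
    index: usize,
    expected_type: u32,
    gas: u64,
) -> Outcome {
    // 1. Table bounds and null checks.
    let func = match table.get(index) {
        None => return Outcome::Trap("undefined element"),
        Some(None) => return Outcome::Trap("uninitialized element"),
        Some(Some(func)) => func,
    };
    // 2. Runtime type check.
    if func.type_index != expected_type {
        return Outcome::Trap("indirect call type mismatch");
    }
    // 3. Only now charge for entering the function.
    match gas.checked_sub(func.entry_fee) {
        Some(gas_left) => Outcome::Called { gas_left },
        None => Outcome::LimitExceeded,
    }
}
```

If the charge were moved above the checks, a call with too little gas and a mismatched type would report `LimitExceeded` instead of a trap, so two implementations that disagree on the order would produce different observable results.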
Hmmmm I think your link mostly means that we must specify whether we charge before or after runtime-typechecking the indirectly-called function?
IMO if we say that the runtime-typeck must happen before the gas charges (IIUC, how the spec is currently written), then it means that both the options I'm suggesting can still be used correctly: one instrumenter that charges "too late" (after locals init) for development and testing, and one patched compiler that charges "just right" (after runtime-typeck that happens before the push rip/jmp as instrumentation would be in the prologue, and before locals init)
Does that make sense?