interface-types Generalize+formalize our supported subset of Wasm in `wasm ... end` blocks

Open fitzgen opened this issue 4 years ago • 9 comments

Note: I mentioned this in a comment on #61, but I think it is worth making a top-level issue/proposal for it.

We want adapter functions to be able to do some subset of things that Wasm proper can do:

call exported functions
push constant Wasm values onto the stack
load memory (maybe; not totally clear yet)
manipulate tables (maybe; not totally clear yet, but this would allow putting anyrefs into tables so that the adapted module can work with indices for C/C++/Rust)

So far, we've been defining new instructions that are similar to various Wasm instructions⁰ but that operate on the heterogeneous adapter stack, may have a different name and a different encoding from the corresponding Wasm instruction, and will need their own validation rules and execution semantics.

I propose that we allow embedding Wasm blocks into adapter functions instead:

wasm instr* end

These blocks would be able to take Wasm values off the top of the heterogeneous adapter stack and push Wasm values back onto it.
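To illustrate that stack interaction, here is a minimal Rust model, assuming the adapter stack can be represented as an enum holding either interface values or core wasm values. `StackVal`, `CoreVal`, and `run_wasm_block` are hypothetical names, and the block body is stood in for by a closure rather than real wasm instructions. The block pops its N core parameters off the top of the stack, refuses interface values in that position, and pushes its results back.

```rust
/// Core wasm values (only two types, for brevity).
#[derive(Debug, Clone, PartialEq)]
enum CoreVal {
    I32(i32),
    I64(i64),
}

/// Hypothetical heterogeneous adapter stack entry.
#[derive(Debug, Clone, PartialEq)]
enum StackVal {
    Interface(String), // stand-in for an abstract interface value, e.g. a lifted string
    Core(CoreVal),
}

/// Runs a `wasm ... end` block, modeled as a closure from `n_params` core values
/// to its core results. On error the stack is left unchanged.
fn run_wasm_block(
    stack: &mut Vec<StackVal>,
    n_params: usize,
    body: impl Fn(&[CoreVal]) -> Vec<CoreVal>,
) -> Result<(), String> {
    if stack.len() < n_params {
        return Err("adapter stack underflow".to_string());
    }
    let split = stack.len() - n_params;
    let mut params = Vec::with_capacity(n_params);
    for v in &stack[split..] {
        match v {
            StackVal::Core(c) => params.push(c.clone()),
            StackVal::Interface(_) => {
                return Err("a wasm block cannot consume interface values".to_string())
            }
        }
    }
    stack.truncate(split); // pop the parameters
    for r in body(&params) {
        stack.push(StackVal::Core(r)); // push the results
    }
    Ok(())
}
```

Interface values below the block's parameters are untouched, which is what lets these blocks sit between ordinary adapter instructions.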

We would

define the subset of instructions that are allowed in such blocks (e.g. allowing us to forbid control flow),
define the validation context that these blocks are validated within,
and define how to construct the runtime structure that they are executed within.
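A Rust sketch of the first point, checking a block body against an allow-list. The `Instr` enum and the exact allow-list are hypothetical; the only policy taken from the proposal is that control flow (`block`, `loop`, `br`) is rejected while calls to exported functions are permitted.

```rust
/// A handful of core wasm instructions (hypothetical, trimmed-down model).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Instr {
    I32Const(i32),
    I32Load { align: u32, offset: u32 },
    I32Add,
    Drop,
    Block,
    Loop,
    Br(u32),
    Call(u32),
}

/// Hypothetical allow-list for instructions inside `wasm ... end`.
fn allowed_in_adapter(instr: &Instr) -> bool {
    match instr {
        Instr::I32Const(_)
        | Instr::I32Load { .. }
        | Instr::I32Add
        | Instr::Drop
        | Instr::Call(_) => true,
        // Control flow is forbidden outright, as suggested in the proposal.
        Instr::Block | Instr::Loop | Instr::Br(_) => false,
    }
}

/// Returns the index and instruction of the first disallowed instruction, if any.
fn validate_subset(body: &[Instr]) -> Result<(), (usize, Instr)> {
    for (i, instr) in body.iter().enumerate() {
        if !allowed_in_adapter(instr) {
            return Err((i, *instr));
        }
    }
    Ok(())
}
```

Type checking (the second point) would run after this check, in the validation context the adapter section defines.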

This allows us to reuse names, binary encodings, validation rules, and semantics in a general, principled way.

What do y'all think?

⁰ Some instructions are similar to Wasm instructions but are fundamentally different once you get down into the details, like local.get versus getting adapter function arguments: one operates on abstract interface types, the other on Wasm value types. I'm not talking about these sorts of instructions, only about cases where we would really be duplicating a subset of Wasm functionality, which our design principles say we don't want to do.

Sep 09 '19 17:09 fitzgen

This seems like a really promising idea. I especially like that it gives us a principled way to duplicate core wasm instructions in a manner that isn't really "duplicating", but just "reusing".

To build on your idea: if wasm instr* end were defined to simply be a block (https://webassembly.github.io/multi-value/core/valid/instructions.html#valid-block), that would give us a nice self-contained validation/execution unit for popping the top N core wasm values from the stack and pushing M core wasm values.
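A simplified Rust sketch of what treating the body as a block buys for validation: start the operand stack with the block type's parameters, type-check each instruction in turn, and require the final stack to equal the block type's results. This follows the shape of multi-value block typing but is not the spec's validation algorithm; since control flow is excluded, the unreachable/polymorphic-stack cases are omitted. The instruction set is a hypothetical subset.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum ValType {
    I32,
    I64,
}

/// Hypothetical subset of core instructions.
#[derive(Debug, Clone, Copy)]
enum Instr {
    I32Const(i32),
    I32Load,
    I32Add,
    I64ExtendI32U,
    Drop,
}

/// A block type in its multi-value form: [params] -> [results].
struct BlockType {
    params: Vec<ValType>,
    results: Vec<ValType>,
}

fn pop(stack: &mut Vec<ValType>, expected: ValType) -> Result<(), String> {
    match stack.pop() {
        Some(t) if t == expected => Ok(()),
        Some(t) => Err(format!("expected {:?}, found {:?}", expected, t)),
        None => Err("operand stack underflow".into()),
    }
}

fn validate_block(bt: &BlockType, body: &[Instr]) -> Result<(), String> {
    // The block sees exactly its parameters (the top N adapter-stack values).
    let mut stack = bt.params.clone();
    for instr in body {
        match instr {
            Instr::I32Const(_) => stack.push(ValType::I32),
            Instr::I32Load => {
                pop(&mut stack, ValType::I32)?; // address
                stack.push(ValType::I32);
            }
            Instr::I32Add => {
                pop(&mut stack, ValType::I32)?;
                pop(&mut stack, ValType::I32)?;
                stack.push(ValType::I32);
            }
            Instr::I64ExtendI32U => {
                pop(&mut stack, ValType::I32)?;
                stack.push(ValType::I64);
            }
            Instr::Drop => {
                if stack.pop().is_none() {
                    return Err("operand stack underflow".into());
                }
            }
        }
    }
    // The block must leave exactly its M results.
    if stack == bt.results {
        Ok(())
    } else {
        Err(format!("block leaves {:?}, expected {:?}", stack, bt.results))
    }
}
```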

Also, we already have an example of embedding a subset of wasm instructions, viz., the constant expressions (https://webassembly.github.io/spec/core/valid/instructions.html#constant-expressions) that can be used in global variable initializers and data segment offsets.

Sep 09 '19 20:09 lukewagner

This is a step further away from what we were originally thinking: that the adapter specification would be declarative. More problematically, it carries the risk that we will not be able to collapse pairs of lifting and lowering operations into move operations.

-- Francis McCabe SWE

Sep 09 '19 20:09 fgmccabe

The proposal, as I understood it, would not change the expressivity of adapter functions, only the way in which they are specified, so it should have zero impact on their declarativity. Moreover, I don't think a slippery-slope argument applies here either; there are very fundamental and hard implementation requirements that would prevent including any instructions (imported from core wasm or not) that break declarativity.

I think we need a full writeup of the lifting-and-lowering-collapsing idea before we can really get into it, but even ahead of such a writeup, knowing what is meant by that phrase, I think there is no issue: the essence of the "collapsing" is lazily evaluating a lifting operator until its value is consumed by a lowering operator, and there is necessarily no core wasm instruction on that path, because core wasm neither consumes nor produces interface types.
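One way to picture that lazy evaluation, as a Rust sketch: the lifting step (modeled on `mem-to-string`) only records a pointer/length view into the source memory, and the lowering step (a hypothetical `string-to-mem`) copies the bytes straight into the destination memory, so no intermediate interface-level string is ever built. `LazyString`, `lift_mem_to_string`, and `lower_string_to_mem` are hypothetical names; UTF-8 validation and allocation in the destination are deliberately left out.

```rust
/// Hypothetical lazily-evaluated interface string: just a view into source memory.
#[derive(Debug, Clone, Copy)]
struct LazyString {
    ptr: usize,
    len: usize,
}

/// Lifting does no work yet; it only records where the bytes live.
fn lift_mem_to_string(ptr: usize, len: usize) -> LazyString {
    LazyString { ptr, len }
}

/// Lowering consumes the lazy value: the lift+lower pair collapses into one move
/// from the source memory to the destination memory. Returns the bytes written.
fn lower_string_to_mem(
    s: LazyString,
    src: &[u8],
    dst: &mut [u8],
    dst_ptr: usize,
) -> Result<usize, String> {
    let bytes = src.get(s.ptr..s.ptr + s.len).ok_or("source out of bounds")?;
    let target = dst
        .get_mut(dst_ptr..dst_ptr + s.len)
        .ok_or("destination out of bounds")?;
    target.copy_from_slice(bytes);
    Ok(s.len)
}
```

Because nothing between the lift and the lower can observe the lifted value as a core wasm value, an engine is free to pick this representation.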

Sep 09 '19 21:09 lukewagner

+1, this is probably the simplest way to do this. I think we also need a signature here as an immediate to the wasm-block.

This is a step further away from what we were originally thinking: that the adapter specification would be declarative.

That doesn't really need to be the case. By limiting the set of wasm instructions, we can keep things more declarative. For example i32.const 4 represents a value, but doesn't need to be ordered with respect to anything else. Put another way, what specifically about this is fundamentally non-declarative? If "execute this wasm block linearly" is problematic, we're totally free to specify this with semantics that are more beneficial.
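To make the point about ordering concrete, a small Rust sketch that classifies instructions by effect: pure ones like `i32.const` can be evaluated in any order relative to their neighbours, while memory reads only need ordering against calls (this sketch has no stores). The classification, including treating a call to an export as possibly writing memory, is an assumption for illustration, not something the thread specifies; data dependencies through the stack still have to be respected separately.

```rust
#[derive(Debug, Clone, Copy)]
enum Effect {
    Pure,
    ReadsMemory,
    Calls,
}

/// Hypothetical instruction subset.
#[derive(Debug, Clone, Copy)]
enum Instr {
    I32Const(i32),
    I32Add,
    I32Load,
    CallExport(u32),
}

fn effect(instr: Instr) -> Effect {
    match instr {
        Instr::I32Const(_) | Instr::I32Add => Effect::Pure,
        Instr::I32Load => Effect::ReadsMemory,
        // Assumed: a call may write memory, so it is ordered against all effects.
        Instr::CallExport(_) => Effect::Calls,
    }
}

/// Whether the effects of `a` and `b` allow evaluating them in either order.
/// (Stack data dependencies are not considered here.)
fn effects_commute(a: Instr, b: Instr) -> bool {
    match (effect(a), effect(b)) {
        (Effect::Pure, _) | (_, Effect::Pure) => true,
        (Effect::ReadsMemory, Effect::ReadsMemory) => true,
        _ => false,
    }
}
```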

More problematically, it carries the risk that we will not be able to collapse pairs of lifting and lowering operations into move operations.

Our spec still needs to reason about these operations as though they were "first-class citizens", precisely for this purpose. We need to define our execution order such that we can sequence calling exports and reading from memory deterministically. We don't need to define those to map precisely to core wasm. Given that we need to validate the subset of instructions we allow here, we know we need knowledge of how they work elsewhere in the spec, so it's perfectly reasonable to reason about them when optimizing load+store into a move.

Though I think there's some ambiguity in how we choose to interpret "reuse the semantics of existing wasm instructions." I propose we interpret that as usefully as possible, and if "useful" here means "with minor tweaks", then I'm fine with that.

Sep 09 '19 21:09 jgravelle-google

This sounds like a great idea to me. I like how this lets us focus on just the instructions dealing with interface types values and conversion from them to wasm values, which is sort of what you might expect!

One thing I've been wary of in the past is that if we make the spec too flexible in terms of instructions, we may run afoul of an issue where it's unduly complicated for engines to produce an optimal implementation for every adapter they may see in the wild. We may accidentally end up in a situation where, for example, wasm-bindgen happens to generate one style of bindings and embind another, but those are the only instruction patterns that are actually performant in the wild (or, more likely, producers converge on the same one because engines in practice only optimize one particular pattern of instructions).

This isn't really a concern that's specific to this proposal per se, but I think it's worth considering. This can also be both a good and a bad thing though:

Pro (of being flexible) - this makes it quite easy on the producer side to write powerful modules and makes it easier for generators like wasm-bindgen and embind and such to use. In general it can simply make source level translation to interface types much easier. Although compilers/engines may not optimize every single pattern when this proposal first lands, over time they have a lot of room to grow and adapt to patterns seen in the wild.
Con (of being flexible) - sort of the converse of the pro: we run the risk that engines take a prohibitively long time (or, in the worst case, forever!) to catch up with producers and generate optimal bindings.

Overall I personally prefer to err on the side of moving complexity to engines because there's just a small handful of them relative to the number of producers of wasm modules (aka all programmers compiling to wasm), but there's definitely a balance to be had!

Sep 09 '19 21:09 alexcrichton

Also useful, a list of possible instructions we might want:

*.const
*.load / *.store - simple loads+stores should be fine?
i32.load8_u / i32.store8 - single-byte precision is likely to be useful, can build arbitrary larger loads+stores out of these

More questionable list of instructions:

drop - ignore stack elements. We might want an interface-aware version instead though, because we probably want to be able to ignore higher-level values too.
i32.add - for computing offsets. Possibly unneeded for memory (load+stores have immediates for constant offsets + alignment), but may be useful for accumulating addresses from an array.

This is a pretty small set. No control flow, so we don't really need comparisons either. Limited/no arithmetic. I predict we will want more instructions here over time, but can be conservative about adding them.
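A Rust sketch of the `i32.load8_u` point: because wasm linear memory is little-endian, a 32-bit load can be rebuilt from four byte loads. Note that combining the bytes needs something like `i32.shl` and `i32.or` (or equivalent adapter instructions), which neither list above includes. The function names are hypothetical, and an out-of-bounds access is modeled as `None` rather than a trap.

```rust
/// Models `i32.load8_u`: read one byte and zero-extend it to 32 bits.
fn i32_load8_u(mem: &[u8], addr: usize) -> Option<u32> {
    mem.get(addr).map(|&b| u32::from(b))
}

/// A 32-bit little-endian load built from four single-byte loads,
/// combined with shifts and ors.
fn i32_load_via_bytes(mem: &[u8], addr: usize) -> Option<u32> {
    let mut value = 0u32;
    for i in 0..4 {
        value |= i32_load8_u(mem, addr + i)? << (8 * i);
    }
    Some(value)
}
```

The result matches `u32::from_le_bytes` over the same four bytes; stores can be decomposed the same way with `i32.store8`.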

Sep 09 '19 21:09 jgravelle-google

@jgravelle-google

I think we also need a signature here as an immediate to the wasm-block.

Agreed! When I suggested above that wasm instr* end be, effectively, just a wasm block, that means pulling in the blocktype, which is basically the signature of the block.

Sep 09 '19 22:09 lukewagner

I mentioned this in a comment in #72, but what if we drop wasm/end from the text format and use it only as an escape in the binary format? Mostly because, as in Luke's example, they add a lot of line noise to the text format.

As an alternate/related formulation, we could simplify the encoding to just be a prefix byte. So instead of needing to fuse multiple wasm/end blocks, we could say:

;; text format
i32.const 16
i32.load
as-int u32 i32
i32.const 4
mem-to-string

=>

;; wasm instrs marked
wasm i32.const 16
wasm i32.load
as-int u32 i32
wasm i32.const 4
mem-to-string

=>

;; binary
0x01 0x41 0x10
0x01 0x28 0x02 0x00
0x02 0xff 0x7f 0x7f
0x01 0x41 0x04
0x03

(For the purposes of reifying this example: 0x01 = wasm, 0x02 = as-int, 0x03 = mem-to-string, and the two-byte sequence 0xff 0x7f = u32; 0x7f is core wasm's encoding of i32.)
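A minimal Rust decoder for this prefix-byte scheme. The adapter opcodes 0x01/0x02/0x03 and the 0xff 0x7f encoding of u32 are the made-up values from the example; the core wasm parts follow the core binary format, where `i32.const` is 0x41 plus a signed LEB128 immediate and `i32.load` is 0x28 plus a memarg (alignment, then offset, both unsigned LEB128). Only those two core instructions are accepted after the prefix, which is where an allow-list of the supported subset would live; all type and function names are hypothetical, and LEB128 range checks beyond the 5-byte limit are omitted.

```rust
/// Core instructions this sketch can decode after the `wasm` prefix.
#[derive(Debug, PartialEq)]
enum CoreInstr {
    I32Const(i32),
    I32Load { align: u32, offset: u32 },
}

/// Adapter instructions, using the example's made-up opcodes.
#[derive(Debug, PartialEq)]
enum AdapterInstr {
    Wasm(CoreInstr), // 0x01 prefix + core wasm encoding, reused verbatim
    AsIntU32I32,     // 0x02 0xff 0x7f 0x7f ("as-int u32 i32")
    MemToString,     // 0x03
}

struct Reader<'a> {
    bytes: &'a [u8],
    pos: usize,
}

impl Reader<'_> {
    fn byte(&mut self) -> Result<u8, String> {
        let b = *self.bytes.get(self.pos).ok_or("unexpected end of input")?;
        self.pos += 1;
        Ok(b)
    }

    /// LEB128 as in the core binary format (at most 5 bytes for 32-bit values).
    fn leb(&mut self, signed: bool) -> Result<i64, String> {
        let (mut result, mut shift) = (0i64, 0u32);
        loop {
            let b = self.byte()?;
            result |= i64::from(b & 0x7f) << shift;
            shift += 7;
            if (b & 0x80) == 0 {
                if signed && (b & 0x40) != 0 {
                    result |= -1i64 << shift; // sign-extend
                }
                return Ok(result);
            }
            if shift >= 35 {
                return Err("LEB128 value longer than 5 bytes".into());
            }
        }
    }
}

fn decode(bytes: &[u8]) -> Result<Vec<AdapterInstr>, String> {
    let mut r = Reader { bytes, pos: 0 };
    let mut out = Vec::new();
    while r.pos < bytes.len() {
        let instr = match r.byte()? {
            0x01 => match r.byte()? {
                0x41 => AdapterInstr::Wasm(CoreInstr::I32Const(r.leb(true)? as i32)),
                0x28 => {
                    let align = r.leb(false)? as u32; // memarg: alignment, then offset
                    let offset = r.leb(false)? as u32;
                    AdapterInstr::Wasm(CoreInstr::I32Load { align, offset })
                }
                op => return Err(format!("core opcode {op:#04x} is not in the allowed subset")),
            },
            0x02 => {
                let imm = [r.byte()?, r.byte()?, r.byte()?];
                if imm != [0xff, 0x7f, 0x7f] {
                    return Err(format!("unsupported as-int immediates {imm:?}"));
                }
                AdapterInstr::AsIntU32I32
            }
            0x03 => AdapterInstr::MemToString,
            op => return Err(format!("unknown adapter opcode {op:#04x}")),
        };
        out.push(instr);
    }
    Ok(out)
}
```

For instance, `0x01 0x03` is rejected, because 0x03 is core wasm's `loop`, which falls outside the allowed subset.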

The thought here is that, given that we need a limited set of wasm instructions that we can actually understand anyway, we can avoid modeling the types of multi-instruction wasm blocks. The prefix byte then lets us blindly reuse the wasm binary encodings without worrying about future extensions, collisions, etc.

Sep 24 '19 22:09 jgravelle-google

what if we drop the wasm/end from the text format, and use it as an escaping for the binary format only.

Yeah for sure -- the text format is not the important part of this idea, the validation/semantics/encoding reuse is.

Sep 24 '19 23:09 fitzgen