"Lightweight Component Model" as new scope for layered linking + interface types spec?

This issue is for follow-on discussion of the CG-04-27 presentation which was itself a follow-up proposal based on discussion in the duplicate import issue (#1402).

A high-level summary of the presentation is:

Discussion in #1402 suggests that module linking be factored out into a layer above core wasm (resolving #1402 with "no").
Interface types would be much simpler as an (optional, if desired) feature proposal extending this new layered spec.
Considering future evolution and the wide variety of possible linking features and host implications, we must carefully consider the scope of this new layered spec.
A set of relevant use cases and requirements were presented, proposing that the scope we want is a lightweight component model, with implications for what kinds of linking would be supported.

Separately, the presentation suggests next steps, but I think perhaps first we should discuss the general question of "is this the right general direction"? Of course, even if the CG thinks "yes" to the general direction, the precise list of use cases and requirements is open for further iteration (ideally in a new repo, so we can open a bunch of independent issues).

Apr 28 '21 19:04 lukewagner

Strong +1 from me on the general direction for Module Linking. Great presentation and nice thorough analysis of the various linking options @lukewagner !

(I have not understood the implications for Interface Types yet, which I know less about, still thinking about that part; but a nice thing with the layering is that that becomes optional, as you said, so I have no concerns there.)

One suggestion, a minor bikeshed (apologies) about the name "adapter module" ((adapter_module ..) as appearing in the code samples). A more unique name, not just a qualifier on module, might be clearer? As it isn't a variation on "module", it's distinct. Also it would be nice if it were just one word like pretty much all other Wasm terms (module, func, local, import, etc.). That would also avoid some possible confusion with things like someone using "adapter module" in a loose sense (say, as in a module that functionally adapts between their old API and their new one). Some ideas:

"component" (unless you want to reserve that for more things?)
"adapter" (without "module" it still seems pretty clear to me)
"frame" (since this is kind of a framework on which modules are "placed" and then connected)
"connector", "connective", etc.

Apr 28 '21 21:04 kripken

@kripken Thanks a bunch Alon, glad to hear that. I've been considering renaming adapter_module to component too :) I was reticent to propose the additional change (for those familiar with the interface types explainer) at the same time as everything else, but if the CG does agree on this overall component direction, I'll be happy to file an issue to discuss what's the right terminology for things.

Apr 28 '21 23:04 lukewagner

I am generally super excited for the features/direction as outlined in the CG presentation. It is all very well thought out.

If anything, I'd be a little worried about how the complexity of all this repo & proposal separation would affect forward velocity. I'd be tempted to keep it all bundled in one, but instead define milestone buckets to simplify the first versions (i.e. separate by time rather than topic).

Yes, I see these are modular, in that you could be doing module linking that doesn't involve interface types, and you could be using interface types with existing import/export matching. But to me, IT is the more pressing of the two, and I would think that if you'd be doing fancy module linking you'd want IT anyway.

IT also likely needs a longer runway to be adopted by languages etc, as it affects data representation, allocation and conversion, whereas ML generalizes the organization of function signature imports/exports we're already using (not counting Wasm-managed instances, which would be entirely new).

Apr 29 '21 18:04 aardappel

IIUC, the plan of record for IT was for it to be layered on top of the core spec as a separate spec document. This proposal establishes a minimal version of that new spec layer, so it seems like all the technical content here is on the critical path for IT anyway. I'm also not too worried about a separate top-level repo slowing things down either, since it seems reasonable to split out different spec layers into different repos.

Apr 29 '21 18:04 tlively

It's a good question whether all the repo-splitting would help more than hinder. For example, by the end of it, I think everyone felt that the reference-types-vs-bulk-memory split hindered. Agreed with @tlively that module-linking purely factors stuff out of interface-types (and those repos are already separate). Adding the new top-level repo is attractive, I think, for separating out "meta" discussions (of the type we have here in the design repo) from the nuanced technical questions of the feature repos (which, once complete, will become inactive/archived, as feature repos do). Lastly, that leaves splitting out adapter-functions, which I think makes sense because there is simply a bunch more adapter-function-specific technical discussion to be had, which will be useful to have partitioned from the MVP-oriented discussion of the preceding two features (and ongoing after the MVP is done).

Apr 29 '21 19:04 lukewagner

As an update: I proposed an agenda item for the May 25th CG meeting for 20 min, ending with the original polls ("does the general direction sound good?" and "should we proceed with the next steps?"). In the meantime, I'm happy to have any more discussion on these two questions in this issue.

May 06 '21 20:05 lukewagner

I appreciate the ambitious vision as presented, yet I am still very worried about picking UTF-8 as the sole canonical representation for strings in an IT MVP, that is, until adapter functions have been fully fleshed out post-MVP (if I understood the need for splitting out adapter functions correctly?). Doing so seems to lead to a situation (temporary, hopefully not for long, but who knows) where languages that essentially use JS's string encoding (like AssemblyScript, Java, C#, ...) and want to exchange a string with JS or Web APIs in the browser have to re-encode twice, from WTF-16 to UTF-8 (may trap) and back to WTF-16, and potentially trap in either direction (or irrevocably modify the data with replacement sequences). That puts Wasm in the browser, and Wasm for many high-level languages, in an unfortunate spot by design. I also worry that deciding on UTF-8 here may lead to similar outcomes in the context of GC and other critical proposals, which would be even more unfortunate for high-level languages and for Wasm on the Web platform generally.
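
To make the round-trip concern concrete, here is a minimal sketch in Python. It models a WTF-16 string as a list of 16-bit code units; the function names and the `lossy` flag are hypothetical and chosen only for illustration, and raising `ValueError` stands in for a trap. It is not taken from any proposal text.

```python
# Illustrative model only: a WTF-16 string is a list of 16-bit code units.
REPLACEMENT = 0xFFFD  # U+FFFD REPLACEMENT CHARACTER


def wtf16_to_utf8(units, lossy):
    """Convert 16-bit code units to UTF-8 bytes.

    A lone surrogate has no UTF-8 encoding, so this either raises
    (a stand-in for a trap) or substitutes U+FFFD when lossy is True.
    """
    code_points = []
    i = 0
    while i < len(units):
        u = units[i]
        is_high = 0xD800 <= u <= 0xDBFF
        if is_high and i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF:
            # Well-formed surrogate pair: combine into one code point.
            low = units[i + 1]
            code_points.append(0x10000 + ((u - 0xD800) << 10) + (low - 0xDC00))
            i += 2
            continue
        if 0xD800 <= u <= 0xDFFF:
            # Lone surrogate: trap or replace.
            if not lossy:
                raise ValueError(f"lone surrogate 0x{u:04X} at index {i}")
            u = REPLACEMENT
        code_points.append(u)
        i += 1
    return "".join(chr(cp) for cp in code_points).encode("utf-8")


def utf8_to_wtf16(data):
    """Convert UTF-8 bytes back to a list of 16-bit code units."""
    raw = data.decode("utf-8").encode("utf-16-le")
    return [int.from_bytes(raw[i:i + 2], "little") for i in range(0, len(raw), 2)]


# "a", a lone high surrogate, "b": valid WTF-16, not valid UTF-16.
original = [0x0061, 0xD800, 0x0062]
round_tripped = utf8_to_wtf16(wtf16_to_utf8(original, lossy=True))
```

In this model the lossy path silently changes the data (`round_tripped` differs from `original`), while the strict path raises, which mirrors the "trap or replace" choice described above. Well-formed strings, including surrogate pairs, survive the round trip unchanged.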

As such I would more than appreciate a solution that covers both perspectives from the start, as I think both are equally valuable, e.g. for running low-level languages on the server with WASI (benefits from sharing UTF-8 representation) or high-level languages on the web with JS (benefits from sharing WTF-16 representation), and of course arbitrarily mixing these on and off the web.

I certainly don't want another https://github.com/WebAssembly/interface-types/issues/13 to happen here, which is the original thread this concern surfaced in, and I am not sure how much of this would already be set in stone by voting for the general direction (would it be?), yet I figured it might be good to bring this up once more as I think it is important, also in context of my worries expressed in https://github.com/WebAssembly/design/issues/1407.

May 06 '21 22:05 dcodeIO

I wouldn't consider the canonical ABI's Unicode encoding for strings to be part of the general direction that we'd be voting on. There's still quite a number of subtle technical design questions to be sorted out; having a general direction (scope, use cases, requirements, etc) established helps set the context in which to have these technical discussions.

May 07 '21 14:05 lukewagner

Sounds reasonable, thanks. One more question perhaps: Do you think it would make sense to delay the decision to split out adapter functions from an IT MVP, since splitting to post-MVP may already lead to the need to make a choice regarding a canonical ABI in the first place, which to me seems to easily go down a path of https://github.com/WebAssembly/interface-types/pull/132, which kinda already picks UTF-8 and as such subtly breaks backwards compatibility with existing Web platform APIs?

May 08 '21 00:05 dcodeIO

I think we'll need to discuss the technical issue of string encoding in the short term in any case, whether via the canonical ABI MVP or in the design of adapter instructions, so this splitting choice doesn't change the timing of that discussion. At the same time, there is a strongly-felt timeliness for defining an MVP that folks could start working with in the short term, so I think the splitting-off of adapter functions is an essential aspect of the plan, since otherwise we're basically asking everyone to wait an indeterminate amount of time.

May 10 '21 19:05 lukewagner

I am generally in favor of this now (I was the other "no" vote at the meeting), since it does not preclude other work on module linking. I have a few questions, and bear with me in case I ask something covered by a different discussion on module interactions - feel free to just point to it.

How would components be described, for example would component model imply some kind of interface that components are expected to follow (and is this what Canonical ABI is for)? How would this extend to future layers of linking dynamism? "Prior art" examples in the slides, even the most lightweight, assume an "interface" ABI as a way to standardize instantiation and dispatch. Those ABIs are not necessarily assumption-free, for example COM does reference counting, some string encoding might be built in, and so on.

As an example from a different managed runtime, JVM finds classes by looking up class files on composite classpath. This is different from EJB, which is RFC API (and also the thing everybody has nightmares of). This does not prevent it from building much more complex functionality on top of class discovery, said RFC for example. It is also at least partially responsible for interactions between bytecode files compiled from different source languages.

Couple of extra questions - I know those can apply to various items outside this repo, but here it would be easier to find:

I am curious to get some clarity on @dcodeIO's point - canonical ABI draft on one hand states that encoding can be specific to implementation, but on the other has what looks like UTF8 decode functions.
Would current approach require describing component relationships via a manifest or IDL?

Jun 02 '21 02:06 penzn

Thanks, and good questions.

How would components be described, for example would component model imply some kind of interface that components are expected to follow?

A component would have a text and binary representation, symmetric to core modules. The definition of the text and binary representation would embed core modules' text and binary representations (just like the currently-proposed module-linking binary format sketch, except the outer module is a component, not another core module). Components would describe their interface in terms of interface types used in the types of import and export definitions (just like core modules do today, but replacing core valtype with intertype). Thus, no separate manifest or IDL would be needed at load-/run-time. (How components are produced is a separate story determined by the toolchain and may involve IDL files like witx or inline source annotations.)

As an example from a different managed runtime, JVM finds classes by looking up class files on composite classpath. [...] Would current approach require describing component relationships via a manifest or IDL?

From the parameterization, not namespaces requirement, there wouldn't be any standardized classpath or the like: components are parameterized by their imports so it is ultimately each component instance's client who gets to supply the arguments at instantiation-time. When a component is instantiated by the host, the import resolution scheme is up to the host (e.g., browsers would use either the HTML module loader (via ESM-integration) or the explicit arguments of WebAssembly.instantiate to supply import arguments). But once a root component is instantiated by the host, all nested instantiation is under the control of wasm, as described by the current module-linking proposal.
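
As a rough illustration of "parameterization, not namespaces", the sketch below models a component as a plain Python function from import arguments to exports. Everything here is a hypothetical stand-in (the names `make_logger_component`, `make_app_component`, `sink`, and `output` are invented for the example); it is not the module-linking or component-model API.

```python
# Hypothetical model: a "component" is a function from imports to exports.
# There is no global registry or classpath to look anything up in.

def make_logger_component():
    def instantiate(imports):
        sink = imports["sink"]  # supplied by whoever instantiates this component
        return {"log": lambda msg: sink.append("[log] " + msg)}
    return instantiate


def make_app_component(logger_component):
    def instantiate(imports):
        # Nested instantiation: the parent decides what the child imports.
        logger = logger_component({"sink": imports["output"]})

        def run():
            logger["log"]("hello")

        return {"run": run}
    return instantiate


# The "host" resolves only the root component's imports.
host_output = []
app = make_app_component(make_logger_component())({"output": host_output})
app["run"]()

# A second instance with different arguments shares no state with the first.
other_output = []
other_app = make_app_component(make_logger_component())({"output": other_output})
```

The point of the sketch is that each instance sees only what its client passed in: the host picks the root's arguments, and the root picks its children's.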

How would this extend to future layers of linking dynamism?

The idea is basically to take the instantiate currently proposed to be allowed in instance definitions and also allow it as a runtime instruction (in adapter functions). So same basic operation, just happening at run-time, not instantiation-time.

(and is this what Canonical ABI is for) [...] canonical ABI draft on one hand states that encoding can be specific to implementation, but on the other has what looks like UTF8 decode functions

The canonical ABI would be a fixed scheme for going to and from linear memory and a given interface-typed signature (in the same way that protobufs will assign a particular expected memory layout for a given .proto file). And, just like protobufs must assign a particular en/decoding scheme, so does the canonical ABI. But once adapter functions are added, the canonical ABI could be redefined to desugar to a particular adapter function, with other (non-canonical) ABIs expressible in terms of component-defined adapter functions.
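
A toy sketch of what "a fixed scheme for going to and from linear memory" could look like for a single string value, in Python. The layout used here (UTF-8 bytes addressed by a pointer and a byte length) and the names `lower_string` and `lift_string` are assumptions made for illustration only; as noted earlier in this thread, the actual string encoding is still an open design question.

```python
# Assumed layout for illustration: a string is lowered to UTF-8 bytes in
# linear memory and passed across the boundary as a (ptr, len) pair.

def lower_string(memory, offset, text):
    """Write text into memory at offset; return the (ptr, len) pair."""
    encoded = text.encode("utf-8")
    if offset < 0 or offset + len(encoded) > len(memory):
        raise IndexError("out-of-bounds write")  # stand-in for a trap
    memory[offset:offset + len(encoded)] = encoded
    return offset, len(encoded)


def lift_string(memory, ptr, length):
    """Read length bytes at ptr from memory and decode them as a string."""
    if ptr < 0 or ptr + length > len(memory):
        raise IndexError("out-of-bounds read")  # stand-in for a trap
    return bytes(memory[ptr:ptr + length]).decode("utf-8")


memory = bytearray(64)  # stand-in for a module's linear memory
ptr, length = lower_string(memory, 8, "héllo")
lifted = lift_string(memory, ptr, length)
```

Because both sides agree on the same fixed layout, neither needs to know anything else about the other's internal string representation; a non-canonical ABI would swap out these two functions for something else.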

Jun 02 '21 20:06 lukewagner

This has now happened! For further questions about the component model, please file issues in the component model repository.

Oct 29 '22 01:10 sunfishcode
