design [js-api] Allow JS functions to be directly added to tables via `table.set`?

As of today, JS functions can be directly supplied as imports, but they cannot be directly added to a table: table.set only accepts native WebAssembly functions, and there is currently no way to convert JS functions to WebAssembly functions. An API for creating WebAssembly functions is proposed in https://github.com/WebAssembly/js-types, but that requires the caller to know the function signature ahead of time.

Rather than converting JS functions to WebAssembly functions for the purposes of adding them to tables, could we not simply allow JS functions to be added directly?

When supplied as imports, JS functions have universal polymorphic behaviour in that one can supply any JS function to any import, and indeed to all imports. No signature checking is done, and the provider of the function doesn't need to know the signature ahead of time. The number of arguments doesn't even need to match. This is a nice property to have in dynamic languages and in particular it makes lazy binding and dynamic linking easier.

For example, this property means we can use a Proxy object to resolve imports without being aware of the signature of an import: https://github.com/emscripten-core/emscripten/blob/main/src/library_dylink.js#L470

A simplified version of this code allows us to use a single function to resolve symbols dynamically at runtime:

function makeHandler(name) {
   return function() {
      return resolveSymbol(name).apply(null, arguments);
   };
}
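
For illustration, here is a minimal runnable sketch of that pattern. The `symbols` map and this particular `resolveSymbol` are hypothetical stand-ins for whatever lookup the dynamic linker actually performs; they are not taken from emscripten.

```javascript
// Hypothetical symbol registry; a real dynamic linker would fill this
// in as shared libraries are loaded.
const symbols = new Map();

// Hypothetical stand-in for the linker's lookup routine.
function resolveSymbol(name) {
  const fn = symbols.get(name);
  if (fn === undefined) {
    throw new Error(`unresolved symbol: ${name}`);
  }
  return fn;
}

// One handler shape serves every import: it looks the target up on each
// call and forwards whatever arguments it was given.
function makeHandler(name) {
  return function (...args) {
    return resolveSymbol(name)(...args);
  };
}

// Handlers can be created before the symbols they stand for exist.
const add = makeHandler("add");
const getAnswer = makeHandler("getAnswer");

symbols.set("add", (a, b) => a + b);
symbols.set("getAnswer", () => 42);

console.log(add(2, 3));   // 5
console.log(getAnswer()); // 42
```

The handlers never inspect a signature, which is why the same factory can serve every function import.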

While these universal (I guess you could call them variadic?) functions work fine as function imports, they are not permitted by table.set. This means that when we do dynamic linking in emscripten today it's easy to do lazy binding of function imports, but lazy binding of function address imports is not possible, at least not without also knowing the signature of the function. I can't take the result of the makeHandler function and pass it to table.set.
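
The restriction can be observed directly from JS. The sketch below assumes only the standard `WebAssembly.Table` API; `handler` is a placeholder for any plain JS function, such as one returned by makeHandler.

```javascript
const table = new WebAssembly.Table({ element: "anyfunc", initial: 1 });

// Any plain JS function; a lazy-binding handler behaves the same way.
const handler = (...args) => args.length;

let rejected = null;
try {
  table.set(0, handler);
} catch (err) {
  rejected = err;
}

console.log(rejected instanceof TypeError); // true
console.log(table.get(0));                  // null: the slot is unchanged
```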

To work around this limitation we are currently considering adding extra signature information in a custom section so that table addresses can be dynamically assigned before all modules in the graph are loaded.

Is there any fundamental reason why we can't just do table.set(myHandler) and have that handler universally usable by call_indirect? It might even mean that the call_indirect could be slightly more efficient, since the signature check could be skipped (JS functions can't/don't do signature checks, IIUC).

Mar 23 '21 16:03 sbc100

It's a good question. The last time I thought carefully about the implementation, it seemed like there were a few hard tradeoffs:

The easiest implementation I can think of is for each call_indirect to have a caller-side branch for "am I calling untyped JS?". If so, the caller wouldn't perform a normal call_indirect and would instead call a special engine-synthesized per-signature thunk that would box up the core wasm values into JS values and then call the JS function. This is essentially the same logic currently needed for calling a JS import (with the difference being that the callee isn't specific to the import, but needs to be loaded from the table and passed as an argument to the thunk), so probably this thunk could be reused. What's good about this approach is it should be as fast as an import-call-to-JS (which by now is well-optimized). The downside of course is that it adds extra code to every call_indirect.

The alternative is to try to do everything in a generic JS thunk that can be called by call_indirect without any special caller-side handling, and then the generic JS thunk handles the JS special case. For engines that do a caller-side signature check (e.g., V8, last I heard), I think the signature check would need an extra branch to add "|| callee is JS" to the condition. But for engines that use a callee-side check (e.g., SM), nothing extra is needed. The problem is that this generic thunk is going to be relatively inefficient and complex (compared to the type-specialized import thunk mentioned above) because it will have to dynamically interpret the caller's signature to box up the JIT args. Thus a possible extension would be to have the generic thunk dynamically forward to a signature-specific thunk (kind of like the first option, but done callee-side). How this forwarding happens is an interesting question; maybe a thunk array indexed by a dense canonical signature index?
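
As a rough illustration of the caller-side option combined with a thunk array indexed by a dense canonical signature index, here is a toy model written in plain JS. It is not engine code: the entry shapes, `jsThunks`, `callIndirect`, and the signature constants are all invented for this sketch, and real thunks would box machine values rather than pass JS numbers through.

```javascript
// Toy model only: signatures are canonicalized to small integers.
const SIG_I32_TO_I32 = 0;     // (i32) -> i32
const SIG_I32_I32_TO_I32 = 1; // (i32, i32) -> i32
const SIG_UNTYPED_JS = -1;    // sentinel that never matches a real signature

// One thunk per canonical signature. Each passes the arguments to the JS
// callee and coerces the JS return value back to the result type.
const toI32 = (v) => v | 0;
const jsThunks = [
  (callee, a) => toI32(callee(a)),
  (callee, a, b) => toI32(callee(a, b)),
];

const wasmFunc = (sig, code) => ({ sig, code });
const untypedJs = (fn) => ({ sig: SIG_UNTYPED_JS, code: fn });

let slowPathHits = 0;

function callIndirect(table, index, expectedSig, ...args) {
  const entry = table[index];
  if (entry.sig === expectedSig) {
    return entry.code(...args); // fast path: the existing signature check
  }
  // Slow path: only reached where call_indirect would trap today.
  slowPathHits++;
  if (entry.sig === SIG_UNTYPED_JS) {
    return jsThunks[expectedSig](entry.code, ...args);
  }
  throw new Error("indirect call signature mismatch");
}

const modelTable = [
  wasmFunc(SIG_I32_TO_I32, (x) => x + 1),
  untypedJs((...args) => args.reduce((sum, v) => sum + v, 0)),
];

console.log(callIndirect(modelTable, 0, SIG_I32_TO_I32, 41));       // 42
console.log(callIndirect(modelTable, 1, SIG_I32_TO_I32, 7));        // 7
console.log(callIndirect(modelTable, 1, SIG_I32_I32_TO_I32, 3, 4)); // 7
```

In this model the extra "is the callee untyped JS?" test sits behind the failed signature comparison, so calls between correctly typed functions never execute it.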

So it would be a question for JS engines as to whether these tradeoffs were worth the benefit.

Mar 25 '21 00:03 lukewagner

When you say it "adds extra code to every call_indirect" you don't mean adding code that would run a check for every call_indirect, right? The "am I calling untyped JS?" check would only happen in the case where the normal signature check fails, right? (The case that currently just traps.)

Mar 25 '21 00:03 sbc100

@sbc100 Yes, good clarification.

Mar 25 '21 19:03 lukewagner

Given that this should have basically zero performance overhead for wasm-to-wasm indirect calls, I think it seems like a reasonable change.

Making this change would eliminate the horrible hack that emscripten has to do today, and it would be simpler, more flexible, and more compact than the proposed new WebAssembly.Function.

Aug 04 '21 18:08 sbc100

I'm not sure I follow. What concrete Wasm type would be assigned to such a function? Where would you derive it from? When supplying a JS function for an import, that type is derived from the import description. But no equivalent exists for table.set, because tables are just defined to be generic funcref.

It almost sounds like you are envisioning that we extend core Wasm with a new kind of function value that has no Wasm type -- i.e., we would bring untyped functions into Wasm itself, which behave polymorphically in call_indirect. But it's not clear how such a function reference would behave in other contexts than call_indirect (for example, with casts as under the GC proposal).

This looks like a serious can of worms to me. I'm rather skeptical that the benefit over WA.Function is large enough to justify opening it. Having untyped values fundamentally violates the spirit of Wasm being typed and may have all sorts of nasty consequences downstream.

Aug 04 '21 19:08 rossberg

Yes, I was imagining it as you describe: a table slot containing a polymorphic JS function that would never result in a run-time type check failure when called indirectly.

It's useful for dynamic linking where the signature of the final wasm function that will live in the slot is not known up front. A JS shim function can trigger the loading of a shared library and propagate all arguments to the loaded wasm function (and replace itself in the table). Without this the (lazy loading) dynamic linker needs to know not just names but also function signatures.
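
A self-replacing shim of this kind might look roughly as follows. This is only a sketch: the `slots` array stands in for a table that accepts untyped JS functions (which a real WebAssembly.Table does not do today), and `loadLibrary`, `installLazySlot`, and the library contents are hypothetical.

```javascript
// Stand-in for a table that accepts untyped JS functions (the behaviour
// requested in this issue).
const slots = [];

// Hypothetical loader: pretends to load a shared library and returns
// its exports.
let loadCount = 0;
function loadLibrary(libName) {
  loadCount++;
  return { mul: (a, b) => a * b };
}

// Installs a shim in slots[index] that loads the library on first call,
// replaces itself with the real function, and forwards the arguments.
function installLazySlot(index, libName, symbol) {
  slots[index] = function shim(...args) {
    const target = loadLibrary(libName)[symbol];
    slots[index] = target;
    return target(...args);
  };
}

installLazySlot(0, "libmath", "mul");
const shimBeforeFirstCall = slots[0];

console.log(slots[0](6, 7)); // 42, triggers the load
console.log(slots[0](2, 5)); // 10, calls the real function directly
console.log(loadCount);      // 1
```

Note that the shim is installed knowing only a name and a slot index; no signature is involved at any point.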

Today this works fine for directly imported functions (we can supply a polymorphic shim for all function imports) but it does not work for functions imported by address only (table slots).

Aug 04 '21 21:08 sbc100

> Without this the (lazy loading) dynamic linker needs to know not just names but also function signatures.

That's true, but given the js-types proposal, can't the linker read it off directly from the import descriptions along with the names? And then just apply WA.Function to the shim?

The problem with what you propose is that it is not just an extension to the JS API. It is observable from within Wasm if there exists a function object that can be successfully called with different types. So this would be a notable extension to core Wasm itself that requires a spec change and affects non-JS embeddings. We'd be leaking a JS-ism into Wasm's semantics, something we tried to avoid so far.

And there are consequences. In terms of the GC proposal, call_indirect is really just the optimised composition of table.get, cast, and call_ref. But what would happen if you used these operations separately? For example,

(table $t (export "table") 10 funcref)

(type $ft (func (param i32)))
(global $fr (ref null $ft) (ref.null $ft))

(func (export "func1")
  (global.set $fr (cast (table.get $t (i32.const 0)) (rtt.canon $ft)))
)
(func (export "func2")
  (call_ref (i32.const 42) (global.get $fr))
)

Imagine somebody stuffs a raw JS function into table slot 0, then calls func1 then func2. To be coherent with call_indirect and such a function's nature, this should succeed. But how? Either every call_ref would have to make an extra case distinction for untyped functions (which defeats the purpose of typed function references), or a cast would have to allocate wrapper functions (but a cast is not supposed to change the identity of a reference). Or untyped functions would match concrete types only in call_indirect, not anywhere else, but that would be rather odd from a type system perspective -- worse, it would make the dynamic linking mechanism incompatible with forming typed references inside the module.

Aug 05 '21 05:08 rossberg

So SpiderMonkey already does not implement call_indirect using casts; instead, it uses a more flexible technique (which I called Call Tags) that in other settings has been demonstrated to be useful for accommodating use cases such as this. In particular, deferred loading is the second application of the extension described in WebAssembly/call-tags#3, and in my own research we have found that the approach described in that extension works particularly well for efficient interop between statically typed (e.g. wasm) and dynamically typed (e.g. JavaScript) languages with type-directed coercions (e.g. toJSValue and toWebAssemblyValue), particularly because it can bridge the polymorphism gap that @sbc100's example illustrates. So, if people are interested, I believe I know how engines can implement this functionality without adding overhead to the calls that are already supported, and without changing core wasm beyond what the Call Tags proposal entails (which has various other applications anyway).

Aug 05 '21 12:08 RossTate

> Without this the (lazy loading) dynamic linker needs to know not just names but also function signatures.
>
> That's true, but given the js-types proposal, can't the linker read it off directly from the import descriptions along with the names? And then just apply WA.Function to the shim?

The problem occurs when a module only imports the address of a function as an i32 and does not import the function itself. In this case there is no signature associated with the import; the module is just importing a table offset which it will then use for call_indirect.
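
To make this concrete, the sketch below hand-assembles a tiny module whose only import is an i32 global and then lists its imports. The import names "env" and "fp" are placeholders chosen for this example. The import description reports a global and nothing from which a function signature could be derived.

```javascript
// Hand-assembled module equivalent to:
//   (module (import "env" "fp" (global i32)))
const addressOnlyModule = new WebAssembly.Module(new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, // magic "\0asm"
  0x01, 0x00, 0x00, 0x00, // version 1
  0x02, 0x0b,             // import section, 11 bytes
  0x01,                   // one import
  0x03, 0x65, 0x6e, 0x76, // module name "env"
  0x02, 0x66, 0x70,       // field name "fp"
  0x03, 0x7f, 0x00,       // kind: global, type i32, immutable
]));

const addressImports = WebAssembly.Module.imports(addressOnlyModule);
console.log(addressImports.length);  // 1
console.log(addressImports[0].name); // "fp"
console.log(addressImports[0].kind); // "global"
```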

edit: I could work around this by adding an otherwise-unused import of the function itself, but that import would (currently at least) be DCE'd by binaryen's optimizer.

Aug 05 '21 18:08 sbc100

Following up, we ran experiments in a language with the same interop challenge posed here. We compared our implementation using SpiderMonkey's callee-side approach for call_indirect to simply unsoundly assuming the code pointer is of the correct type (i.e. the upper-bound on what is possible to achieve). We were unable to observe any overhead in the callee-side approach, which suggests that the callee-side approach is unbeatable in terms of performance (if implemented properly).

On the other hand, in the GC proposal we are observing that run-time casts of objects have quite noticeable overhead, which suggests that V8's caller-side approach to call_indirect likely has easily observable overhead.

Now, with the callee-side approach, @lukewagner notes that you could have a single stub that would handle all funcrefs converted from JS Callables in the manner suggested by @sbc100. However, he raises a performance concern that we were able to address in our language using a simple technique. You just have the tag used to identify the function signature also provide the code address to jump to if the funcref is one of the JS Callables. (You don't even have to call it, you just jump to it.) That address hardcodes the coercions from the WebAssembly types in the signature's param to JS values, then calls the JS Callable, and then coerces the returned JS value to the WebAssembly types in the signature's result (just like the code generated for existing wasm-JS stubs does). In our language, we implemented this approach to support interop between two similarly statically vs. dynamically typed languages and found it performed extremely well.
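
The idea of the tag carrying the stub can be modelled in a few lines of JS. This is a toy illustration of the description above, not the Call Tags proposal or any engine's implementation; `makeTag`, `typedFuncref`, `jsFuncref`, `callWithTag`, and the coercion helpers are all names invented here.

```javascript
// Each signature tag carries the stub to run when the callee turns out
// to be a JS callable. The stub hardcodes the coercions for that signature.
function makeTag(name, paramCoercions, resultCoercion) {
  return {
    name,
    jsStub(callable, args) {
      const jsArgs = args.map((value, i) => paramCoercions[i](value));
      return resultCoercion(callable(...jsArgs));
    },
  };
}

const passThrough = (v) => v;
const coerceToI32 = (v) => v | 0;
const coerceToF64 = (v) => Number(v);

const TAG_I32_I32_TO_I32 = makeTag("(i32, i32) -> i32", [passThrough, passThrough], coerceToI32);
const TAG_F64_TO_F64 = makeTag("(f64) -> f64", [passThrough], coerceToF64);

// Typed function: the callee checks the caller's tag (callee-side check).
function typedFuncref(tag, body) {
  return (callerTag, args) => {
    if (callerTag !== tag) throw new Error("signature mismatch");
    return body(...args);
  };
}

// Funcref converted from a JS callable: one shared entry point that
// defers to the stub supplied by the caller's tag.
function jsFuncref(callable) {
  return (callerTag, args) => callerTag.jsStub(callable, args);
}

const callWithTag = (funcref, tag, ...args) => funcref(tag, args);

const sumRef = jsFuncref((...xs) => xs.reduce((a, b) => a + b, 0));
const incRef = typedFuncref(TAG_F64_TO_F64, (x) => x + 1);

console.log(callWithTag(sumRef, TAG_I32_I32_TO_I32, 2, 3)); // 5
console.log(callWithTag(sumRef, TAG_F64_TO_F64, 1.5));      // 1.5
console.log(callWithTag(incRef, TAG_F64_TO_F64, 1));        // 2
```

Typed callees pay only the tag comparison they already perform, while the JS-callable entry point does no signature interpretation of its own.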

So I believe we should be able to support this functionality with little-to-no overhead; in fact, there may not even be a performance advantage to using js-types. Furthermore, V8 happens to be currently investigating changing its representation and implementation of function references in hopes of addressing possibly related significant overheads being observed in the GC proposal. So now seems like a good time to discuss this proposal.

Nov 08 '21 14:11 RossTate

It would be super helpful if this were supported in WebAssembly.Global as well, since it would allow providing variadic functions as imports to WebAssembly, making it possible to implement things like records/strings/tuples/etc. in a fairly simple way.

For example, rather than an API that has a bunch of back-and-forth calls to and from wasm to iteratively build up, say, a tuple, we could have a single host function like:

const makeTuple = new WebAssembly.Global(
   { value: "anyfunc" },
   // JS function that makes the "tuple"
   (...values) => values,
);

// `module` is the compiled WebAssembly.Module for the text shown below
const instance = new WebAssembly.Instance(module, {
    host: {
        makeTuple,
    },
});

Wasm functions then could call such a host function with whatever parameter count/types they want using call_indirect (and we don't need to jump back and forth to host with a pattern like empty()/cons() for building tuples and such):

(module
    (global $makeTuple (import "host" "makeTuple") funcref)
    (table 1 funcref)
    (elem (i32.const 0) funcref (global.get $makeTuple))
    ;; type for making empty tuples
    (type $make0Tuple (func (result externref)))
    ;; type for making 3-element tuple
    (type $make3Tuple (func (param i32) (param i32) (param i32) (result externref)))
    ;; any types work fine
    (type $makePair (func (param i32) (param f32) (result externref)))
    
    (func (export "makeEmptyTuple") (result externref)
        (call_indirect (type $make0Tuple) (i32.const 0))
    )
    (func (export "make123Tuple") (result externref)
        (call_indirect (type $make3Tuple)
            (i32.const 1)
            (i32.const 2)
            (i32.const 3)
            (i32.const 0) ;; index in table of makeTuple
        )
    )
    (func (export "makePair") (result externref)
        (call_indirect (type $makePair)
            (i32.const 42)
            (f32.const 9999)
            (i32.const 0) ;; index in table of makeTuple
        )
    )
)

This pattern would generalize to things like initializing records and such:

(module
    ;; ...initialize tables, global imports, etc similar to previous example

    ;; creates a record { x: 3, y: 5 }
    (func (export "makePointRecord") (result externref)
        (call_indirect (type $make2Record) ;; type of calling host.makeRecord with two pairs
            (call_indirect (type $make2Tuple) ;; type of calling host.makeTuple with 2 items
                (call_indirect (type $make1String) ;; type of calling host.makeString with 1 char
                    (i32.const 120) ;; "x"
                    (i32.const MAKE_STRING_INDEX)
                )
                (i32.const 3)
                (i32.const MAKE_TUPLE_INDEX)
            )
            (call_indirect (type $make2Tuple)
                (call_indirect (type $make1String)
                    (i32.const 121) ;; "y"
                    (i32.const MAKE_STRING_INDEX)
                )
                (i32.const 5)
                (i32.const MAKE_TUPLE_INDEX)
            )
            (i32.const MAKE_RECORD_INDEX)
        )
    )
)

where we provide:

const instance = new WebAssembly.Instance(module, {
    host: {
        makeRecord: (...pairs) => {
            const record = { __proto__: null };
            for (const [key, value] of pairs) {
                record[key] = value;
            }
            return record;
        },
        makeTuple: (...values) => values,
        makeString: (...codePoints) => String.fromCodePoint(...codePoints),
    },
});
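
To show what makePointRecord is meant to produce, the same three host functions can be exercised directly from JS, mirroring the nesting of the indirect calls. This is only a trace of the intended result under the assumption that untyped calls behave like ordinary JS calls; `hostFns` and `point` are names introduced for this sketch.

```javascript
const hostFns = {
  makeRecord: (...pairs) => {
    const record = { __proto__: null };
    for (const [key, value] of pairs) {
      record[key] = value;
    }
    return record;
  },
  makeTuple: (...values) => values,
  makeString: (...codePoints) => String.fromCodePoint(...codePoints),
};

// Same shape as the nested call_indirect expression: two key/value
// tuples, each built from a one-character string and an integer.
const point = hostFns.makeRecord(
  hostFns.makeTuple(hostFns.makeString(120), 3), // ["x", 3]
  hostFns.makeTuple(hostFns.makeString(121), 5), // ["y", 5]
);

console.log(point.x, point.y);           // 3 5
console.log(hostFns.makeTuple().length); // 0: the same function serves any arity
```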

Obviously the GC proposal (and the new stringref proposal) will cover some of these uses; however, this would still allow hosts to provide more variety in their collections than wasm might provide (e.g. maps, sets, etc.) with calls that avoid many back-and-forth hops (although if GC provides tuples or similar, then I suppose one can just use those for variadic calls to host APIs).

May 17 '22 04:05 Jamesernator

> As of today, JS functions can be directly supplied as imports, but they cannot be directly added to a table: table.set only accepts native WebAssembly functions, and there is currently no way to convert JS functions to WebAssembly functions. An API for creating WebAssembly functions is proposed in https://github.com/WebAssembly/js-types, but that requires the caller to know the function signature ahead of time.
>
> Rather than converting JS functions to WebAssembly functions for the purposes of adding them to tables, could we not simply allow JS functions to be added directly?

Rather than the somewhat "scary" idea of directly calling JS Functions, what if the JS API WebAssembly.Table#set were to branch upon receiving JS functions, and automatically wrap them in the appropriate WebAssembly.Function?

The cons I see to this approach would be the extra branching overhead per set, but I don't believe that set ends up in anyone's hot code, so this would not be a common cost? Besides this, object identity could be ruined?

const fn1 = console.log;

my_table.set(0, fn1);
const fn2 = my_table.get(0);

// fails assertion
assert(fn1 === fn2);

Regardless, this has the benefit of the JS consumer not needing to know the Wasm types ahead of time, while engines still get to add their JS<->Wasm conversion wrappers. The signature type would be obtained from the Wasm Table; is that feasible?

Does this suit your use case?

Jul 14 '22 21:07 vimirage

Sadly this doesn't work since table slots today are anyfunc, so the slot itself doesn't have a type. Its type is not checked/known until it's used in a call_indirect instruction.

Also, I'm not sure I agree it's "scary" to directly call JS functions... it's something we already do all the time when JS supplies imports to a wasm module.

Jul 14 '22 21:07 sbc100

> Sadly this doesn't work since table slots today are anyfunc, so the slot itself doesn't have a type. Its type is not checked/known until it's used in a call_indirect instruction.

Update: ah, I just saw that in https://github.com/WebAssembly/js-types/issues/16 after this issue... Would this be possible for typed-function tables?

Jul 14 '22 21:07 vimirage

Even if we had a different table for every possible signature, the problem would then become: given a JS function, which table should I try to put it in?

The point of this issue is that I want the JS function to act as a polymorphic anyfunc that will never trap when called using call_indirect, just like I can supply a single JS function for all imports and it will act in a polymorphic way regardless of the signature of each import.

Jul 14 '22 22:07 sbc100

Understood. I didn't quite get at first that that was the intention.

Jul 14 '22 22:07 vimirage
