Typed Tables
With tables being anyfunc, call_indirect must first perform a type check against the targeted function to ensure that the function's parameters and results match the signature expected at the call site. If the index passed to call_indirect is not a constant, this check has to be repeated at run time on every call.
If my understanding is correct, having a separate table for each func_type would let this check be skipped and could potentially open the door to other compile-time optimizations.
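To make the cost concrete, here is a rough C sketch of what an engine effectively has to do for a call through an anyfunc table versus a call through a table typed to a single signature. The struct layout, field names, and the `trap` helper are invented for illustration; real engines lay this out differently.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical engine-side layout; all names here are illustrative, not
 * taken from any real implementation. */
typedef int32_t (*fn_i32_i32)(int32_t);

typedef struct {
    uint32_t sig;        /* canonical signature id of the stored function */
    void (*code)(void);  /* native entry point, signature unknown statically */
} anyfunc_entry;

static void trap(void) { abort(); }

/* call_indirect through an anyfunc table: when the index is not a
 * constant, the signature has to be checked on every call. */
int32_t call_anyfunc(const anyfunc_entry *table, uint32_t size,
                     uint32_t idx, uint32_t expected_sig, int32_t arg) {
    if (idx >= size) trap();                    /* bounds check */
    if (table[idx].sig != expected_sig) trap(); /* dynamic signature check */
    return ((fn_i32_i32)table[idx].code)(arg);  /* indirect call */
}

/* The same call through a table typed as (i32) -> i32: validation has
 * already guaranteed that every entry matches, so only the bounds check
 * and the indirect call remain. */
int32_t call_typed(const fn_i32_i32 *table, uint32_t size,
                   uint32_t idx, int32_t arg) {
    if (idx >= size) trap();
    return table[idx](arg);
}
```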
Yup, that's mentioned here.
> Additionally, in the MVP, the only allowed element type of tables is a generic "anyfunc" type which simply means the element can be called but there is no static signature validation check. This could be improved by allowing:
> - functions with a particular signature, allowing wasm generators to use multiple homogeneously-typed function tables (instead of a single heterogeneous function table) which eliminates the implied dynamic signature check of a call to a heterogeneous table;
> - any other specific GC reference type, effectively allowing WebAssembly code to implement a variety of rooting API schemes.
Being able to make an efficient indirect call seems like a minimal requirement for publishing wasm to me. It is something that wasm must be able to do and that cannot be implemented any other way. It's a bit of work to manage the signatures and multiple tables, though.
I recall some claims that anyfunc tables better support efficient v-tables, but is that really the case? From my limited understanding, all the methods in a v-table could have the same signature anyway, so they could be placed more efficiently in a table with that signature and avoid the runtime check. If there is a requirement here, then it would help to note it in the design rationale so that the basis for the decision is communicated.
GC references in wasm may well be bloat that is not really needed: something for people to explore if they wish, but perhaps not needed as a core part of wasm, and thus a distraction that should not be part of the core project. Perhaps a principle of wasm is, or should be, that wasm is not intended to support higher-level features that can be implemented efficiently on top of the core anyway.
I gave this some more thought.
Although not required by the spec, an advanced WASM-to-native compiler could consolidate duplicate Type entries (if any), and store the run-time type index alongside the function definition. Then the only check call_indirect would require is ensuring that the type index of the called function matches the type index requested by the caller, a basic int32=int32 comparison. This check can be expected to pass 100% of the time in a properly constructed program, and there may be some way to pass this hint to the CPU's branch predictor.
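As a sketch of the "consolidate duplicate Type entries" step: the canonicalization can happen entirely at compile time, so the per-call check reduces to a single integer compare. The types and names below are invented purely for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical compact representation of a function type. */
typedef struct {
    uint8_t param_count, result_count;
    uint8_t params[4], results[1];
} func_type;

static bool same_type(const func_type *a, const func_type *b) {
    if (a->param_count != b->param_count || a->result_count != b->result_count)
        return false;
    for (size_t i = 0; i < a->param_count; i++)
        if (a->params[i] != b->params[i]) return false;
    for (size_t i = 0; i < a->result_count; i++)
        if (a->results[i] != b->results[i]) return false;
    return true;
}

/* Assign each Type entry the index of the first structurally equal entry,
 * so two canonical ids are equal iff the signatures match and the run-time
 * check is one uint32 comparison. */
void canonicalize(const func_type *types, size_t n, uint32_t *canon) {
    for (size_t i = 0; i < n; i++) {
        canon[i] = (uint32_t)i;
        for (size_t j = 0; j < i; j++) {
            if (same_type(&types[i], &types[j])) { canon[i] = canon[j]; break; }
        }
    }
}
```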
Typed Tables don't really add any advantage, I think. The same optimization could be done there, and a simpler native compiler would still have to do a (comparatively slow) range check to ensure the called function index falls within the table.
Am I missing something? I feel like I must be, given that this is already mentioned in Future Features...
@Kardax I think having the callee do the check has been suggested before, but it requires loading the type index and passing it to the callee, presumably in a register, which increases register pressure; then there is the comparison itself plus some trap-handling bloat. It's not a show stopper, but it would be nice to avoid. Using a table specialized on the signature would allow all of this overhead to be avoided.
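For comparison, a callee-side check of the kind mentioned above might look roughly like the sketch below. The signature id, the `trap` helper, and the function are all hypothetical; the point is only that the expected type index occupies an argument register and every indirectly callable function gains a compare plus a trap path.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical canonical signature id; illustrative only. */
enum { SIG_I32_TO_I32 = 7 };

static void trap(void) { abort(); }

/* The expected type index travels in an argument register for the whole
 * call sequence, and the prologue of every indirectly callable function
 * carries the comparison and the trap path. */
int32_t square_checked(uint32_t expected_sig, int32_t x) {
    if (expected_sig != SIG_I32_TO_I32) trap();
    return x * x;
}
```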
Either way the index needs to be bounds checked, or the compiler needs to be able to derive that it is within bounds.
> Although not required by the spec, an advanced WASM-to-native compiler could consolidate duplicate Type entries (if any), and store the run-time type index alongside the function definition. Then the only check call_indirect would require is ensuring that the type index of the called function matches the type index requested by the caller, a basic int32=int32 comparison.
That's what implementations do already. It still has non-zero cost (extra reads, comparison, branch).
@rossberg-chromium How much better could it be? What call_indirect design would allow for fewer operations than read+compare+branch?
@Kardax That saving, plus the reduction in register pressure, may well be measurable. Keep in mind that some code will make heavy use of indirect calls: for example, an object-oriented language dispatching via a table, functional languages with first-class functions, or calls through a closure-cell function object. Also note that in these cases the function pointers might be wrapped in objects anyway, so there is no issue with function-index identity; the indexes will not be exposed to be compared, rather the objects in which they are stored give the function its identity in the language.
@Kardax, a typed call_indirect is essentially just an indirect jump-to-subroutine.
@rossberg-chromium Can you expand on the "indirect" part of your explanation? I'm interested in knowing how it's a performance improvement from the read+compare+branch we have now.
If there's speed here, I'm with @wllang in that this should be part of the MVP spec, but if there isn't, I have no complaints with freezing the binary spec today. Not that my opinion has any weight with this team 🤔
@Kardax You would need to take a look at the generated machine code to follow the discussion. An indirect machine-code call accepts a destination address in a register (as opposed to an immediate argument). It's a performance improvement because there are fewer machine-code instructions in the call sequence.
@wllang I get that, but how do you convert an arbitrary i32 value supplied to call_indirect into a destination address without a lookup table?
@Kardax, it's 1 read + 1 jump as opposed to ~3 reads + 1 compare + 1 branch + 1 jump.
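My reading of that, sketched under the assumption that the bounds check is handled the same way in both designs and is not counted: the typed table is still a lookup table, but each slot holds only a raw entry point with no signature word beside it, so the i32 index becomes a destination address with one load followed by one indirect call. The table name and size below are made up.

```c
#include <stdint.h>
#include <stdlib.h>

typedef int32_t (*fn_i32_i32)(int32_t);

/* Hypothetical table whose element type is fixed to (i32) -> i32 and
 * whose entries are assumed to have been filled in during instantiation. */
#define TABLE_SIZE 256
static fn_i32_i32 typed_table[TABLE_SIZE];

int32_t dispatch(uint32_t idx, int32_t arg) {
    if (idx >= TABLE_SIZE) abort();   /* bounds check, identical in either design */
    return typed_table[idx](arg);     /* 1 read + 1 indirect jump */
}
```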