lucet icon indicating copy to clipboard operation
lucet copied to clipboard

Wasm reference types implementation planning

Open acfoltzer opened this issue 4 years ago • 0 comments

This is an issue to help plan and keep track of tasks necessary to support the Wasm reference types proposal.

External blockers

  • [x] wabt supports reftypes.
    • Blocks wasmparser's test suite, lucet-spectest, and the text format input for lucetc.
    • Parts of reftypes are there, but are outdated or not fully implemented.
  • [ ] wasmparser supports reftypes.
    • Soft blocked on wabt, which is used to interpret the spec tests; we can implement but not easily test the additions.
    • Blocks cranelift-wasm.
  • [ ] cranelift-wasm supports reftypes.
    • Blocked on wasmparser.
    • Blocks lucetc and lucet-runtime implementation.

New and changed instructions

Changes to the instruction set require updates to both wasmparser and cranelift-wasm before we can implement their semantics in lucetc and the runtime. wabt support is also needed for handling the Wasm text format, including spectests.

(✔️: implemented; ➖: partially implemented; ❌: not implemented)

Instruction wabt wasmparser cranelift-wasm Lucet
ref.null ✔️ ✔️ ✔️ ✔️
ref.is_null ✔️ ✔️ ✔️ ✔️
ref.func
table.get ✔️ ✔️
table.set ✔️ ✔️
table.size ✔️ ✔️
table.grow ✔️ ✔️
table.fill
table.init ✔️
table.copy ✔️
call_indirect ✔️ ✔️ ✔️

Multiple, mutable, growable tables

Modules may now define, import, and export multiple tables. The changes to existing instructions are mostly to add table indices as immediates, where previously the table index was implicitly zero.

  • [x] wasmparser parses modules with multiple tables.
  • [ ] wasmparser validates modules with multiple tables.
  • [x] cranelift-wasm delegates to the implementation of ModuleEnvironment for table and element declarations with the appropriate table index.
  • [ ] lucetc compiles multiple tables.
  • [ ] lucet-runtime supports multiple tables.

However, these new instructions also allow the mutation of table elements, as well as the growing of tables themselves. This is the most disruptive change from the Lucet perspective, as previously the table was just read-only data in the object file produced by lucetc.

Making tables mutable requires that we switch from per-module tables to per-instance tables, at least in some cases. This is trickier than having per-instance memories or globals, though, because we can't know at memory region creation time how many tables a particular module will need. Even if we impose limits on the total size of the tables, as we do for memory and globals, it's not clear how best to lay out the tables in that limited memory.

We need to decide on an approach before we can proceed to planning the necessary implementation components.

Static approach

We could choose to fix the maximum size of any table instantiated within a region, and the maximum number of tables per instance. This would allow us to treat tables more-or-less like we currently treat globals. The Slot and Limits types in the runtime would then look like:

pub struct Slot {
    pub start: *mut c_void,
    pub heap: *mut c_void,
    pub stack: *mut c_void,
    pub globals: *mut c_void,
    pub sigstack: *mut c_void,
    pub tables: *mut c_void, // <--- NEW
    ...
}
pub struct Limits {
    pub heap_memory_size: usize,
    pub heap_address_space_size: usize,
    pub stack_size: usize,
    pub globals_size: usize,
    pub num_tables: usize, // <--- NEW
    pub table_size: usize, // <--- NEW
}

We would then add a pointer to slot.tables to the InstanceRuntimeData, and instructions that access tables would then translate to a deref of that pointer, followed by an offset calculation:

table.get table_idx elem_idx 
≡ 
{ 
  tables_base <- load (vmctx + tables_ptr_offset); 
  load (tables_base + (table_idx * table_size) + (elt_idx * table_elem_size))
}

Advantages

  • Performance for accesses should be the same as the current uses of table zero.
  • The amount of address space used per instance remains completely predictable.
  • Memory mapping occurs for tables along with the rest of the region.
  • Compilers are unlikely to need more than two tables until the Wasm function references proposal is implemented, so imposing the region-wide num_tables limit will probably not be too painful.

Disadvantages

  • It's unclear what reasonable values for num_tables and table_size should be.
  • Quite a bit of address space might be wasted if we want a single region to simultaneously support instances with one large table, and instances with many small tables.
  • Initial values must be copied in for each instance.

Dynamic approach

In this approach, we back the tables by dynamically-allocated data structures accessible via new vmctx methods:

pub struct Instance {
    tables: Vec<Vec<Option<TableElem>>>, // <--- NEW
    ...
}

Vec might not be the best choice for the inner table representation since table elements can be sparse, but we could use a patricia trie or something if becomes an issue in practice.

The limits on table numbers and size could then be per-instance values, or even a single limit on the number of TableElems per instance.

To support this, we would have to compile table accesses into function calls to some new lucet_vmctx_-type functions for all of the table instructions and call_indirect.

Advantages

  • The table limits can be much more flexible; limiting only the total number of table elements means that both a single large table or many small tables can fit.
  • Address space impact is much closer to optimal for the number of table elements needed.

Disadvantages

  • Table operations lose performance as they now all require a function call, plus whatever additional cost is imposed by the dynamic data structure operations (e.g., vector resizing, trie lookup).
  • Allocations occurring after region creation could add stress to virtual memory systems.
  • More invasive changes to Cranelift are needed.
  • Initial values must be copied in for each instance.

Kitchen sink approach

We can take advantage of a couple key points to mitigate the downsides of these approaches by combining them and leaving some of our existing approach in place.

First, we can statically know from the Wasm at compile time which tables must be mutable, and which are only ever read. For immutable tables, we can continue using our current approach of using an immutable table in the object file, shared between all instances of the same module.

Second, we expect that the compilers which target Wasm are likely to only use the first two tables (though again, this may change with the function-references proposal). This means we could use the static approach for those tables, while using the dynamic approach for any additional tables, which we expect will be used less frequently.

Advantages

  • Immutable tables, and commonly-used tables remain fast when accessed at runtime.
  • Immutable tables avoid copying table elements in at instantiation time.
  • Wasted address space is smaller than the static-only approach in most cases.
  • The table limit flexibility of the dynamic approach is retained.

Disadvantages

  • Implementation complexity is high: we would need three different ways of handling tables.
  • Wasted address space is still likely to happen in most cases.
  • Initial values must be copied in at instantiation for each mutable table.
  • The expectation that the first two tables are most important may become invalid if compilers take advantage of typed function references.

Immutable plus dynamic approach

This is the same as the kitchen sink approach, but where the dynamic approach is used for all mutable tables, not just any tables past the first two.

Advantages

  • Immutable tables remain fast when accessed at runtime.
  • Immutable tables avoid copying table elements in at instantiation time.
  • Wasted address space is just as small as the fully dynamic approach.
  • The table limit flexibility of the dynamic approach is retained.

Disadvantages

  • Any mutability in a table causes it to take the slow path, even for the most commonly-used tables.
  • Implementation complexity is moderate: we would need two different ways of handling tables.
  • Initial values must be copied in at instantiation for each mutable table.

acfoltzer avatar Aug 28 '19 00:08 acfoltzer