wasmtime icon indicating copy to clipboard operation
wasmtime copied to clipboard

Function references

Open CosineP opened this issue 2 years ago • 3 comments

This patch, written by @dhil and I, implements the (complete) function references proposal, forming the wasmtime side of the existing pull request bytecodealliance/wasm-tools#701. It is missing some obvious things:

  • call_ref is simply translated as a cranelift indirect_call. A faster cranelift indirect call that does not check the callee signature can (and should) eventually be added and used. Further, no null check is elided.
  • The wasm-tools PR has advanced past the version we depend on here. We hope to update all at once, once the wasm-tools side has fully settled. This will result in the removal of a number of ValType::Bot branches, the addition of the type annotation on call_ref, and a handful of other convenient things. The version we depend on is called func-ref-2, if you need to reference it*.
  • Wasmtime, too, has of course advanced past our parent. I tried to rebase on the latest changes, but due to particular Cargo.toml changes, I think this will be much easier when we've retargetted to the latest wasm-tools#701 and can keep our dependencies consistent in the history

Otherwise, things should be mostly obvious. Typed function references are represented by the existing untyped function references. Even though it's held up by wasm-tools, we ended up with a fairly polished changeset that we figured we could send out and get feedback on. There's no rush on our end, as we've been happily using this in our typed continuations work for some time now. But we're happy to discuss any changes, problems, or suggestions!

* We added a public peek method to the validator to attain a signature for call_ref. This is now unnecesary because call_ref comes with a type annotation, and will disappear fairly conveniently.

CosineP avatar Nov 17 '22 04:11 CosineP

Subscribe to Label Action

cc @peterhuene

This issue or pull request has been labeled: "cranelift", "cranelift:wasm", "wasmtime:api", "wasmtime:config"

Thus the following users have been cc'd because of the following labels:

  • peterhuene: wasmtime:api

To subscribe or unsubscribe from this label, edit the .github/subscribe-to-label.json configuration file.

Learn more.

github-actions[bot] avatar Nov 17 '22 04:11 github-actions[bot]

Label Messager: wasmtime:config

It looks like you are changing Wasmtime's configuration options. Make sure to complete this check list:

  • [ ] If you added a new Config method, you wrote extensive documentation for it.

    Our documentation should be of the following form:

    Short, simple summary sentence.
    
    More details. These details can be multiple paragraphs. There should be
    information about not just the method, but its parameters and results as
    well.
    
    Is this method fallible? If so, when can it return an error?
    
    Can this method panic? If so, when does it panic?
    
    # Example
    
    Optional example here.
    
  • [ ] If you added a new Config method, or modified an existing one, you ensured that this configuration is exercised by the fuzz targets.

    For example, if you expose a new strategy for allocating the next instance slot inside the pooling allocator, you should ensure that at least one of our fuzz targets exercises that new strategy.

    Often, all that is required of you is to ensure that there is a knob for this configuration option in wasmtime_fuzzing::Config (or one of its nested structs).

    Rarely, this may require authoring a new fuzz target to specifically test this configuration. See our docs on fuzzing for more details.

  • [ ] If you are enabling a configuration option by default, make sure that it has been fuzzed for at least two weeks before turning it on by default.


To modify this label's message, edit the .github/label-messager/wasmtime-config.md file.

To add new label messages or remove existing label messages, edit the .github/label-messager.json configuration file.

Learn more.

github-actions[bot] avatar Nov 17 '22 04:11 github-actions[bot]

call_ref is simply translated as a cranelift indirect_call. A faster cranelift indirect call that does not check the callee signature can (and should) eventually be added and used. Further, no null check is elided.

call_indirect doesn't check the function signature, nor does it check for null pointers.

bjorn3 avatar Nov 17 '22 09:11 bjorn3

I'm interested in resuming work on this patch and dedicating the necessary time to bring it into an acceptable state. I have a patch which merges this PR with the recent changes to main (c.f. effect-handlers/wasmtime#13). I will merge that patch into this PR later today.

I think the current status is as follows:

  • Compatible with latest wasm-tools
  • All new function reference instructions have been implemented (save for return_call_ref) (tested on x86_64).
  • 10 test case failures:
failures:
    wast::Cranelift::spec::function_references::br_table
    wast::Cranelift::spec::function_references::br_table_pooling
    wast::Cranelift::spec::function_references::linking
    wast::Cranelift::spec::function_references::linking_pooling
    wast::Cranelift::spec::function_references::return_call_ref
    wast::Cranelift::spec::function_references::return_call_ref_pooling
    wast::Cranelift::spec::function_references::table
    wast::Cranelift::spec::function_references::table_pooling
    wast::Cranelift::spec::function_references::type_equivalence
    wast::Cranelift::spec::function_references::type_equivalence_pooling

test result: FAILED. 872 passed; 10 failed; 10 ignored; 0 measured; 0 filtered out; finished in 77.91s

Here return_call_ref is expected to fail as we have not implemented the tail calls proposal. I haven't worked out why linking and type_equivalence fail yet. The test br_table fails because of how the wast crate of wasm-tools elaborates single element tables

(module
  (type $t (func))
  (func $tf)
  (table $t (ref null $t) (elem $tf)))

The table test fails due to the incompleteness of the type reconstruction procedure wasmtime::values::Val::ty(). The procedure has to infer the nullability of an ExternRef or FuncRef by looking at their respective shape, which is impossible.

I will probably need some help with this latter issue as it seems to require a slight redesign/refactor to keep source value type information around, at least in the wast crate of wasmtime. I will probably also need some help with diagnosing the remaining failure cases. I am also positive that there are several code quality improvements that need to be implemented in order to make this PR satisfactory.

dhil avatar Feb 23 '23 14:02 dhil

This review got auto-assigned to me but I think @fitzgen and @alexcrichton have the right context for the review here.

abrown avatar Apr 13 '23 18:04 abrown

@alexcrichton @fitzgen I have reverted the changes to the wasmtime public API. The end result is not too bad; there are a couple of more test failures now. I haven't looked into the failures in detail yet.

If you want I can reopen this PR from my personal fork such that you can get write permissions. I think GitHub do not grant write permissions on PRs originating from an organisation account.

dhil avatar Apr 19 '23 13:04 dhil

Some things I'm seeing:

  • I sent https://github.com/effect-handlers/wasmtime/pull/19 to surface the failures here on this PR
  • One test failure, wast::Cranelift::spec::function_references::linking, relates to this function which needs to be souped up for typed function references. Note though that this function will need to grow &ModuleTypes arguments to perform the subtyping check between two different modules (this is basically the same code that's in wasmparser, just at link-time instead of validation-time).
  • wast::Cranelift::spec::function_references::table looks like it may be a bug in table initialization?
  • wast::Cranelift::spec::function_references::br_table is failing due to https://github.com/bytecodealliance/wasm-tools/pull/952 I think
  • wast::Cranelift::spec::function_references::type_equivalence looks like it may be some sort of code generator bug? Unsure.

There are additionally a number of tests as you've found which all panic due to the one you inserted to avoid changes to the public API. Those tests will need to be skipped as-is from the upstream testsuite, but they can be copied to to tests/misc_testsuite with updates to run in Wasmtime.

alexcrichton avatar Apr 19 '23 17:04 alexcrichton

Excellent, thank you! I am down to two failures now type_equivalence and table.

There are additionally a number of tests as you've found which all panic due to the one you inserted to avoid changes to the public API. Those tests will need to be skipped as-is from the upstream testsuite, but they can be copied to to tests/misc_testsuite with updates to run in Wasmtime.

I will move some of the content of the ignored tests to the misc testsuite too.

dhil avatar Apr 20 '23 15:04 dhil

I have been trying to track down the remaining two failures. In the table.wast file, the problem is that tables aren't properly initialised. Looking at the code it indeed seems like table initialisers get ignored altogether (https://github.com/bytecodealliance/wasmtime/blob/main/crates/environ/src/module_environ.rs#L303-L307), which would explain why table entries always seem to be initialised to null (c.f. https://github.com/bytecodealliance/wasmtime/blob/main/crates/runtime/src/instance.rs#L872-L874). From reading the code, I understand that great care is being taken to ensure lazy initialisation of tables, and thus, I wonder what semantics you want for tables with initialising expressions? @alexcrichton @fitzgen

dhil avatar May 03 '23 10:05 dhil

Looking at the code it indeed seems like table initialisers get ignored altogether

Indeed! Table intializers were added with the function-references proposal so that's why they're "ignored". When we updated wasmparser we probably should have made it an error if a non-null initializer was indicated.

I think what you'll want to do is to update TableInitialization to store the default initializer and then update try_func_table_init to not use FuncTable if there's a non-null initializer. We can probably lazy-initialize with initializers but that's fine to to in a future PR. It's probably best to get the base set of functionality working first.

alexcrichton avatar May 03 '23 18:05 alexcrichton

Commit 5de9a246fc0c8a0b9ae7a4a60b2ce74cceba86ce implements a fix for the table initialisation problem, though, it is ended up a bit more roundabout than I initially anticipated. If I understand correctly, TableInitialization is global per module, so to avoid penalising pre-function-references code I introduced a new table initialisation strategy EagerFuncTable, which eagerly initialises all table fields. I am not sure whether this aligns with what you had in mind.

As for the final failing test type-equivalence, it fails, seemingly, because the type test for call_indirect is nominal rather than structural (c.f. https://github.com/bytecodealliance/wasmtime/blob/6dc3eb717d7cf9b3539557ea7b393dd51225336d/crates/cranelift/src/func_environ.rs#L1662C1-L1665). Here is a minimal example to reproduce:

(module
  (type $s0 (func (param i32)))
  (type $s1 (func (param i32 (ref $s0))))
  (type $t1 (func (param (ref $s1))))

  (func $s1 (type $s1))
  (table funcref (elem $s1))

  (func (export "run")
    (call_indirect (type $t1) (ref.func $s1) (i32.const 0))
  )
)
(assert_return (invoke "run"))

I am not quite sure what the best course of action is here. I would have expected type/type signature canonicalisation to solve this problem.

dhil avatar May 16 '23 09:05 dhil

which eagerly initialises all table fields. I am not sure whether this aligns with what you had in mind.

Seems reasonable to me! One adjustment I might recommend though is to reuse the existing TableInitialization::Segments initializer. That's the "eager" mode which is optimized, if possible, to TableInitialization::FuncTable which is lazy initialization. During that optimization you can add a check that if anything is not null-initialized then the optimization can be skipped. I think this'll all have the same effect as what you've implemented but avoids EagerTableInitializer and Segments being sort of duplicates of each other.

I'll look more into the call_indirect issue now.

alexcrichton avatar May 16 '23 15:05 alexcrichton

Ok so I can definitely imagine that canonicalization has a bug in it and something isn't working as expected. My guess is that something is hashing a module-local u32 when it actually needs to hash something module-independent, or something like that.

That being said I'm not sure that your reduction showcases the issue? You're invoking, indirectly, a function that takes two parameters with only one parameter. In that sense I think a runtime error is expected there?

alexcrichton avatar May 16 '23 15:05 alexcrichton

Ok so I can definitely imagine that canonicalization has a bug in it and something isn't working as expected. My guess is that something is hashing a module-local u32 when it actually needs to hash something module-independent, or something like that.

That being said I'm not sure that your reduction showcases the issue? You're invoking, indirectly, a function that takes two parameters with only one parameter. In that sense I think a runtime error is expected there?

Ah yes, sorry. I deleted a little bit too much. Here is a (valid) repro:

(module
  (type $s0 (func (param i32)))
  (type $s1 (func (param i32 (ref $s0))))
  (type $s2 (func (param i32 (ref $s0))))
  (type $t1 (func (param (ref $s1))))
  (type $t2 (func (param (ref $s2))))

  (func $s1 (type $s1))
  (func $f2 (type $t2))
  (table funcref (elem $f2 $s1))

  (func (export "run")
    (call_indirect (type $t1) (ref.func $s1) (i32.const 0))
  )
)
(assert_return (invoke "run"))

It is extracted from type-equivalence.wast, the ;; Indirect types section.

dhil avatar May 17 '23 13:05 dhil

which eagerly initialises all table fields. I am not sure whether this aligns with what you had in mind.

Seems reasonable to me! One adjustment I might recommend though is to reuse the existing TableInitialization::Segments initializer. That's the "eager" mode which is optimized, if possible, to TableInitialization::FuncTable which is lazy initialization. During that optimization you can add a check that if anything is not null-initialized then the optimization can be skipped. I think this'll all have the same effect as what you've implemented but avoids EagerTableInitializer and Segments being sort of duplicates of each other.

I'll look more into the call_indirect issue now.

I have attempted to clean this up in the latest commit. Is this closer to what you had in mind?

dhil avatar May 17 '23 13:05 dhil

Ok so I think the issue is in two locations:

  • Here in the engine-level signature registry, which is a hash map of types across modules registered in an engine.
  • Here where there's a module-level map of types-to-indexes which should be deduplicating the above types but isn't.

This bug is due to derive(Hash) on WasmFuncType which delegates all the way through to WasmHeapType which hashes the raw index. I think the best solution would be to remove the auto-derived trait impls from WasmType about equality and hashing, and then work backwards from there. The raw WasmHeapType::Index should probably take a SignatureIndex to assist in deduplicating within a module. That should solve the second bullet above.

The first bullet above, however, is much more difficult. That'll need to perform a "deep hash" of the entire structural type since it must be module-independent (and SignatureIndex is module-local). If the second bullet is enough to get tests passing for this PR though then that seems ok to defer to an issue and get this landed instead.

(to clarify, at this point I think it's fine to land this PR so long as it passes CI, it's been long enough and I'm sure y'all are tired of rebasing. It's ok to defer work so long as there's issues on file and the work is gated, which it all is)

alexcrichton avatar May 17 '23 14:05 alexcrichton

Thanks! I see the culprit now. Just to clarify, when you say

The raw WasmHeapType::Index should probably take a SignatureIndex to assist in deduplicating within a module.

Do you mean that WasmHeapType should be parameterised by a SignatureIndex rather than u32? I can see how that would make the interning easier, though, how does this affect conversion to/from wasmparser types?

dhil avatar May 19 '23 10:05 dhil

No need to worry about conversion into a wasmparser type, that in theory shouldn't happen (or at least not that I know of). For conversion from a wasmparser type it may mean that a From impl needs to be removed in favor of a method on ModuleTypes or something like that where the indexing/hashing context is available.

So thinking more on this, sort of what this wants is WasmType<T> where a Module store WasmType<SignatureIndex> but an Engine stores a WasmType<VMSharedSignatureIndex>. I think then the derive(Hash) should continue to work with PartialEq as well?

alexcrichton avatar May 19 '23 15:05 alexcrichton

I am OK with trying this approach, but it is a sizeable refactoring I think as WasmType<T> implies a type bound on Table, Global, and EntityType (I suppose if they are only module-local then they can instantiate T = SignatureIndex internally).

The From trait will need to be removed, or at least supported by an alternative to support conversion from typed references.

Should I give this refactoring a go?

dhil avatar May 22 '23 11:05 dhil

This needs to happen for typed continuations support as well. Currently we assume everything is a function, but if we do this refactor I think we can look up in the types table whether a ref is a function or a continuation, which we need to know in a number of places in Wasmtime (at least in the embedding API; we managed to fudge the table layout stuff). But as a result I can say firsthand it is absolutely a hefty undertaking which is why we resorted to our current hack. I may still have a local branch with part of this refactoring underway.

CosineP avatar May 22 '23 15:05 CosineP

If you're ok with it, I think it's ok to set the test to ignore and land this. It's true that this refactoring will require a bit of finesse, but it's one I'm happy to take on myself rather than having y'all do yet-more work on top of what you've already got here.

Otherwise I'd be happy to merge this with a green CI run (which will require ignoring tests) and a list of items to tackle tracked in an issue (such as fixing the failing tests, this possible refactoring, the embedder API, etc)

alexcrichton avatar May 22 '23 18:05 alexcrichton

I have added the test type-equivalence.wast to the ignore set. How should we go about documenting the issues? Should I open each issue as a separate issue here on GitHub?

dhil avatar May 23 '23 12:05 dhil

I think one tracking issue should be good to start off with and it can branch from there as necessary. I'll work on giving this a final pass today. Thanks again for y'all's work here!

alexcrichton avatar May 23 '23 13:05 alexcrichton

@dhil to answer this question (over here for a bit more visibility) my thinking is that at the wasm AST layer there's only "this index" as a type but at the Wasmtime layer we'll be able to be more principled than that by having, for example:

enum WasmHeapType {
    Func,
    Extern,
    TypedFunc(SignatureIndex),
    Array,
    TypedArray(StorageType),
    Struct,
    TypedStruct(StructIndex),
    // ..
}

or something along the lines of that where WasmHeapType stores effectively the discriminant of what it refers to rather than requiring the indexing operation to then verify the type of the result.

One of the things I was worried about was that there was the pervasive assumption that WasmHeapType::Index(_) was a function which won't be true once the GC proposal was implemented. I wanted to head off issues there by having the Wasmtime-level name be called TypedFunc so it's clear that all the cases handling Func and TypedFunc are only there for handling function-related things. In the future when more variants are added it'll require adding more match arms to handle in all these places.

alexcrichton avatar May 24 '23 14:05 alexcrichton

I've opened a issue to track the unresolved bits of this PR. See #6455. I hopefully I managed to cover them all.

dhil avatar May 25 '23 11:05 dhil

Seems like there are some conflicts and this needs a rebase/merge.

fitzgen avatar May 25 '23 19:05 fitzgen

Seems like there are some conflicts and this needs a rebase/merge.

I have synchronised with upstream again. Hopefully everything ticks green again :-)

dhil avatar May 26 '23 08:05 dhil

🎊

alexcrichton avatar May 26 '23 16:05 alexcrichton