wasm-c-api Add safety checks to several C API functions.

This patch illustrates what safe interfaces for wasm_func_call, wasm_global_set, wasm_instance_new, wasm_table_new, wasm_table_grow, wasm_global_new, wasm_table_get, wasm_table_set might look like.

This includes 10 new checks, including array length checking and type checking. All the checks added are done in terms of existing API functions, so this represents no major new burden on implementations.

This is a work in progress. More safety checks are possible beyond this.

This patch doesn't include checking for references crossing store boundaries. It's a topic worth discussing, but I've omitted it in order to focus the discussion around this patch.

Many of the details here will likely change as we iterate on this patch, including the way errors are represented, naming conventions, coding style, documentation conventions, or the way unchecked interfaces are exposed.

Wasm validation errors are of course distinct from runtime errors in the wasm spec, and it's desirable to let API users distinguish between the two. To keep things simple for now, I've just added a distinct string in the trap message that users could test for. This can also be changed.

The checking could be more efficient. In particular, all allocations could be avoided. I haven't done that yet, in order to demonstrate that everything here is covered by existing API predicates.

This patch also converts wasm_table_set's way of reporting out-of-bounds errors from a bool to an error reported through wasm_trap_t. This more closely reflects the actual wasm table.set semantics. Also, it allows implementations to provide a more informative error message.

Apr 06 '20 16:04 sunfishcode

Addressing the feedback from the meeting today, I've now added a mechanism by which C source code can interpret the safety checks as assertions, with a mechanism for enabling and disabling them at compile time with the usual NDEBUG mechanism.

I've also written a new comment explaining the reason for choosing to use wasm_trap_t to report these. Of course, the new checks here are things which the wasm spec classifies as validation errors, and not traps. However, as an example, the wasm spec treats call signature mismatch as either a validation error or a runtime trap, depending on whether it's a call or call_indirect. Traps and validation errors are conceptually similar, and only differ in when the check happens. Consequently, traps feel like the right thing for reporting these errors. And, this is consistent with the WebAssembly JS API as well.

At the moment, the only way for users to distinguish such a trap is to examine the trap message and see if it starts with the string "invariant violation". That's not great, and I'm interested in finding a better approach.

Apr 29 '20 00:04 sunfishcode

Okay, I think it will help to structure the discussion about this PR a bit. There are parts that are fine, parts that I hope we can find a better design for, but also parts which are highly problematic (the biggest problem being that the design does not consider the impact of/on future Wasm features and implementations).

Checking argument vector sizes

Generally, I still agree with @jakobkummerow that the extra vector size mostly provides false safety (which is why it was removed in #48). On the other hand, it’s cheap and you made a fair point that mechanically generated code could benefit. So I’m fine with putting that back in.

However, for consistency with all other places in the API, please don’t pass the size as a separate parameter, but package it up as a *_vec_t. (That would be a partial revert the simplifications from #48, minus the result_t type.)

Error reporting mechanism

This PR is concerned with host-side usage errors: bugs in the host or missing checks around the use of untrusted Wasm modules. That is a different domain than errors stemming from Wasm execution. FWIW, the JS API clearly distinguishes between WebAssembly.RuntimeError and usage errors. For example, Table#set throws RangeError for out of bounds.

Usage errors also do not have a meaningful way of including a stack trace. Consequently, there should be no need for allocating such errors in a store and for polluting the API with spurious extra store parameters (simple message string are not Wasm language objects, so not tied to a store).

Considering that, a separate wasm_error_t type would be more adequate. If we want to report API errors with messages, then it could just be an alias for wasm_message_t. That actually is simpler to implement than the use of traps as currently in the PR. (Having said that, it is not clear to me how useful error messages actually are in the average case of API usage errors, other than e.g. for module instantiation/validation, which this PR does not add. But I don’t mind.)

Multiple table constructors per reference type

The PR introduces whole families of table functions, one set of accessor for each existing reference type. I don’t understand the purpose of this, since the reference type in question is already determined by the table or table_type_t given to the constructor (and accessible via wasm_table_type). So you could get the same information from there.

In any case, this approach clearly does not scale, since the typed reference proposal will soon introduce an infinite number of reference types (ref $T) into the language.

Is there any other reason for introducing these functions? It seems that they could just be removed without any loss.

Configuring checks

Thanks for adding the NDEBUG flag, this is a useful start. However, the PR now has tripled the number of many API functions, each coming in plain, unchecked, and “emboldened” variant. That doesn’t strike me as either elegant or necessary.

What is the advantage over applying the flag to the plain form directly?

Checking the types of assigned values

Now for something that is very problematic, and I don’t think there has been a serious effort of considering the downstream implications.

Underlying the idea of type checking assignments is a very strong, hidden assumption: that every engine, for all future versions of Wasm, implements a full type reflection capability on every value, especially references!

While such an overhead is accepted practice in runtimes for dynamic languages, static languages typically want to avoid such cost. Wasm should efficiently support both. That means that runtime type information stored in reference values (beyond what a built-in GC would need) has to be explicit and optional. In particular, the GC proposal carefully supports both castable and non-castable reference objects, the latter avoiding the overhead of unnecessary runtime type information.

That doesn’t mean that some engines can’t do something more costly, e.g., those embedded in JS, if they were keen to give full access to everything. But it’s not a feasible requirement to impose that cost on all engines and for all use cases.

The cost of enabling type reflection is a global space and time overhead for all heap objects. As such, it cannot be circumvented by merely adding unchecked functions. Opting-out would require a different API mode that corresponds to a different (static) engine configuration. And even with such a static choice it better be an optional feature of the API, since not all engines will want to go to length of implementing it (which will generally require a different heap representation).

Put differently, in future versions of Wasm, such checks can no longer be expressed in terms of API functionality. The best you can check for is the kind of value, per the val_t union. But that won’t be enough to ensure type safety in general.

Also note that a typed host language binding to this API could potentially ensure these constraints statically, without any runtime cost. So the claim that this helps all safe bindings is not quite accurate.

Other considerations

There has been a fair amount of discussion about adding e.g. a high performance call API that does not require going through API wrapper functions but can call directly into Wasm functions (some pieces of that discussion are still captured in #68). This is clearly something people want in the future. But it cannot be made safe.

That indicates that there is some fundamental tension about the role of this API. The original intent was a low-level API, deliberately designed to be thin and low-overhead, allowing higher layers to provide safety according to their needs. I still believe having such a low-level API is useful and important as a foundation for other, more elaborate things.

The use case of having a more sandboxed API is a useful one as well. But that doesn’t eliminate the need for the other! We want both, because they serve different use cases.

So stepping back from this specific PR, I wonder why we can’t take a more layered approach, where we define and implement a low-level base API, as was intended with this proposal, and define more checking on top -- instead of introducing an awkward mix of both kinds of API. AFAICS, that would do a cleaner job of addressing both use cases, and clients can pick the layer that suites their needs.

May 05 '20 06:05 rossberg

Oops, sorry, wrong button.

May 05 '20 06:05 rossberg

Checking argument vector sizes

However, for consistency with all other places in the API, please don’t pass the size as a separate parameter, but package it up as a *_vec_t. (That would be a partial revert the simplifications from #48, minus the result_t type.)

Done.

Error reporting mechanism

Considering that, a separate wasm_error_t type would be more adequate. If we want to report API errors with messages, then it could just be an alias for wasm_message_t. That actually is simpler to implement than the use of traps as currently in the PR. (Having said that, it is not clear to me how useful error messages actually are in the average case of API usage errors, other than e.g. for module instantiation/validation, which this PR does not add. But I don’t mind.)

I actually started with a wasm_error_t, but found it added a lot of clutter (more out params for users to set up, and more things to test afterwards) without seeming to provide a clear benefit. From a user perspective, validation errors and runtime traps both mean that attempts were made to violate wasm invariants; the only difference is the time that they're diagnosed. However, it is useful to be able to distinguish between different types of traps, so I've now added a mechanism to do that.

Multiple table constructors per reference type

The PR introduces whole families of table functions, one set of accessor for each existing reference type. I don’t understand the purpose of this, since the reference type in question is already determined by the table or table_type_t given to the constructor (and accessible via wasm_table_type). So you could get the same information from there.

The main change is to switch wasm_table_get/wasm_table_set to use wasm_val_t, making them more consistent with other parts of the API, providing more flexibility for the API to evolve (see below for one example), and making the API more convenient for users that hold all values in wasm_val_t.

But it is less conveninent for other users, so I added convenience functions for some cases where a static type is known. I expect this is a pattern we could follow to add convenience/optimized-versions of wasm_global_get etc. that don't go through wasm_val_t too, if we wanted to have a more low-level API.

In any case, this approach clearly does not scale, since the typed reference proposal will soon introduce an infinite number of reference types (ref $T) into the language.

The wasm_val_t version handles the general case. We'd only add statically-specialized convenience functions as needed.

Is there any other reason for introducing these functions? It seems that they could just be removed without any loss.

I agree. But the API does include some non-essential convenience functions already, so adding a few more for cases that come up in the examples didn't seem out of place.

Configuring checks

Thanks for adding the NDEBUG flag, this is a useful start. However, the PR now has tripled the number of many API functions, each coming in plain, unchecked, and “emboldened” variant. That doesn’t strike me as either elegant or necessary.

What is the advantage over applying the flag to the plain form directly?

Could you be more specific about what you're proposing here?

For the use case of binding to other languages, it's difficult to bind to things which are influenced by C preprocessor flags. We want to bind to regular C functions. And for these use cases, the checks don't serve as "asserts", so there's no need for a way to disable them.

It also turns out that the "emboldened" forms have a different signature from both the checked and unchecked forms. Checked needs extra arguments and return values, while "emboldened" just needs extra arguments. Adding functions with error return values that users are expected to ignore in some configurations sounds risky to me. I agree that there's a fair amount of redundancy here, but it seems the less risky approach.

Checking the types of assigned values

Underlying the idea of type checking assignments is a very strong, hidden assumption: that every engine, for all future versions of Wasm, implements a full type reflection capability on every value, especially references!

I see at least two possible paths forward.

One is to add runtime type information to wasm_val_t. (If we're clever, we might even do this without changing its size.) And if we're consistent about using wasm_val_t in the API, such as using it in wasm_table_set etc. as PR does, this seems to solve the problem. There are more details to work out, and I don't have a complete design yet, but I'm not aware of any fundamental blockers here.

However, even if that doesn't work, another viable approach here is just to say that some functionality only exists in unchecked forms, and has no direct checked counterpart. That would be unfortunate, but wouldn't invalidate some of the other goals we have. And it's likely we'll already need to say that for at least wasm_memory_data anyway.

Other considerations

There has been a fair amount of discussion about adding e.g. a high performance call API that does not require going through API wrapper functions but can call directly into Wasm functions (some pieces of that discussion are still captured in #68). This is clearly something people want in the future. But it cannot be made safe.

Sure. These are unsafe escape hatches.

So stepping back from this specific PR, I wonder why we can’t take a more layered approach, where we define and implement a low-level base API, as was intended with this proposal, and define more checking on top -- instead of introducing an awkward mix of both kinds of API. AFAICS, that would do a cleaner job of addressing both use cases, and clients can pick the layer that suites their needs.

I am proposing the official WebAssembly C API should aim for "safe defaults, with unsafe escape hatches". A separate API or optional layer would risk differences between which APIs or layers implementors chose to provide, and confusion among users as to which APIs to use. A situation where users end up needing to depend on a separate library to provide checked wrappers is not "safe defaults".

May 07 '20 05:05 sunfishcode

Is there any update here? I know there's a lot going on; I'm mainly just trying to get a sense of what to expect here.

May 27 '20 16:05 sunfishcode

I'm sorry, I've fallen behind severely on my GH backlog. Will try to catch up on non-quick replies over the course of next week.

May 28 '20 07:05 rossberg

Friendly ping!

Jun 11 '20 23:06 sunfishcode

Checking argument vector sizes

Done.

Thanks. However, this is passing structs by-value now, avoiding which is an explicit requirement of the C API (since FFIs tend to not like that). So this will need to be passed by pointer, cf. functype_new and others.

Moreover, it seems like you have only modified the C API and not pushed the change through to the C++ API -- the actual check ought to be implemented there.

In fact, I only now realise that none of the changes in this PR have been implemented for the C++ API. Of course, both should remain consistent, so I'm afraid there is a bit more work to do here.

I started work on doing this properly, see below.

Error reporting mechanism

I actually started with a wasm_error_t, but found it added a lot of clutter (more out params for users to set up, and more things to test afterwards) without seeming to provide a clear benefit.

Can you give an example? The only places where I can see both kinds of errors occurring would be module_instantiate and func_call (since those are the only ones that actually invoke Wasm).

From a user perspective, validation errors and runtime traps both mean that attempts were made to violate wasm invariants; the only difference is the time that they're diagnosed.

I have to disagree. They are completely different classes of errors: one is a regular, specified outcome of executing Wasm code and returns with a corresponding stack trace; the other is a plain bug in the embedder, unrelated to the execution of Wasm code. Like with the JS API, it seems advisable not to conflate these.

Multiple table constructors per reference type

The main change is to switch wasm_table_get/wasm_table_set to use wasm_val_t, making them more consistent with other parts of the API

That is not consistent with the language, however. Tables store references, not values, and that should naturally be reflected in the API.

But the API does include some non-essential convenience functions already, so adding a few more for cases that come up in the examples didn't seem out of place.

But in the case of these functions, what convenience do they add?

Configuring checks

Could you be more specific about what you're proposing here?

I propose that there is a flag that allows disabling all unnecessary checks in the regular API functions, i.e., they simply do not diagnose the errors and would produce undefined behaviour instead, as common in C libraries.

Moreover, that likely has to be a compile-time flag, since some of the diagnosis you envision will require a more expensive data representation, the overhead of which some clients may want to avoid. For that, it is not enough to just bypass some local checks.

Having the ability to switch the regular API from "checked build" to "production build" follows common practice and is more realistic than expecting clients to write "emboldened" everywhere. (Though I'd still prefer a layered approach, see below).

I agree that there's a fair amount of redundancy here, but it seems the less risky approach.

Well, when somebody turns off checking then they ought to know what they are doing. Either we trust them to be able to mitigate their risk or we don't.

Checking the types of assigned values

One is to add runtime type information to wasm_val_t. (If we're clever, we might even do this without changing its size.)

Given that the space of types is infinite, I'm not sure how you intend to do that. Once we have typed references, significant overhead seems unavoidable to achieve that.

Furthermore, as mentioned above, val_t is the wrong type to attach these types to anyway, since not all relevant contexts operate on val_t (nor should they!).

Layering APIs

I am proposing the official WebAssembly C API should aim for "safe defaults, with unsafe escape hatches". A separate API or optional layer would risk differences between which APIs or layers implementors chose to provide, and confusion among users as to which APIs to use.

That may be so. OTOH, trying to shoehorn safety into a low-level API is likely to produce various contortions. The idea of abusing val_t to replace ref_t just for the purpose of hooking in extra checks is an example of that. We would be losing precision and static checking and replace it with an an extra layer of dynamic typing. That may be the right thing to do in some environments (e.g., untyped languages), but shouldn't be forced on everybody by the core API.

Making Progress

Other than open design questions, there's a lot of work to be done to get this PR into shape, esp regarding the C++ part. At the same time, this is a rather large PR making several largely unrelated changes at once. It will be much easier to make progress by splitting it up.

In order to get moving, I created #145, which pushes through all necessary changes to properly implement the use of vectors for calls and imports.

Jun 15 '20 17:06 rossberg