interface-types Should snowman-bindings check unions?

Open PoignardAzur opened this issue 4 years ago • 5 comments

In the snowman-bindings presentation at the June meeting, one of the proposed bindings is for enum/union/sum types.

I'm wondering, should these types be statically guaranteed to be valid?

For instance, if a function takes a bool as a parameter, where bool is defined as an enum of 0 or 1, can that function safely assume that the parameter will always be either 0 or 1, even when being called by malicious code?

If so, how do you think this guarantee could be enforced?

Jul 21 '19 16:07 PoignardAzur

I think a ☃-binding type should precisely define a (possibly infinite) set of valid values, and the semantics/implementation would guarantee that only one of those valid values is actually passed. Thus, if we have a bool ☃-binding type, it can only take a true or false, and it's up to the binding expressions to map to and from these two boolean values to (presumably) a wasm i32. On the producing end, an i32-to-bool binding operator would presumably map all non-zero values to true. On the consuming end, a bool-to-i32 binding operator would presumably map to either 0 or 1.
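As a rough illustration of the two operators described above, here is a minimal C++ sketch. The function names `i32_to_bool` and `bool_to_i32` are hypothetical stand-ins for the binding operators, not part of any proposal text, and the non-zero-means-true rule is the presumed behavior mentioned in the comment.

```cpp
#include <cstdint>

// Hypothetical producing-side operator: every non-zero i32 becomes true,
// so no i32 input can yield an invalid bool.
constexpr bool i32_to_bool(std::int32_t raw) { return raw != 0; }

// Hypothetical consuming-side operator: a bool becomes exactly 0 or 1.
constexpr std::int32_t bool_to_i32(bool value) { return value ? 1 : 0; }
```

Composing the two, `bool_to_i32(i32_to_bool(x))` can only produce 0 or 1 whatever `x` the caller supplies, which is the guarantee being asked about.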

Jul 24 '19 20:07 lukewagner

Yes. I think that this is right for booleans. There are other cases to consider: optional types, promises, days of the week, error codes...

Jul 24 '19 21:07 fgmccabe

> For instance, if a function takes a bool as a parameter, where bool is defined as an enum of 0 or 1, can that function safely assume that the parameter will always be either 0 or 1, even when being called by malicious code?

Yes. For enum types, e.g. bool, the binding generated by the host needs to either assert that the input matches or coerce the value. So the caller could have an i32-to-enum-assert coercion that would say assert(val >= 0); assert(val <= 1);. A different caller could have an i32-to-enum-clamp coercion that would say val = min(max(val, 0), 1);. In either case, the callee will only see 0s or 1s.
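The two strategies can be sketched in C++ as follows. This is only an illustration: the function names and the `max_case` parameter are hypothetical, and a thrown `std::out_of_range` stands in for whatever trap mechanism a real host would use.

```cpp
#include <algorithm>
#include <cstdint>
#include <stdexcept>

// Strategy 1 (assert): reject any value outside [0, max_case].
// Assumption: an exception models the trap a real host would raise.
inline std::int32_t i32_to_enum_assert(std::int32_t val, std::int32_t max_case) {
  if (val < 0 || val > max_case) {
    throw std::out_of_range("enum value out of range");
  }
  return val;
}

// Strategy 2 (clamp): coerce any value into [0, max_case].
// Assumes max_case >= 0; std::clamp requires lo <= hi.
inline std::int32_t i32_to_enum_clamp(std::int32_t val, std::int32_t max_case) {
  return std::clamp<std::int32_t>(val, 0, max_case);
}
```

For a bool-like enum, `max_case` would be 1. Either way the callee only ever observes a value in range; the strategies differ only in what happens to the caller that supplied a bad one.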

For more general ADTs, such as Either a b = Left a | Right b, that could desugar to a C++-style definition:

enum EitherTag { Left, Right };
template <typename A, typename B>
struct Either {
  EitherTag tag;
  union {
    A leftValue;
    B rightValue;
  };
};

and thus use a similar i32-to-enum binding strategy on the tag field. The ADT binding operator would need to do some handwavium (specifically, also read the type tag) to determine whether to use a (user-defined) A-to-ref or B-to-obj binding, but in principle it's the same logic.
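To make the tag-then-payload idea concrete, here is a small C++ sketch of lifting an unchecked value into a checked one. Everything in it is an assumption for illustration: the names `RawEither`, `CheckedEither`, and `lift_either` are hypothetical, the payload types are fixed to an i32 (Left) and a bool (Right), and an exception stands in for a host trap.

```cpp
#include <cstdint>
#include <stdexcept>
#include <variant>

// Unchecked form as it might cross the boundary: a tag plus one payload slot.
struct RawEither {
  std::int32_t tag;      // expected: 0 = Left, 1 = Right
  std::int32_t payload;  // interpreted according to the tag
};

// Checked form: index 0 holds the Left payload, index 1 the Right payload.
using CheckedEither = std::variant<std::int32_t, bool>;

// Validate the tag first, then apply the payload binding chosen by the tag.
inline CheckedEither lift_either(const RawEither& raw) {
  switch (raw.tag) {
    case 0:  // Left: pass the i32 payload through unchanged
      return CheckedEither{std::in_place_index<0>, raw.payload};
    case 1:  // Right: apply an i32-to-bool style coercion
      return CheckedEither{std::in_place_index<1>, raw.payload != 0};
    default:  // invalid tag: a real host would trap here
      throw std::out_of_range("invalid Either tag");
  }
}
```

The callee receiving a `CheckedEither` never sees a tag outside the two declared cases, and each payload has already gone through the binding appropriate to its case.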

Note: I'm not saying we need two i32-to-enum bindings, but there are (at least) two strategies that preserve the guarantee on the callee's side that the values are valid, enforceable by the host inserting dynamic checks.

Jul 25 '19 00:07 jgravelle-google

> For more general ADTs, such as a Either a b = Left a | Right b, that could desugar to a C++-style:
>
> enum EitherTag { Left, Right };
> template <typename A, typename B>
> struct Either {
>   EitherTag tag;
>   union {
>     A leftValue;
>     B rightValue;
>   };
> };

I don't disagree with you, but I wonder, if a function passes a range of enums, does the host check all of them? That might be an undesirable hidden cost. Similarly, in the example above, does the host check the validity of A or B (depending on tag)?
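To show where the hidden cost would come from, here is a hedged C++ sketch of what a dynamic check over a sequence of enum values might look like. The function name and the `max_case` parameter are hypothetical; the point is only that a naive check touches every element, so its cost grows linearly with the length of the sequence.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Returns true only if every element lies in [0, max_case].
// One comparison pair per element: O(n) work the caller never wrote explicitly.
inline bool all_valid_enum_values(const std::vector<std::int32_t>& values,
                                  std::int32_t max_case) {
  return std::all_of(values.begin(), values.end(),
                     [max_case](std::int32_t v) { return v >= 0 && v <= max_case; });
}
```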

I wonder if you couldn't directly declare the variables as enums on the stack, so most of the checks can be performed at compile-time instead of run-time.

> Note: I'm not saying we need two i32-to-enum bindings, but there are (at least) two strategies that preserve the guarantee on the callee's side that the values are valid, enforceable by the host inserting dynamic checks.

I vote for trapping all the time. It's simpler that way, and if the compiler wants the other strategy, it can always perform the modulo itself; the host may or may not elide the runtime check.

Jul 25 '19 09:07 PoignardAzur

On Thu, Jul 25, 2019 at 2:52 AM Olivier FAURE wrote:

> > For more general ADTs, such as a Either a b = Left a | Right b, that could desugar to a C++-style:
> >
> > enum EitherTag { Left, Right };
> > template <typename A, typename B>
> > struct Either {
> >   EitherTag tag;
> >   union {
> >     A leftValue;
> >     B rightValue;
> >   };
> > };
>
> I don't disagree with you, but I wonder, if a function passes a range of enums, does the host check all of them? That might be an undesirable hidden cost. Similarly, in the example above, does the host check the validity of A or B (depending on tag)?

a. The 'host' may generate code to handle enums. However, this is not webIDL :) Remember that the host only 'generates' code for pairs of coercions; so, if such a pair can be implemented in a pass-through way, that is what will happen.

b. When you have discriminated unions like the Either type, there has to be a way for the client (provider) to signal to the host which of the unions is active. There will likely be a coercion operator that allows the client (provider) to do that. In general, this will not be done by the host inspecting the value in some quasi-magical way.

> I wonder if you couldn't directly declare the variables as enums on the stack, so most of the checks can be performed at compile-time instead of run-time.
>
> > Note: I'm not saying we need two i32-to-enum bindings, but there are (at least) two strategies that preserve the guarantee on the callee's side that the values are valid, enforceable by the host inserting dynamic checks.

It is not necessary to declare the enums. In fact, I would not recommend a dynamic approach: the coercion operator has to be able to deterministically determine how to map the enum.

> I vote for trap all the time. It's simpler that way, and if the compiler wants the other strategy, it can always perform the modulo itself; the host may or may not elide the runtime check.

This may be an MVP/later issue. It depends on how quickly we proceed compared with the exceptions proposal.

-- Francis McCabe SWE

Jul 25 '19 15:07 fgmccabe
