candid Extensible variants

When we designed the solution for extensible records, with special rules for opt so that they can be used in record fields, we thought we had also solved it dually for variants, as long as the variant itself is wrapped in opt.

And that was certainly true in the iteration that did a dynamic value-based check: if the tag was known, decoding would work; if it was a new, unknown tag, it would fail and turn to null.

But later (for good reason) we switched to a type-based check, decoding as null any value if the type isn't a subtype of the expected type. But in the record extension case, the extended variant isn't a subtype of the expected type, so now all values map to null. Which is not what we wanted, I believe.

This just came up in potential practice in https://forum.dfinity.org/t/enable-canisters-to-hold-icp/6153/167?u=nomeata

Not sure what to do about it, though. Besides maybe some variant-specific solution, e.g. some way to explicitly or implicitly declare a nullary tag as the default.

Nov 10 '21 23:11 nomeata

Oops. The dreaded subtype check again. :(

Nov 11 '21 07:11 rossberg

One solution that might work, and even be simpler than what we have right now, is a middle ground between the previous solution (backtracking in opt with untyped values) and the current solution (static subtype checks with opt):

We go back to backtracking decoding with opt, but we decode typed values, and do a subtype check only when actually encountering a reference.

Examples:

?(vec {1}) : ?(vec int) decodes as null at ?(vec bool), as in all the previous variants.
?(vec {}) : ?(vec int) decodes as ?vec{} at ?(vec bool), as it did in the backtracking variant, but not in the static subtype variant.
?(func A.f) : ?(func () ->(Int)) decodes as ?(func A.f) at ?(func () -> ()), of course.
?(func A.f) : ?(func () ->()) decodes as null at ?(func () -> (int)), as in the previous variants
?(variant {foo} : ?variant {foo;bar} decodes as ?(variant {foo}) at ?(variant {foo;baz}), as it did in the backtracking variant, but not in the static type variant. This is the variant extension that we lost.

Benefits and notes:

We fix this issue
This is, in a way, simpler to implement. I can't help but notice that we had the previous variant implemented in Motoko, but not yet the current, and I think it is because of a substantial increase in complexity. And it's not the subtype check itself, but rather that the subtype check has, as output, a coercion function. And that has to deal with the value representation of the host language, so it cannot be an easily isolated building block.

In the this proposal, the subtype check would be used as a simple predicate: When decoding a reference, run the subtype check, get a boolean out, and act on it.

The subtype check can easily be isolated into a black box that only knows about Candid. It takes two Candid types as input, each likely represented as type table an an index, and return yay or nay. This code can easily be shared between different host languages implementations (assuming some kind of ffi).

And then it becomes relatively easy to implement in Motoko now: The given type table and index are available; the expected can be statically generated, and the above check we can maybe even taken from the Rust candid library.
The results from the subtype check ought to be memoised probably, in case we get a whole vec of such references.
We should probaly to prove this to be correct in Coq first.
Side remark: The soundness problem we have been trying to fix here only hits in relatively obscure cases, and even when it does, the problem it causes (a call fails) are not that catastrophic on a platform where calls can fail anyways (out of cycles, other canister got deleted etc.). I wonder if we can really justify the complexity on implementations. We will find out when more indepenent Candid implementations pop up. Maybe passing the official test suite is the bigger driving factor for these people to do it right :-)

Nov 21 '21 16:11 nomeata

Hm, yes, this sounds plausible and attractive to me. Just one question for clarification, You said:

We go back to backtracking decoding with opt, but we decode typed values, and do a subtype check only when actually encountering a reference.

Don't you mean "decode untyped values" here?

That all makes sense to me, because it is closer to the original intention of the serialisation format simply describing values. Only once we optimised it by hoisting out structural information as types, we started considering stricter tests. That isn't really necessary for first-order values. However, for higher-order values (esp functions with negative type information), we handle them as if they were explicitly type-annotated and check that annotation.

Nov 22 '21 09:11 rossberg

Don't you mean "decode untyped values" here?

I mean typed, because we need the type information once we hit a reference. But for the other values, we just destruct the type as we go, but don't use it otherwise. (Previously, we could completely throw away the type after decoding but before coercion.)

Nov 22 '21 14:11 nomeata

This sounds way more attractive. Do you have a draft spec change?

Nov 24 '21 13:11 crusso

No, I shied away from writing that on the phone

Nov 24 '21 14:11 nomeata

I discovered this issue from the section on Candid variant extensibility in @roman-kashitsyn's blog post and I'm wondering if I understand things correctly.

The blog post suggests making variant fields (such as Err below) optional, so that the variant can later be extended.

type CreateEntityResult = variant {
  Ok : record { /* */ };
  Err : opt variant { /* * /}
};

My concerns with this are:

It's unintuitive.
It seems hard to do exhaustively and would be easy to accidentally miss a field, especially with type synonyms.
It probably has implications for language bindings.

My questions are:

How do I deal with vec variant { /**/ }? Do I make that vec opt variant { /**/ }?
How do I deal with record { variant { /**/ }; _ }? Do I make that record { opt variant { /**/ }; _ }?
How do I deal with record { _; variant { /**/ } }? Do I make that record { _; opt variant { /**/ } }?
What would happen if I didn't make these fields optional and I tried to extend the variants?

Given that all of my variants are named, perhaps a simpler approach is to convert type Foo = variant { to type Foo = opt variant. Does anyone have any thoughts on that?

Thanks in advance.

Aug 07 '22 22:08 paulyoung

It's unintuitive.

Yes, we would love to hear a better alternative.

It seems hard to do exhaustively and would be easy to accidentally miss a field, especially with type synonyms.

By virtual of subtyping, it's okay (or even a space optimization) to omit a variant tag if the value doesn't contain that tag.

The subtyping rule also allows to upgrade any type t into opt t, so we can always add opt at a later time. Of course, the client code has to adapt to that type.

Aug 07 '22 23:08 chenyan-dfinity

If you want a variabnt to be extensible, you have to put the whole variant in an opt. Using opt in the fields doesn't help here. This is dual to records where it's important to make the individual fields optional.

(On the phone, sorry for terseness)

Aug 08 '22 06:08 nomeata

I think the confusion is that the example on the post has two variant types, and it may not be clear which one is going to be extensible - Result or the Error values.

Aug 08 '22 06:08 nomeata