Tagged types, unions, and exception handling
So in the current proposal tagged unions/variants are suggested as a possible extension type.
However something that is kind've strange is that WASM already has (shipping in Chrome stable) a tagged type in way of exceptions, e.g. we can already split into cases based on tags using try/catch as such:
(try
(...) ;; some body
catch $someTag1
(...) ;; do something based on tag of type
catch $someTag2
)
This is remarkably similar to the proposed br_on_case instruction proposed for variants.
Now another open question in the current GC proposal is whether or not to have unions of reftypes.
If we did have unions, something we could potentially do to enable both disjoint unions and arbitrary unions, is to reutilize the same behaviour try/catch already provides by having another tagged type i.e.:
deftype ::= ... | (tagged $tag)
Where $tag uses the same space for exception tags.
We could then have a br_on_tag instr that takes a tagged kind and breaks if the tag matches i.e. if we had:
;; Some tag types
(tag $tag1 (param i32) (param i32))
(tag $tag2 (param externref))
;; tagged types
(type $t1 (tagged $tag1))
(type $t2 (tagged $tag2))
;; a union type
(type $union (ref $t1 $t2))
;; create instances of tagged types
;; this is very similar to throw except
;; it just produces a GC type rather than throwing an exception
(tagged.new $tag1
(i32.const 4)
(i32.const 5)
)
;; sometime later
(get_$union_type_somehow)
;; br_on_tag branches to the given label if the value on the stack has
;; the appropriate tag, the label must match the tag type for this instruction
;; to be valid
(br_on_tag $tag1 $someLabelWithTag1Type)
Alternatively we could even have a more general match instruction (akin to try/catch):
(match $blockType
;; create tagged type somehow
case $tag1
;; [i32 i32] is on stack
case $tag2
;; [externref] is on stack
default
;; variant with unrecognized tag
;; just leaves the value on the stack
;; if not provided this could trap
Variants and tags are quite different:
- Variants are closed (statically known) and their cases are syntactic (by position/name).
- Tags are open (dynamically extensible) and generative (fresh at runtime).
These different properties make them complementary, and they are often found alongside each other. For example, functional languages use variants all over the place, but also have generative open data types such as exceptions.
It's a bit like the difference between tuples and arrays: one is fixed arity and with static indexing, the other has unknown arity with dynamic indexing.
Yes, although tag is sufficient for implementing variants as you can just map each variant member to a tag and create their union (ref $var1 $var2).
In the post MVP description for variants it mentions that the main reason to have a higher level representation is to allow VMs to pick more efficient representations. But couldn't they do this with tags + ref unions anyway? Like if a VM saw (ref $var1 $var2), the VM could just treat that like it would a variant and perform whatever optimizations it wants.
- syntactic (by position/name)
I assume you're meaning here that across two instances (or modules) the same variant will have the same RTT and hence can be called without a direct reference to the type. While I do think such a feature would be helpful for languages that want structural variants, it seems more common in languages that variants are nominal so having equal RTT would not be all that important for compiling such languages.
I don't have any opposition to supporting both, but this suggestion is mainly based on the fact that engines already need to implement tagging for exception handling, so it would be a fairly logical extension to support tagged values beyond exceptions. (Which again, would allow for many languages uses of variants to be compiled directly without needing linear memory, i.e. a large part of the reason for the gc proposal).
An alternative could come from one of the other open questions, and that's if there was a way to generate fresh RTTs then I suppose one could just shove a unique RTT in a struct to "tag" it effectively and then use down-casting:
;; use some theoretical (rtt fresh) to create unique rtts to use as "tag" types
(type $tag1 (rtt fresh))
(type $tag2 (rtt fresh))
(type $var1 (struct (field $tag1 i32 i32)))
(type $var2 (struct (field $tag2 externref i32)))
;; we can use (rtt.canon $tag1) to initialize such types, so no problems there I think
;; sometime later
(local.set $struct (get_struct_somehow))
(ref.test $var1 (local.get $struct))
(if
(then
(ref.cast $var1 (local.get $struct))
;; do as wished with struct as narrowed type
)
(else
;; otherwise check other variants or whatever else
)
)
In the post MVP description for variants it mentions that the main reason to have a higher level representation is to allow VMs to pick more efficient representations. But couldn't they do this with tags + ref unions anyway? Like if a VM saw
(ref $var1 $var2), the VM could just treat that like it would a variant and perform whatever optimizations it wants.
No, because of composition and subtyping. Unions can be arbitrarily composed, so all its elements must be uniformly recognisable independently of the union they occur in, including unions containing other sorts of values. So you cannot specialise the representation of specific unions. With variants, injection/projection is explicit, the variant is differentiated from its payload, which allows more liberty in its choice of representation.
- syntactic (by position/name)
I assume you're meaning here that across two instances (or modules) the same variant will have the same RTT and hence can be called without a direct reference to the type.
See above, for a union, we have to maintain free composability. So "position" in any given union is context-dependent and does not have any runtime relevance. Whereas for variants, we can identify cases at runtime by numeric tags based on the order in the type definition (analogous to fields in structs). And if the number of cases is small enough (e.g., binary variants are frequent), the bits for the numeric tags can be potentially be crammed somewhere where they don't take up additional space. A generative tag on the other hand will always require a full word.
Whether that makes variants worthwhile enough is a fair question, of course. Some data structures would benefit from saving a word per value.
Closing this because we won't be adding unions or variants to the MVP design, but PRs adding new ideas to the post-MVP doc would be very welcome.