gc Refactor cast opcodes?

With the addition arrayref and structref, it would make sense to refactor cast and classification opcodes to something more modular: instead of a long list of individual opcodes, use an immediate, like in br_on <heaptype> (with some suitable restrictions), and similarly ref.is <ht> and ref.as <ht>. The interpreter has long been using something similar internally.

Edit: To clarify, this is solely suggesting a change to the encoding of this group of instructions, not affecting their semantics.

Feb 09 '22 10:02 rossberg

I was actually going to suggest exactly this, as part of a heap type refactoring based on an idea that came up in yesterday's discussion of i31 as a type dimension - specifically to have unions of heap types also be heap types. I ruminated about this idea and took it a bit further, and I think it could lead to a nice regularization of the type hierarchy - and, as you suggest, improved orthogonality between the type hierarchy and the instruction set.

I'll write my thoughts here in this issue, then, instead of opening a new one.

The setup

The set of singular heap types includes a hierarchy of abstract (any, func, eq, data, array, struct) and concrete (defined functions, arrays and structs) pointer types, plus two isolated heap types - null and i31.

A heap type is a union of zero or more singular heap types. As a start, we can restrict such unions to include at most one hierarchy type. We can then generalize them later if needed.

We will have a syntax for expressing a union. There can be constructors for the combinations of null and i31, followed by a hierarchy type. When a type (hierarchy or isolated) is mentioned without a constructor, we'll need some conventions for which of the other components are included by default (like null is with the current shorthands).

Testing instructions

With such a setup, we can replace our plethora of testing instructions (ref.is_*, ref.as_*, br_on_*, at least 24 instructions in total, currently, counting only static versions) by just 5: ref.is, ref.as, ref.as_not, br_on and br_on_not.

This isn't just more orthogonal, it is also clearer and more directly expressive. Currently, the testing type of ref.cast depends on the input type. It allows null iff the input is nullable. With this setup, the test can always express the exact type combination to be tested. Also, with the current instructions, type tests have to be composed using various combinations of tests for null, i31, data and a specific defined type. Some combinations can get quite unwieldy, such as a nullable cast from eqref, as reported in https://github.com/WebAssembly/gc/issues/130#issuecomment-1032842173.

Sharpening

ref.as_not, br_on and br_on_not will perform type sharpening on the negative result (output of ref.as_not, fall-through arm of br_on, and branching arm of br_on_not). This is one of the main benefits of the union type.

It will be most robust for this sharpening to not perform "sideways" reasoning in the type hierarchy, i.e. the rule would be:

If the tested type contains null, the result will not contain null.
If the tested type contains i31, the result will not contain i31.
If the hierarchy component of the tested type is a supertype of the hierarchy component of the input type, the result will not contain a hierarchy component.
In other respects, the result type will be like the input type.

Sharpening on the positive result (intersection of the input type and the tested type) is not necessary, since the precision can be baked into the tested type by the producer. It is a simpler type rule to have the output type just be the tested type.

`eq`

null, i31 and all subtypes of eq are comparable. Thus, the input type to ref.eq is null | i31 | eq.

With i31 removed from the hierarchy, eq and data become equivalent. We may want to keep the distinction around in case we later want to add some comparable pointer type which does not have the embedded runtime type expressed by data (or func).

Definite `null`

This setup also fixes another pain point - the lack of a definite null type. We can drop the type immediate from ref.null and type its output as a definite null.

Empty

Sharpening can produce an empty union, so this needs to be part of the type formalism. Whether or not we want to have a syntax for expressing such an empty type explicitly is an open question.

Interpreter performance

A compiler (even a non-optimizing one) can readily generate a specialized check based on the combination of tested type and input type. This is more difficult for an interpreter, which may need to produce side table information during validation to avoid some redundant checks. Adding union types with i31 will require it to do so anyway, though, so this might not be a significant additional complication. It'd be interesting to hear from @titzer and other interpreter authors regarding this.

RTT

It would seem that the distinction in the instruction set originated from the distinction between testing with an RTT and testing non-RTT types (including abstract types). Even though this proposal is formulated as RTT-free, it is still compatible with adding RTT versions of the instructions, as far as I can tell. The RTT versions would not permit sharpening, but otherwise they should work identically.

Feb 09 '22 13:02 askeksa-google

Whether or not we add those restricted union constructors to the MVP (which I would be interested in benchmarking for code size and performance wins), consolidating the 5 casting instructions and giving them uniform, predictable sharpening behavior makes sense to me. Adding a null type and a new version of ref.null without a type immediate makes sense to me, too.

Feb 10 '22 17:02 tlively

I think this is a very nice way to consolidate the ideas discussed in this thread, as well as around i31 and a potential null type. One thing we should consider is type-encoding size. I measured that, in a real-world J2CL module of total size 22Mb, the ref null opcode comes up ~126.000 times. If a nullable type is now expressed as (ref (union null <heapType>)) (two more bytes than before), this will increase module size by about 1%. This is a good argument to maintain a one-byte shorthand for ref union null. Similarly, we might want to consider other shorthands, e.g. for the current anyref = (ref (union null i31 any)).

Feb 10 '22 17:02 manoskouk

I generally like reducing the br zoo down to a br_on and br_on_not for the heap types.

I didn't digest @askeksa-google's union suggestion fully yet. Is it fully orthogonal to this change? If so, maybe it should have its own issue.

As far as the interpreter consideration, before #241, the Wizard requires no sidetable entries for any GC instructions (except maybe a TODO with the RTT depth, but depth goes away with iso-recursive types). There is sufficient information in the form of heap type indices for the interpreter to find its way into module-local tables that have type information. As long as union types were required to be declared in the type section, I think this would be sufficient.

Feb 11 '22 02:02 titzer

I don't see it as orthogonal. It's the union types as heap types and the existence of a definite null type which allow us to express all the various type tests as a heap type.

As for the encoding, I imagine we will have four type modifiers, which are each a single prefix byte:

non-null non-i31
null non-i31
non-null i31
null i31

These look similar to the type constructors which have been suggested in earlier discussions. The difference is that rather than being type constructors that create value types from heap types, these are modifiers that turn heap types into other heap types.

With these modifiers, we don't actually need an explicit union syntax until we decide to lift the restriction of at most one hierarchy type.

Bare heap types without a modifier have suitable default nullability and i31ability. There seems to be general sentiment that these defaults should be the same across all types. Symmetry would dictate that the default should be either non-null non-i31 or null i31, but I would argue that compactness should have priority over symmetry here, so the default should be the most commonly used combination, which most likely would be null non-i31.

Unions without a hierarchy component are expressed via an empty heap type. On its own, like other types, it defaults to be nullable and not i31able, which makes it the null type. Having a one-byte representation of the null type is practical, as null checks are very common. With modifiers, it becomes never, i31 or null i31.

By the way, what is the reason that deftypes (positive-valued heap types) are not valid value types on their own, without a ref type constructor? The positive encoding space of value types does not seem to be used. It could save significant coding space (and possibly decoding overhead) if they were. This would also completely unify the encoding of ref types and heap types, making heap types just a subset of the value types.

Feb 11 '22 10:02 askeksa-google

It seems like there is general agreement to refactor the opcodes along these lines. I'll try to come up with a coherent design. The main question will be when to land this, since it obviously is a breaking change that we might want to synchronise.

@askeksa-google, I haven't fully digested your suggestion yet either. There are some good ideas that resonate with me, but also a couple of high-level points where we need to be careful:

Unions will need to be deftypes, not heaptypes. Otherwise you can't name and share them between multiple uses. That would create a lot of duplication and size overhead in use cases where unions regularly have more than one component. Exports wouldn't work either.
We need to be careful not to make casts too general. Wasm is supposed to be an assembly language, and it is one of its design goals that, wherever possible, every instruction maps closely to an actual machine instruction. Now, with GC we obviously need to raise the abstraction level to a certain extent (to maintain safety and portability). But it would violate the spirit of Wasm if we introduced instructions that compile into arbitrarily long sequences of machine code – as would ultimately be the case if you allowed unions as the target of casts.

The fact that we allow the union with null in cast targets is already borderline in my view, but it can be argued as long as we view null as special. But I'd rather not stretch that further.
A related point is that, in general, Wasm instructions are designed such that they operate uniformly over the types they allow. When an operation's semantics and implementation is significantly different for different types, then we purposely introduce separate instructions. (The "no overloading" mantra.)

Generally, questions about introducing unions open a big can of worms. I think it works better if we keep this issue focussed on the opcode refactoring and discuss more far-reaching changes separately.

By the way, what is the reason that deftypes (positive-valued heap types) are not valid value types on their own, without a ref type constructor?

Those are very different types. That difference matters in the context of nested compound types (see post-MVP doc): a field that is a struct is quite different from a field that is a reference to a struct.

The positive encoding space of value types does not seem to be used.

There is at least one place where type constructor and type index space overlay already, which is block types: they're either a negative value type or a positive type index denoting a function type signature. So overlaying both in a more global way wouldn't be backwards-compatible.

It could save significant coding space (and possibly decoding overhead) if they were. This would also completely unify the encoding of ref types and heap types, making heap types just a subset of the value types.

I'd actually like to allow type definitions "inline" as a shorthand for one-off uses (and ultimately, merge most type sorts). But it is more involved than perhaps obvious at first. For example, it could complicate validation rules if there were "anonymous" compound types. It might also screw up phasing in engines, e.g., wrt type normalisation. Probably not worth investing in it at the moment, since it can always be added later.

Feb 14 '22 11:02 rossberg

Unions will need to be deftypes, not heaptypes. Otherwise you can't name and share them between multiple uses. That would create a lot of duplication and size overhead in use cases where unions regularly have more than one component. Exports wouldn't work either.

Good point. I think this requirement will work fine in practice for i31 unions, as these will typically be united with a few, specific types, so declaring those combinations explicitly will not be a burden.

We need to be careful not to make casts too general. Wasm is supposed to be an assembly language, and it is one of its design goals that, wherever possible, every instruction maps closely to an actual machine instruction. Now, with GC we obviously need to raise the abstraction level to a certain extent (to maintain safety and portability). But it would violate the spirit of Wasm if we introduced instructions that compile into arbitrarily long sequences of machine code – as would ultimately be the case if you allowed unions as the target of casts.

I very much agree. While at first it looks very convenient to be able to express checks for arbitrary unions using single instructions, such combined tests are probably rare enough that it would be acceptable to emit multiple tests in those cases.

The fact that we allow the union with null in cast targets is already borderline in my view, but it can be argued as long as we view null as special. But I'd rather not stretch that further.

Yes. While these are again convenient because it's a common case, it's an exception to the rule that the output type of a cast is the target type of the cast. And it's an inconsistency between ref.cast (which threads null implicitly) and br_on_cast (which requires non-null). One option which would make it somewhat more consistent would be to encode explicitly whether the cast allows null or not, instead of letting it be implicit. This would then be an exception to not allowing unions in tests.

I did an experiment to check how prevalent such nullable-to-nullable casts actually are.

In the barista3 benchmark for dart2wasm, as it is, about 32% of all casts have that form. This is when the code generator only emits a non-nullable cast if it is actually required for the output to be non-nullable for the code to validate.

If I change the cast logic to use a non-nullable cast whenever the value would be tested for null anyway (for instance, right before a field access), then the abundance of nullable-to-nullable casts drops to 24%. If I further change it such that it uses a non-nullable cast whenever it knows that the value is in fact non-null, the abundance drops to 11%.

I imagine that this number would be somewhat higher in a language without declared nullability, such as Java. Do we have numbers on how prevalent casts with implicit null pass-through are in J2Wasm?

Through the same changes, the abundance of nullable-to-non-nullable casts rose from 11% to 32%. Those currently require a separate ref.as_non_null before the cast. If casts were always non-nullable (or could explicitly be specified to be nullable or non-nullable), those cases would become more compact.

A related point is that, in general, Wasm instructions are designed such that they operate uniformly over the types they allow. When an operation's semantics and implementation is significantly different for different types, then we purposely introduce separate instructions. (The "no overloading" mantra.)

This mantra seems to be somewhat at odds with the whole idea of consolidating these instructions. With this mantra in mind, we should at least make sure that the consolidated instructions behave identically apart from the tested type.

There are a few differences between the instructions currently, which would change if we want uniformity of the semantics. Some that I can think of are:

ref.cast, ref.test, br_on_cast and br_on_cast_fail require the input to be dataref or funcref, whereas all the other ones take anyref. Uniformity would dictate that they all take anyref.

As I understand it, this decomposition of casts into first checking that the value has an RTT and then checking the RTT was done exactly to make each instruction do less and to make their behavior less dependent on their input types. Does this actually help engines in practice?

Currently, br_on_null and br_on_non_null throw away the null value. If these are to be instances of generic type test branches, they would preserve the null, like the other such branches do.

Interestingly, a common use case for a br_on_null which preserves the null value is null lifting (of which nullable cast is an example). It simply becomes

(block $lift
  (br_on_null $lift)
  ;; Perform operation on non-null value
end)

whereas currently, with br_on_null throwing away the null, such an operation is much more involved. If br_on_null would take this form, the overhead of manual nullable casts (should we choose not to have automatic null pass-through) becomes much smaller.

On the other hand, a common use case for br_on_non_null is null fallback, for implementing a null fallback operator (i.e. ?? in Dart or ?: in Kotlin) or for lazy initialization.

(block $lazy
  (br_on_non_null $lazy)
  ;; The null is thrown away here
  ;; Produce value
end)

Here, the most practical behavior is to drop the null, though the extra overhead of a non-dropping br_on_non_null is much smaller than in the br_on_null case - just a single drop.

Generally, questions about introducing unions open a big can of worms. I think it works better if we keep this issue focussed on the opcode refactoring and discuss more far-reaching changes separately.

OK, let's stick here to the parts of the union discussion that affect the opcode consolidation.

Feb 25 '22 08:02 askeksa-google

With this mantra in mind, we should at least make sure that the consolidated instructions behave identically apart from the tested type.

Absolutely. Currently, there are three different flavours: (1) null cast, (2) abstract cast, (3) concrete cast involving RTT. They differ in their semantics and implementation as well as in the details of their typing rules. So we probably do not want to over-unify. We could unify (1) with (2), but that may have drawbacks (also, ref.is_null already pre-exists).

Interestingly, a common use case for a br_on_null which preserves the null value is null lifting (of which nullable cast is an example).

Interesting, I haven't seen this pattern in my compilers. Where does it come up?

I did an experiment to check how prevalent such nullable-to-nullable casts actually are. In the barista3 benchmark for dart2wasm, as it is, about 32% of all casts have that form. This is when the code generator only emits a non-nullable cast if it is actually required for the output to be non-nullable for the code to validate. If I change the cast logic to use a non-nullable cast whenever the value would be tested for null anyway (for instance, right before a field access), then the abundance of nullable-to-nullable casts drops to 24%. If I further change it such that it uses a non-nullable cast whenever it knows that the value is in fact non-null, the abundance drops to 11%. I imagine that this number would be somewhat higher in a language without declared nullability, such as Java. Do we have numbers on how prevalent casts with implicit null pass-through are in J2Wasm?

Good question! Originally, casts did not allow null. I changed that on @titzer's request, who wanted it for Java. He may have numbers.

Through the same changes, the abundance of nullable-to-non-nullable casts rose from 11% to 32%. Those currently require a separate ref.as_non_null before the cast. If casts were always non-nullable (or could explicitly be specified to be nullable or non-nullable), those cases would become more compact.

Indeed, I have suspected something like this.

I'd be happy to revisit the special handling of null. As you say, it's a bit of an irregularity, and the benefit was never properly evaluated.

Mar 01 '22 14:03 rossberg

Originally, casts did not allow null. I changed that

I believe it was the Kotlin team (I might misremember though) who in the past expressed that both variants are useful. I'd support having both, so that modules can use whichever is most efficient for the respective situation.

As I understand it, this decomposition of casts into first checking that the value has an RTT and then checking the RTT was done exactly to make each instruction do less and to make their behavior less dependent on their input types. Does this actually help engines in practice?

It avoids an i31 check, which any cast from anyref needs to include. Aside from that, V8 doesn't care. (I can't speak for other engines, of course.)

Mar 01 '22 14:03 jakobkummerow

Interestingly, a common use case for a br_on_null which preserves the null value is null lifting (of which nullable cast is an example).

Interesting, I haven't seen this pattern in my compilers. Where does it come up?

It comes up with null-aware operations, such as x?.foo(y) in Dart, which means "evaluate x, then call x.foo(y) if x was non-null, otherwise evaluate to null". Though in typical use, often the null can be optimized away because it is either not used, used directly in a null fallback ?? or used directly in a comparison == where the result with null is known.

It avoids an i31 check, which any cast from anyref needs to include. Aside from that, V8 doesn't care. (I can't speak for other engines, of course.)

To elaborate a bit on this, I see two cases:

The static input type is known not to be i31. Is it feasible here (even for baseline compilers) to omit the i31 check?
The static input type is possibly i31. Here the code currently must have a ref.as_data or ref.as_func. You have mentioned earlier that ref.as_data is implemented as "is struct or is array". How does an implicit i31 check compare with such an explicit check?

Mar 01 '22 18:03 askeksa-google

The static input type is known not to be i31. Is it feasible here (even for baseline compilers) to omit the i31 check?

Generally yes. We currently don't have that implemented for i31 checks, but we're already doing it for null-checks, and the principle would be the same. It would be a tiny bit more effort, but I think we could live with that. (There might also be limitations in edge cases, where "the module said this couldn't be i31" might handle control flow merges better than "the simplistic compiler was able to infer that this couldn't be i31", but I'd expect cases where this matters to be rare.)

The static input type is possibly i31. Here the code currently must have a ref.as_data or ref.as_func. You have mentioned earlier that ref.as_data is implemented as "is struct or is array". How does an implicit i31 check compare with such an explicit check?

Good point: if we changed ref.cast $t to consume an anyref and implicitly perform an i31 check, then the code sequence for !is_i31 && is_$t would be faster than today's br_on_non_data + ref.cast $t, which needs to emit !is_31 && is_struct_or_array && is_$t. Whether this matters in practice, of course, depends on how often anyref is used, as opposed to typing generic things as (ref struct) or (ref $java.lang.Object) etc. For module producers that want to make use of i31, avoiding the is_struct_or_array checks might be quite impactful.

In summary, I guess I have to revise my earlier statement: ~~It avoids an i31 check, which any cast from anyref needs to include. Aside from that,~~ V8 doesn't care.

(And thanks for taking the time to make me see this -- in retrospect, I wonder why it wasn't obvious to me before...)

Mar 02 '22 14:03 jakobkummerow

@jakobkummerow, is the point that is_data actually is more precise (and expensive) than what would be needed to ensure you can extract an RTT from an anyref that is not an i31? E.g., because is_data || is_func would suffice for that and require fewer instructions in V8?

If so, is this only a problem when using the full generality of anyref? For example, with eqref, I would assume that is_data == not is_i31, so if you already checked for i31, then is_data could be a nop.

Mar 07 '22 13:03 rossberg

@rossberg Yes, that's what I was trying to say:

When the static type is the fully general anyref, then is_data is more precise and more expensive than !is_i31.
In several scenarios where the static type is more precise than anyref, there will be no difference at all.
I didn't say there was a problem, on the contrary. I said that changing ref.cast to take an anyref instead of a dataref, which might at first glance look like it might make things slower, will not make things slower, so I have no objections to this suggested change.

Mar 07 '22 15:03 jakobkummerow

@jakobkummerow, got it, thanks for the clarification.

Mar 07 '22 16:03 rossberg

A concrete proposal:

As an initial observation, the null-related instructions are different from the rest in several ways:

Expressing null checks via the generic instructions requires the addition of a null type.
ref.is_null is already finalized.
ref.as null is useless.
In the absence of union types, ref.as_not is not useful for other types than null.
br_on_null and br_on_non_null eat the null value.

So my suggestion would be to keep all the null-related instructions as they are. For all the others, we'll have (modulo naming):

ref.is ht: anyref -> i32
ref.as ht: anyref -> ref ht
ref.as_or_null ht: anyref -> ref null ht
br_on ht: t1 -> t1 (ref ht on branch), t1 <: anyref
br_on_not ht: t1 -> ref ht (t1 on branch), t1 <: anyref

Having a separate nullable cast instruction gets rid of the polymorphism of ref.cast, i.e. the output type of the instructions is solely determined by the heap type immediate. It also makes non-nullable casts of nullable inputs more compact.

This decouples the proposal from both the issue of union types (or i31 as a type dimension) and that of null/empty types.

Mar 25 '22 14:03 askeksa-google

This sounds reasonable to me. @rossberg, @jakobkummerow, wdyt? I believe this is also orthogonal to what we do with RTTs, which would replace the ht immediate in the above instructions if they were to be used.

Mar 29 '22 18:03 tlively

No objections.

Checks/casts for abstract types like ref.is data would grow by one byte compared to ref.is_data, but presumably those are fairly rare compared to checks for concrete, module-defined types, so that probably doesn't matter much. (In fact, maybe this makes data so unused that we could just get rid of it as a follow-up, but that's a separate discussion.)

I wonder whether ref.is and the br_on_* instructions would also need a *_or_null variant, but I'm not in a position to have an opinion on that: I'm generally not swayed by "symmetry" or other aesthetics-related arguments, so I think this would come down to the needs of module producers/processors. I suppose for br_on_* there is at least no urgency because br_on_{non_,}null can simply be used, and we can always add a combined instruction as a minor optimization post-MVP if we have data to indicate that it would make a significant difference.

As a logistical question: do we want to introduce new opcodes in the binary format for these, to allow a gradual transition? (I expect that we'll do an opcode cleanup just before finalizing the GC proposal, so this would be temporary at most until then.)

Mar 29 '22 19:03 jakobkummerow

Using new opcodes and doing an opcode cleanup at the end sounds good. The relevant lessons I learned from doing the same (multiple times!) in the SIMD proposal are 1) to really push to opcode cleanup to the very end, not just when you're 80% sure you're done adding new things and 2) to communicate clearly and early with partners that the opcode cleanup will be a breaking change.

Mar 29 '22 19:03 tlively

To be clear, if we are adding these as new opcodes, we are removing/disabling the old opcodes?

Mar 29 '22 23:03 titzer

Yes, in this repo we certainly should remove the old versions. V8 and Binaryen will probably continue supporting the old instructions for some time to avoid unnecessarily breaking builds as soon as the new instructions land.

Mar 30 '22 01:03 tlively

sgtm, as long as this repo continues to represent the consensus on the complete set of bytecodes and their semantics, then that last opcode cleanup will be fairly painless.

Mar 30 '22 02:03 titzer

@askeksa-google, thanks, that's roughly what I had in mind, except that I'd restrict the heaptype to be abstract and keep the instructions involving concrete types and RTTs separate like the null-related ones, since they are very different operationally (and type-wise).

The idea of having a separate ref.as_or_null is interesting and worth considering. But I think we'd then need similar variants for the branching casts, and probably ref.is.

Apr 04 '22 12:04 rossberg

@rossberg, can you be more specific about what you have in mind so we can better compare it to @askeksa-google's proposal?

Apr 04 '22 16:04 tlively

@askeksa-google, thanks, that's roughly what I had in mind, except that I'd restrict the heaptype to be abstract and keep the instructions involving concrete types and RTTs separate like the null-related ones, since they are very different operationally (and type-wise).

Personally, I find the unification of abstract and concrete tests to be one of the more appealing aspects of the refactoring.

Also, abolishing the data/func requirement for concrete type casts and having explicitly nullable casts both have practical impacts on code size.

What are the operational differences you are are referring to which warrant the concrete types having separate instructions?

The idea of having a separate ref.as_or_null is interesting and worth considering. But I think we'd then need similar variants for the branching casts, and probably ref.is.

Having it just for casts is a pragmatic choice indeed. It's mainly motivated by frequency of use: Nullable casts are definitely useful. I have not encountered a need for nullable variants of the other instructions.

Apr 04 '22 17:04 askeksa-google

If the abstract versions of casts don't pop an RTT value off the stack, and the concrete ones do, I would prefer they be separate instructions.

Apr 04 '22 18:04 titzer

My proposal above is just for the statically typed instructions. We can add separate versions for RTTs independently.

Apr 05 '22 09:04 askeksa-google

@tlively, as hinted at in the OP, I had in mind s.th more moderate, like (picking up @askeksa-google's idea for two versions of ref.as):

ref.is <baseheaptype> : [anyref] -> [i32]
ref.as <baseheaptype> : [(ref null1? any)] -> [(ref null2? <baseheaptype>)]
- iff null1? = null2?
br_on <baseheaptype> $l : [t0* t] -> [t0* t]
- iff $l : [t0* t]
- and t <: anyref and (ref <baseheaptype>) <: t'
br_on_not <baseheaptype> $l : [t* t] -> [t* (ref <baseheaptype>)]
- iff $l : [t* t']
- and t <: anyref and t <: t'

where

baseheaptype ::= func_or_data | func | data | struct | array | i32

Null-related instructions and concrete casts are left alone.

So as mentioned, my intent here is mostly a syntactic simplification, with easier extensibility. The only minor semantic change is that I introduced func_or_data as a new abstract heap type that's the common super type of func and data (probably in need of a better name). That potentially allows for slightly cheaper abstract casts when they're just meant to feed into a consecutive concrete cast (though this would become moot if we decided to divorce func from anyref).

@askeksa-google:

Personally, I find the unification of abstract and concrete tests to be one of the more appealing aspects of the refactoring. Also, abolishing the data/func requirement for concrete type casts and having explicitly nullable casts both have practical impacts on code size. What are the operational differences you are are referring to which warrant the concrete types having separate instructions?

Besides having different operands (which is unrelated to being "statically typed", btw), one only looks at the "sort" of an object, the other at its concrete runtime type. In V8 terms, it could roughly correspond to inspecting the pointer tag and "instance type" tag vs the map contents, though the details of the distinction may differ per implementation.

In order to check an anyref for a concrete type, the engine first has to check that it actually is a type that has an RTT. These are multiple steps, in pretty much any conceivable implementation, and the goal of the instruction design is to represent them as such in order to preserve the spirit of a low-level machine language as close as possible. It is seductive to argue with code size here, but this would be quite premature, and a significant departure from Wasm's design so far, where we have decidedly avoided the introduction of ad-hoc "macro" instructions, unless there is a clear case that engines could generate significantly better code for them (such as the bulk instructions).

The idea of having a separate ref.as_or_null is interesting and worth considering. But I think we'd then need similar variants for the branching casts, and probably ref.is.

Having it just for casts is a pragmatic choice indeed. It's mainly motivated by frequency of use: Nullable casts are definitely useful. I have not encountered a need for nullable variants of the other instructions.

Fair enough, though I'd be interested to hear what practical advantage you see in having two variants as opposed to the current instruction whose typing subsumes both. If used with the respective types, that should be compilable to the same code, I believe. I have sympathy for the extra clarity the separation provides, but is there any other advantage?

Apr 11 '22 11:04 rossberg

@rossberg:

The only minor semantic change is that I introduced func_or_data as a new abstract heap type that's the common super type of func and data (probably in need of a better name). That potentially allows for slightly cheaper abstract casts when they're just meant to feed into a consecutive concrete cast (though this would become moot if we decided to divorce func from anyref).

How does func_or_data fit into our current hierarchy? It seems incompatible with eq, since neither is a subtype of the other, and both are supertypes of data.

This reminds me of another gripe I have with the current cast instructions: whereas the input requirements of most other instructions can be expressed as a single type (implicitly allowing subtypes), the cast instructions permit one of two disjoint types. While the special case for this in the validator is not a particularly complicated one, it does seem like a wart.

What are the operational differences you are are referring to which warrant the concrete types having separate instructions?

Besides having different operands (which is unrelated to being "statically typed", btw), one only looks at the "sort" of an object, the other at its concrete runtime type. In V8 terms, it could roughly correspond to inspecting the pointer tag and "instance type" tag vs the map contents, though the details of the distinction may differ per implementation.

Well, that's what the proposal changes. Both kinds of instruction check for a subtype relationship, and how that check differs between abstract and concrete types is up to the implementation. Boundaries between different kinds of checks need not be exactly at the boundary between the abstract and concrete types.

In order to check an anyref for a concrete type, the engine first has to check that it actually is a type that has an RTT. These are multiple steps, in pretty much any conceivable implementation, and the goal of the instruction design is to represent them as such in order to preserve the spirit of a low-level machine language as close as possible. It is seductive to argue with code size here, but this would be quite premature, and a significant departure from Wasm's design so far, where we have decidedly avoided the introduction of ad-hoc "macro" instructions, unless there is a clear case that engines could generate significantly better code for them (such as the bulk instructions).

I agree that operations should generally be decomposed into basic steps, but only when those steps are unambiguously aligned with the overall operation.

The current split seems to assume a particular implementation strategy, and as we have heard in https://github.com/WebAssembly/gc/issues/274#issuecomment-1060840681 this assumption is at odds with (at least) V8, since as_data is a more expensive operation than the extra operation that it would need to perform when casting directly from anyref.

Having it just for casts is a pragmatic choice indeed. It's mainly motivated by frequency of use: Nullable casts are definitely useful. I have not encountered a need for nullable variants of the other instructions.

Fair enough, though I'd be interested to hear what practical advantage you see in having two variants as opposed to the current instruction whose typing subsumes both. If used with the respective types, that should be compilable to the same code, I believe. I have sympathy for the extra clarity the separation provides, but is there any other advantage?

A practical advantage is that a cast from a nullable to a non-nullable type (which is a frequent occurrence, as reported in https://github.com/WebAssembly/gc/issues/274#issuecomment-1050643224) is only one instruction instead of currently two.

Again, having these as separate operations seems to assume a particular implementation strategy. Depending on how null references are implemented, a cast from a nullable input could be just as efficient as a cast from a non-nullable one.

Apr 11 '22 14:04 askeksa-google

@askeksa-google:

How does func_or_data fit into our current hierarchy? It seems incompatible with eq, since neither is a subtype of the other, and both are supertypes of data.

Subtyping is a partial order, and func_or_data simply is neither a sub nor a super type of eq. So there's no incompatibility.

             any
            /   \
  func_or_data   eq
         /   \  /  \
     func    data   i32

You can easily construct analogous hierarchies with struct or function types already (as long as multiple super types are allowed anyway). And certainly with union types.

This reminds me of another gripe I have with the current cast instructions: whereas the input requirements of most other instructions can be expressed as a single type (implicitly allowing subtypes), the cast instructions permit one of two disjoint types. While the special case for this in the validator is not a particularly complicated one, it does seem like a wart.

Well, yes, that's what func_or_data is supposed to replace. ;)

Besides having different operands (which is unrelated to being "statically typed", btw), one only looks at the "sort" of an object, the other at its concrete runtime type. In V8 terms, it could roughly correspond to inspecting the pointer tag and "instance type" tag vs the map contents, though the details of the distinction may differ per implementation.

Well, that's what the proposal changes. Both kinds of instruction check for a subtype relationship, and how that check differs between abstract and concrete types is up to the implementation. Boundaries between different kinds of checks need not be exactly at the boundary between the abstract and concrete types.

In fact, the proposal was more like this originally, but long ago changed to what you see now in order to simplify the individual instructions, which seemed overly complex before.

I agree that operations should generally be decomposed into basic steps, but only when those steps are unambiguously aligned with the overall operation.

Yeah, but it's not totally clear what "aligning" would mean in this context. It is a qualitative difference whether a test involves inspecting user-defined type hierarchies or not (and that is regardless of whether RTTs are explicit). In practice, there will always be some difference between engines. But while slight differences may exist, I think it is pretty fair to assume that a close enough distinction will exist in all realistic implementation.

The current split seems to assume a particular implementation strategy, and as we have heard in #274 (comment) this assumption is at odds with (at least) V8, since as_data is a more expensive operation than the extra operation that it would need to perform when casting directly from anyref.

Yes, the introduction of func_or_data should remove that discontinuity as well. As said above, if an abstract cast feeds into a concrete cast, then func_or_data is what you'd use optimally.

Having it just for casts is a pragmatic choice indeed. It's mainly motivated by frequency of use: Nullable casts are definitely useful. I have not encountered a need for nullable variants of the other instructions.

Fair enough, though I'd be interested to hear what practical advantage you see in having two variants as opposed to the current instruction whose typing subsumes both. If used with the respective types, that should be compilable to the same code, I believe. I have sympathy for the extra clarity the separation provides, but is there any other advantage?

A practical advantage is that a cast from a nullable to a non-nullable type (which is a frequent occurrence, as reported in #274 (comment)) is only one instruction instead of currently two. Again, having these as separate operations seems to assume a particular implementation strategy. Depending on how null references are implemented, a cast from a nullable input could be just as efficient as a cast from a non-nullable one.

Okay, I think I can be convinced that having two such instructions is worthwhile. Somewhat unclear whether the same argument wouldn't apply to br_on, though, once that's used in anger. The use cases might be sufficiently different and presumably involve non-null operands or an earlier null check in typical scenarios.

Apr 11 '22 16:04 rossberg

I am strongly concerned about the prospect of making the type hierarchy a DAG rather than a tree and am also concerned about introducing a new basic heap type solely to abstract over func and data and make them usable with the same instructions. Adding a new type (actually two, since your proposal also adds struct) is not an acceptable cost for the benefit of deduplicating a few instructions, IMO.

Apr 11 '22 16:04 tlively

gc gc copied to clipboard

Refactor cast opcodes?

The setup

Testing instructions

Sharpening

eq

Definite null

Empty

Interpreter performance

RTT

gc
gc copied to clipboard

`eq`

Definite `null`