gc icon indicating copy to clipboard operation
gc copied to clipboard

Pushing i31ref to post-MVP

Open eqrion opened this issue 3 years ago • 20 comments

Are any production languages going to require i31ref in the MVP? If not, I think we should consider pushing it to a separate proposal after the MVP.

SpiderMonkey doesn't have any concerns regarding the feature's implementability. However, there were extensive discussions surrounding the suitability of i31ref for language ports, and I'd like to see real experience with the feature before committing to a tagging scheme in the MVP.

eqrion avatar Aug 11 '22 17:08 eqrion

i31ref will be heavily used by OCaml.

titzer avatar Aug 11 '22 19:08 titzer

i31ref will be heavily used by OCaml.

Sure, no doubt there are some languages that will require some sort of feature here. I believe that's clear.

Does OCaml have a relatively-complete port to use Wasm-GC at this time? I'm asking this to be sure that i31ref has been fully validated in a real implementation, and that there's intent for it to be seriously used on around the timeline of a GC-MVP.

eqrion avatar Aug 11 '22 19:08 eqrion

As I see it, there are several mismatches between the common use cases for i31 and the way it is currently specified.

Some of the most common use cases that I have seen described throughout the discussions are:

  1. As the representation of integers in a uniform representation (the OCaml use case, AIUI)
  2. As some of the variants of an ADT, with other variants represented by structs.
  3. As the representation of small integers in a language with objectified integers, with larger values boxed into structs (the Dart use case).

Common to all of these use cases is that the i31 occurs in a union with other values. This is essentially what i31 is good for. But the current type system does not reflect this capability, leading to extra casts.

As described in that post, even with a union type, boxing and unboxing operations in the Dart use case are rather clunky. As these operations are very common, it's important that they are small and well optimizable by the engine. A way to achieve that would be to have fallible conversions from i32 and i64. These would be a set of 8 branch instructions:

br_on_i32_to_i31_s
br_on_i32_to_i31_u
br_on_i64_to_i31_s
br_on_i64_to_i31_u
br_on_not_i32_to_i31_s
br_on_not_i32_to_i31_u
br_on_not_i64_to_i31_s
br_on_not_i64_to_i31_u

If the value of the integer fits inside the range of i31, the instruction branches (or not) and leaves an i31 on the stack. Otherwise the integer is left on the stack.

Dart int boxing would then simply be (with the uncommon case delegated to a function):

(block $b (param i64) (result (ref i31 $BoxedInt))
  (br_on_i64_to_i31_s $b)
  (call $box_to_struct)
end)

A similar situation (though not as dramatic) occurs for unboxing. The combination of br_on_i31 and i31.get_s is largely redundant, since the only sensible thing to do after checking that a value is an i31 is to convert it. Ideally, we would have instructions mirroring the fallible conversions:

br_on_i31_to_i32_s
br_on_i31_to_i32_u
br_on_i31_to_i64_s
br_on_i31_to_i64_u
br_on_not_i31_to_i32_s
br_on_not_i31_to_i32_u
br_on_not_i31_to_i64_s
br_on_not_i31_to_i64_u

which would branch (or not) on i31 (doing the conversion) and otherwise leave the input value on the stack. These would shave off 5 instructions (block, br, i31.get_s, i64.extend_i32_s, end) for every unboxing.

This may look like another case of "please add the macro instructions that I need for my language", but I really do think it's more general than that, and that it caters to all common uses of i31. Also, the way i31 is usually implemented, it's natural to combine testing and encoding/decoding, and there are opportunities for sharing code between them. I think the instruction set should reflect that property.

What I'm really getting at here is that there is a design space for making i31 a lot more useful than it is now. And that the currently specified version is potentially incompatible with such more useful versions.

For this reason, I'm opposed to including i31 in its current form in the MVP.

askeksa-google avatar Sep 19 '22 16:09 askeksa-google

As an aside, we could cut down on the signed/unsigned duplication by specifying that i31 values are unsigned and offering i32.extend31_s and i64.extend31_s, as we did for 8 and 16-bit integers after MVP.

titzer avatar Sep 19 '22 17:09 titzer

As an aside, we could cut down on the signed/unsigned duplication by specifying that i31 values are unsigned and offering i32.extend31_s and i64.extend31_s, as we did for 8 and 16-bit integers after MVP.

That would reintroduce the need for the outer block, which the proposed instructions avoid, thus negating the 5-instruction saving (well, 4 of them). The test, conversion and extension need to all happen in one step (on the branch edge) to get that benefit.

askeksa-google avatar Sep 19 '22 19:09 askeksa-google

I was reasoning that br_on_not_i31_to_i32_s/u would leave an i32 on the stack, which, if simplified to only be unsigned, would need only a single extension. Likewise with the other branches; you'd just need an extension at the target label.

titzer avatar Sep 20 '22 04:09 titzer

@askeksa-google:

Some of the most common use cases that I have seen described throughout the discussions are:

  1. As the representation of integers in a uniform representation (the OCaml use case, AIUI)
  2. As some of the variants of an ADT, with other variants represented by structs.
  3. As the representation of small integers in a language with objectified integers, with larger values boxed into structs (the Dart use case).

As I keep pointing out, the most common use case actually is:

  1. As the representation of small scalars in contexts requiring a reference.

Such scalars include bools, int8, int16, chars, enums, etc. Furthermore, if the language is typed, then none of these use cases involves a union, except in polymorphic contexts, which form one huge union.

Languages that want it for that purpose include:

  1. languages with a uniform representation by default (most polymorphic languages),
  2. languages with non-type-specialised generics, i.e., requiring a uniform representation for values passed to generics (e.g., Java-style generics when not limited to objects, C# until Wasm has an internal JIT interface)

As I showed in my presentations, Waml and Wob employ i31 extensively for these purposes.

Common to all of these use cases is that the i31 occurs in a union with other values. This is essentially what i31 is good for. But the current type system does not reflect this capability, leading to extra casts.

No, see above. In a statically typed language and for use case 0, you should never need to extract a scalar directly from an anyref/eqref. For example, Waml and Wob never do. Wherever they consume a scalar, it's already typed i31ref.

This may look like another case of "please add the macro instructions that I need for my language"

Indeed. :)

What I'm really getting at here is that there is a design space for making i31 a lot more useful than it is now. And that the currently specified version is potentially incompatible with such more useful versions.

I don't understand how you arrive at that conclusion. Any of the instructions you mention could be a Post-MVP extension. So the current design seems perfectly in spirit of an MVP.

rossberg avatar Sep 20 '22 07:09 rossberg

I was reasoning that br_on_not_i31_to_i32_s/u would leave an i32 on the stack, which, if simplified to only be unsigned, would need only a single extension. Likewise with the other branches; you'd just need an extension at the target label.

The issue is that the target label is a join between the value converted from i31 and another value, which should not be extended (e.g. the value loaded from the box struct). If the i31 path has extra instructions after the target label, then the non-i31 path needs to jump around those extra instructions, necessitating the extra block.

For illustration, this is what unboxing of a Dart int would look like with the full-fledged instructions:

(block $b (param (ref i31 $BoxedInt)) (result i64)
  (br_on_i31_to_i64_s $b)
  (struct.get $BoxedInt 1)
end)

askeksa-google avatar Sep 20 '22 08:09 askeksa-google

If i31 is primarily designed for use in languages that are not helping validate the MVP, then making i31 a post-MVP extension makes a lot of sense. Alternatively, we should design i31 so that it is useful to languages with active plans to target WasmGC and make any additional functionality useful for other languages a post-MVP extension.

tlively avatar Sep 20 '22 14:09 tlively

  1. As the representation of small scalars in contexts requiring a reference.

This looks to me like a general description of the purpose of i31ref which covers all of the use cases I mentioned. What characterizes a use case is why the context requires a reference.

  1. languages with a uniform representation by default (most polymorphic languages),

Right, so more precisely, you are talking about polymorphic languages without the ability to inspect the runtime type of values, i.e. either a value is fully opaque or its precise type is known statically. Is this what you mean?

  1. languages with non-type-specialised generics, i.e., requiring a uniform representation for values passed to generics (e.g., Java-style generics when not limited to objects, C# until Wasm has an internal JIT interface)

Yes. Dart is in the same camp.

For both of these categories, indeed a union type is only beneficial if all of the non-i31 types in the union have a common supertype that has capabilities on its own (such that code paths that only need that capability can replace a cast by an i31 check). This is the case for Dart, for instance, and I expect it would also be for Java and C#, where anything that is not an i31 would be of a common Object struct that has several capabilities on its own (inspect the type, read the identity hash, call methods).

No, see above. In a statically typed language and for use case 0, you should never need to extract a scalar directly from an anyref/eqref. For example, Waml and Wob never do. Wherever they consume a scalar, it's already typed i31ref.

I am curious about the practical reasons why most such use cases can't just use i32. I can see some potential reasons they couldn't (function subtyping, covariantly typed fields), but these seem to only apply in special cases.

For comparison, dart2wasm has a uniform representation which includes boxed bool, int and double, but whenever these types occur in a context that is not forced to use the uniform representation, they use the unboxed types i32, i64 and f64, respectively.

What does such situations look like in Waml and Wob?

What I'm really getting at here is that there is a design space for making i31 a lot more useful than it is now. And that the currently specified version is potentially incompatible with such more useful versions.

I don't understand how you arrive at that conclusion. Any of the instructions you mention could be a Post-MVP extension. So the current design seems perfectly in spirit of an MVP.

What I am primarily worried about is that if we specify i31 as currently and later want to add unions, the type rules for instructions might not generalize to the rules that we'd want at that point. Though if it's only a matter of instructions, I guess in the worst case we'll need to add new instructions with the desired behavior.

askeksa-google avatar Sep 30 '22 15:09 askeksa-google

I've put this on the agenda for our subgroup meeting tomorrow.

tlively avatar Oct 03 '22 17:10 tlively

I will be traveling tomorrow and so I won't be able to attend. I'll try to state my opinion here then.

In the past, we have standardized and shipped some proposals which had subsets that ended up not being used in practice. Specifically, with reference-types I've seen data that indicates externref is not seeing any real use in the browser. That can definitely change in the feature, but it has been years since we shipped that.

My concern here is to avoid a similar situation with i31ref, due to the complexity cost of introducing tagging in our engine and risk of incompatible changes being requested when a language does attempt to use some sort of tagging.

I don't have a strong opinion for or against the current design of i31ref. It seems like a reasonable feature, and hearing of a serious language port using it effectively is really the only thing I'm looking for.

This is sort of a timing issue. If we really want this proposal to move forward in standardization quicker than we can get this data, then I believe it should be in a separate proposal (which I hope could move quicker than this one has). If we can get this data quick enough, then I support leaving it in this proposal.

eqrion avatar Oct 03 '22 21:10 eqrion

The V8 team holds the same opinion, so it will be well represented at the meeting tomorrow 👍

tlively avatar Oct 03 '22 21:10 tlively

  1. As the representation of small scalars in contexts requiring a reference.

This looks to me like a general description of the purpose of i31ref which covers all of the use cases I mentioned. What characterizes a use case is why the context requires a reference.

Yes, but the point is that scalars are a broader use case than just source language integers or enums, which were the only cases you listed. As for the why, that's what the two (sub)points below meant to expand on.

  1. languages with a uniform representation by default (most polymorphic languages),

Right, so more precisely, you are talking about polymorphic languages without the ability to inspect the runtime type of values, i.e. either a value is fully opaque or its precise type is known statically. Is this what you mean?

Yes, which is common for typed languages. Even those with casts typically limit casts to object-like values, while others are opaque (not even all OO languages make everything an object).

  1. languages with non-type-specialised generics, i.e., requiring a uniform representation for values passed to generics (e.g., Java-style generics when not limited to objects, C# until Wasm has an internal JIT interface)

Yes. Dart is in the same camp.

For both of these categories, indeed a union type is only beneficial if all of the non-i31 types in the union have a common supertype that has capabilities on its own (such that code paths that only need that capability can replace a cast by an i31 check).

That may be so in Dart or similar everything-is-a-sort-of-object languages, but is not the case outside that camp. E.g., Waml is not like that, nor won't be OCaml, Haskell, or similar languages with a strong need for uniform representation.

No, see above. In a statically typed language and for use case 0, you should never need to extract a scalar directly from an anyref/eqref. For example, Waml and Wob never do. Wherever they consume a scalar, it's already typed i31ref.

I am curious about the practical reasons why most such use cases can't just use i32. I can see some potential reasons they couldn't (function subtyping, covariantly typed fields), but these seem to only apply in special cases.

Because polymorphism is very expressive in these languages. You can always type-abstract over everything visible after the fact, including over deeply nested type components. For example, I can pass a function of type int array -> int to another function with type ('a array -> 'a) -> t, so the compiler cannot represent int array as array i32. Similarly with everything else. Unless the compiler can see all use sites, it can basically never assume that a first-class value does never flow into a context where it is viewed polymorphically. Static type specialisation is not an option either, because polymorphism is first-class, i.e., there may be polymorphic call sites that don't statically know the definition of the called function (nor the type it's called at).

For comparison, dart2wasm has a uniform representation which includes boxed bool, int and double, but whenever these types occur in a context that is not forced to use the uniform representation, they use the unboxed types i32, i64 and f64, respectively.

In languages like the above, you can unbox the contents of second-class variables, but that's basically it. And it is less beneficial than you might think, because you often need to pass the value around, in which case repeated unboxing/boxing is more expensive than keeping it boxed in the first place – a value is typically passed and stored more often than actually inspected.

What does such situations look like in Waml and Wob?

For Waml, I made unboxing configurable separately for locals, globals, temps, closure environments, and pattern scrutinees. But everything is always boxed as part of other data structures, even when the type is fully known. That is necessary, see above.

Wob has explicit box types, somewhat similar to Java's Integer class and friends, except that they aren't objects. So the programmer is in control of boxing and unboxing. But only boxed types can instantiate generics.

What I am primarily worried about is that if we specify i31 as currently and later want to add unions, the type rules for instructions might not generalize to the rules that we'd want at that point. Though if it's only a matter of instructions, I guess in the worst case we'll need to add new instructions with the desired behavior.

I am positive that unions are perfectly compatible with the current design. In particular, i31 poses no more of a problem than any other type. That said, there are probably good reasons to have unions work with their own set of more explicit instructions anyway, esp to avoid the need for naive subtyping and the quadratic cost pitfall. (I have some thoughts on making unions ordered, but that's off-topic here.)

rossberg avatar Oct 04 '22 08:10 rossberg

While I agree with what @rossberg wrote above w.r.t. to implementing polymorphism (though Virgil does whole-program static specialization and does not have polymorphic recursion or first-class polymorphism), the stronger argument is to represent tagged unions (aka sum types, algebraic data types, or variants). Another use case is to implement dynamic languages (like JavaScript!) on top of Wasm, using i31 as the small integer type.

I think all three of these use cases are really important. Having a form of tagged reference in Wasm is a major value-add over other statically-typed bytecode formats such as DEX, the JVM, and CIL.

titzer avatar Oct 04 '22 13:10 titzer

Since @rossberg has repeatedly claimed that polymorphic languages need i31ref and explicitly mentioned Haskell, I thought the group might appreciate knowing for today's discussion that the prevailing compiler for Haskell—GHC—does not use anything like i31ref. This documentation indicates that GHC uses 32/64-bit integers for its "machine-sized" integer type Int (and these are indeed boxed). On the other hand, this documentation indicates that GHC does use pointer tagging. In particular, for values of algebraic data types the low bits indicate whether the value has been evaluated (since Haskell is lazy) and, if so, which case the value belongs to. This lets many case statements avoid having a memory load block a branch. Other Haskell compilers use 30 bits for Int, again because they need to reserve a bit for pointer tagging to distinguish evaluated and unevaluated lazy values.

So, contrary to the claim, i31ref is not widely accepted among polymorphic languages and in fact is incompatible with the optimizations some polymorphic languages do actually heavily rely upon.

the stronger argument is to represent tagged unions (aka sum types, algebraic data types, or variants)

Many implementations of languages with algebraic data types use pointer tagging for this instead.

Another use case is to implement dynamic languages (like JavaScript!) on top of Wasm, using i31 as the small integer type.

Many implementations of dynamic languages use pointer tagging instead. And many implementations of dynamic languages with more emphasis on floating-point computation use NaN-boxing instead.

RossTate avatar Oct 04 '22 14:10 RossTate

I implemented basic i31 support in dart2wasm to get a data point for the discussion. It represents small int values as i31ref and does boxing and unboxing by calling functions that are implemented basically like I presented in the type dimension discussion. It also needs to check for i31 in many places, such as on method call receivers and type tests/casts. It bumps the type for the uniform representation to eqref.

For most benchmarks I did not see a measurable difference. For barista3 (big, method call-heavy benchmark) I saw a roughly 10% slowdown. For Richards, which is reasonably List<int> heavy, I saw a 10% speedup.

For barista3, the module size grew by 2% (combination of larger code and smaller constants). Other benchmarks varied from slightly smaller to about 1% bigger.

So it appears i31 has some potential for this use case, even in its current form, though its overall benefit is not clear-cut.

Update: The quoted numbers are with the module optimized by Binaryen, which manages to eliminate the i31 check from 90% of all method calls.

askeksa-google avatar Oct 04 '22 15:10 askeksa-google

@RossTate, it's correct that GHC does something special, mostly because of laziness, as described in their ICFP'07 paper. The authors also point out that they are the odd one out:

Many programming language implementations (SML/NJ, Ocaml, Squeak) use tag bits to differentiate pointers from non-pointers.

rossberg avatar Oct 04 '22 15:10 rossberg

Yes, those three languages use i31, and I never indicated otherwise. I just indicated that a proper discussion of this topic should include considering that many languages do not use i31 and that many languages use features incompatible with i31.

RossTate avatar Oct 04 '22 15:10 RossTate

It turns out that @chambart and others are currently working on bringing OCaml to WasmGC and i31 as currently specified is working well for them. Since that resolves the core concern here, I think we can close the issue and keep i31 as it is. @eqrion, do you agree?

We're also pretty sure that we will be able to extend i31 to work better for small integer optimizations either via new instructions for boxing and unboxing integers or via a more general union mechanism post-MVP.

tlively avatar Oct 05 '22 15:10 tlively

Reflecting on this decision, it has some noteworthy implications:

(1) i31ref support isn't free. It has a small but non-zero cost for every module¹, even if that module doesn't use i31ref at all. It doesn't follow the pay-as-you-go principle, which we usually consider very desirable.

¹ Qualifications:

  • In an engine that uses i31ref-like tagging as part of its non-Wasm externref/anyref implementation, i31ref causes a cost for the extern.internalize instruction. To be fair, by introducing the 3-pronged type system we have at least managed to contain the cost to this one instruction on such engines. (Checking for i31ref tags on downcasts from anyref is unavoidable in such an engine, regardless of the existence of i31ref.)
  • In an engine that doesn't use i31ref-like tagging for anything else, i31ref additionally causes a cost on all downcasts from anyref to non-i31ref types, if i31ref is implemented in the envisioned way (see point (2) below). In theory, this cost could be avoided for completely self-contained modules (no use of externref or anyref on their public interface, and no use of i31ref inside); I don't know if any implementation would care to special-case that situation.

(2) There's an obvious way how i31ref is supposed to be implemented, but it seems difficult (and maybe even undesirable?) to write formal spec text (and tests) to actually enforce this. Just how in a hypothetical spec without i31ref, module producers would have to manually box their scalars into single-field structs, an engine could decide to implement i31ref that way under the hood. That would leave as benefit of having i31ref that it makes the lives of certain module producers easier (which is not usually very high on our priority list); but (assuming such an engine is sufficiently popular) it would disappoint any "i31ref doesn't allocate" performance expectations, and hence nullify i31ref's other purpose, which is to improve efficiency of certain patterns. (Of course, this concern doesn't carry too much weight because similar things could be said about almost every feature: features typically offer room for optimizations, rather than prescribing certain optimizations by spec.)

(3) To summarize the decision process, it was essentially: "We should postpone this feature because nobody uses it." -- "OCaml does use it." -- "OK, then we'll keep it in the spec." In other words, we have just set a precedent of adding a feature that has a cost even for those who don't use it (see point (1)) just for the benefit of a single language (plus, of course, nebulous predictions about various other languages that might conceivably eventually also use it).

To be clear, I'm not arguing that we should revisit the decision. I'm aware that standardizing a platform is always a game of ugly compromises, and I'm just reflecting on this particular compromise.

jakobkummerow avatar Oct 10 '22 12:10 jakobkummerow

It is generally expected that the use of less precise types is less efficient than more precise types. Hence, a module that doesn't need its generality should certainly avoid anyref, regardless of i31ref (allowing to internalize externref can have a similar effect). That is, the proposal allows to avoid extra cost, though of course it does not prevent producers from generating suboptimal code instead – but that's universally true. Likewise with the possibility of crappy implementations in engines – by that line of argument, almost every new feature we had since 1.0 could be questioned.

As @titzer pointed out, the ability to unbox values is one of the unique selling points of Wasm GC. I predict that we will see plenty of use, because it is the counterpart to a technique widely used in native code (even if we cannot make it quite as flexible as native code).

rossberg avatar Oct 11 '22 06:10 rossberg

It turns out that @chambart and others are currently working on bringing OCaml to WasmGC and i31 as currently specified is working well for them. Since that resolves the core concern here, I think we can close the issue and keep i31 as it is. @eqrion, do you agree?

Apologies for the long delay and slowing down this discussion, this dropped off my radar. @tlively (or @chambart) is there any additional public information about this porting effort? Reading through the meeting notes, I didn't see anything.

Specifically, I'm curious about this line from the meeting notes: PC (Chat): Leo and I are working on this (OCaml) already. The current version of i31 works well, but don’t have performance numbers. Getting numbers by comparing to boxing all integers would be too much work..

I do think performance numbers are relevant here. We've dropped several advanced optimization features from this proposal because the expected improvement was likely not required to port languages initially. If OCaml could get away with boxing all integers, then deferring it from this proposal would be similar to how we've treated other features such as nested data structures and interior references. Both of those optimizations would also reduce allocations and remove indirections.

As @titzer pointed out, the ability to unbox values is one of the unique selling points of Wasm GC. I predict that we will see plenty of use, because it is the counterpart to a technique widely used in native code (even if we cannot make it quite as flexible as native code).

I agree that this would be a unique selling point for Wasm GC. I also understand the sentiment behind wanting to avoid the chicken and egg problem between language ports and wasm runtimes. However, we'll run into this with all post gc-mvp features, and the solution is for runtimes to keep implementing these features before full standardization and making them available for language ports to experiment with. Not standardizing and shipping first so that language ports can then experiment with them after.

eqrion avatar Oct 11 '22 20:10 eqrion

However, we'll run into this with all post gc-mvp features, and the solution is for runtimes to keep implementing these features before full standardization and making them available for language ports to experiment with.

The issue is that we're far from having a well-functioning feedback loop with such prospective runtimes. The message from many language implementers has been that they're waiting for features to stabilize; partly for their own sanity in waiting out the inevitable churn. That might improve generally with time as we attract more languages with the features that we do ship, but I feel GC MVP definitely needs to be complete enough that they don't yawn and turn away even more pointedly.

titzer avatar Oct 11 '22 23:10 titzer

@chambart should chime in as well if he wants, but it sounds like the OCaml effort is at a very early stage.

I think the comparison to other optimizations like nested structures is reasonable, but the key differences are that we have a fully fleshed out semantics and engine and tools implementations for i31. We could certainly make the MVP more minimal by dropping it, but given how far along the feature is and given the concrete evidence that it will be used, I think that dropping it now would not have any benefit. It wouldn't let us ship anything else sooner and we don't need more time to discuss the design of i31.

tlively avatar Oct 12 '22 00:10 tlively

I think the comparison to other optimizations like nested structures is reasonable, but the key differences are that we have a fully fleshed out semantics and engine and tools implementations for i31.

The other key difference is that nested structures are at least an order of magnitude more complicated than i31, both in terms of surface area, concepts, semantics, but probably also (efficient) implementation.

rossberg avatar Oct 12 '22 05:10 rossberg

@chambart, I was talking to @eqrion and others about this the other day and we were wondering whether your OCaml port is using i31 for anything other than implementing OCaml integers. Are you using it to represent packed ADTs or anything like that as well?

tlively avatar Oct 28 '22 18:10 tlively

I'm the one working with @chambart on this. We are indeed using i31 for other things than implementing OCaml integers. We're compiling from the flambda representation of the compiler, where others things have already been translated to unboxed integers.

As we told you by email it's still at an early stage but we'll come back to you once we can present it at a gc meeting.

redianthus avatar Nov 24 '22 18:11 redianthus

@eqrion. do you still want to discuss this further or should we close out this issue?

tlively avatar Jan 23 '23 15:01 tlively

ping @eqrion

tlively avatar Mar 03 '23 20:03 tlively