
i31ref conflicts with JavaScript-numbers encoded as tagged 31bit values

Open tebbi opened this issue 5 years ago • 6 comments

Suppose a JavaScript number in 31bit int range flows into Wasm as an anyref type. Then in Wasm, we can perform a dynamic type-check to see if it has type i31ref. As far as I see, the spec wants this check to fail. But the problem is that no matter how this is specified, it causes some issues in the implementation.

Option 1: If the check is specified to fail, this means that i31ref and JavaScript numbers need a different encoding, but there is no space in a 32bit word to encode two different kinds of 31bit integers in addition to pointers. For V8 (32bit or pointer compressed builds), this would mean that either Smis have to be boxed when they flow to Wasm or i31ref would have to be boxed, which makes i31ref pointless.
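To make the encoding problem in Option 1 concrete, here is a minimal sketch of a one-bit tagging scheme for a 32-bit word. The layout (low bit 0 for small integers, low bit 1 for pointers) and all function names are assumptions for illustration, not V8's actual implementation. It shows that once one tag bit separates integers from pointers, a JS small integer and an i31ref with the same payload produce the same word.

```typescript
// Hypothetical 32-bit tagging scheme, loosely modeled on the Smi idea:
// low bit 0 => 31-bit integer payload, low bit 1 => pointer.
const TAG_MASK = 1;
const POINTER_TAG = 1;

// Encode a 31-bit signed integer into a tagged word (low bit stays 0).
function encodeSmallInt(n: number): number {
  return (n << 1) | 0;
}

// Encode an even (at least 2-byte aligned) address as a tagged pointer.
function encodePointer(address: number): number {
  return (address | POINTER_TAG) | 0;
}

function isSmallInt(word: number): boolean {
  return (word & TAG_MASK) === 0;
}

function decodeSmallInt(word: number): number {
  return word >> 1; // arithmetic shift restores the sign
}

// A JS number 5 and an i31ref 5 would both have to go through
// encodeSmallInt, so they end up as the same word.
const jsNumberWord = encodeSmallInt(5);
const i31Word = encodeSmallInt(5);
console.log(jsNumberWord === i31Word); // true
```

Telling the two apart would need a second tag bit, which shrinks the payload to 30 bits, or a box for one of them.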

Option 2: If the check is specified to succeed, then even engines that don't have Smis would have to convert in-range JS numbers to i31ref. Even in V8, this would require normalizing HeapNumber to Smi at the boundary when possible, because otherwise we would leak the otherwise unobservable difference between a Smi and a HeapNumber storing a Smi-range value. This conversion only has to happen when we cast or check for i31ref.

Option 3: The only alternative would be to leave this unspecified, leading to implementation-defined behavior, which I guess we don't want.

I think a better alternative to i31ref would be to just expose JavaScript numbers as an anyref subtype in Wasm. Since all major engines implement JavaScript numbers in a way that leaves them unboxed for at least the 31bit range, using this type in Wasm would still result in the desired unboxed representation. i31ref as a subtype of JavaScript numbers could also work; this corresponds to Option 2.

tebbi avatar Feb 07 '20 09:02 tebbi

Interesting question. AFAICS, option 2 is the only viable one, for the reasons you mention. It makes instructions like ref.is_i31 and ref.as_i31 slightly more expensive in a JS embedding, but I think it's bearable. Interested to hear what others think.

I'm not sure I understand your last suggestion. Wasm is independent from JS, so it makes no sense to introduce a JS type into it. Nor does it avoid the overhead of option 2, because you'd still want an allocation-free i31ref type and need to define the interaction.

rossberg avatar Feb 07 '20 15:02 rossberg

FWIW, I've also been assuming option 2, specifically that there would be a dynamic check in ToWebAssemblyValue(v, anyref) such that in-range integral JS numbers become i31refs and the rest become host-valued anyrefs. I think the extra overhead for this check when the source JS value is a double should be negligible in the context of the overall JS-to-wasm trampoline.
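A rough sketch of what such a boundary check could look like, written as ordinary TypeScript rather than spec text. The `WasmAnyRef` type, the function name `toWebAssemblyAnyref`, and the constant names are hypothetical; the i31 range of -2^30 to 2^30 - 1 follows from a signed 31-bit payload. Note that -0 passes the integer test below and is classified as i31; whether that is desirable is not something this thread settles.

```typescript
// Hypothetical model of the two outcomes at the JS-to-Wasm boundary.
type WasmAnyRef =
  | { kind: "i31"; value: number }
  | { kind: "host"; value: unknown };

// Signed 31-bit range: -2^30 .. 2^30 - 1.
const I31_MIN = -(1 << 30);
const I31_MAX = (1 << 30) - 1;

// Sketch of the dynamic check: in-range integral numbers become i31,
// everything else stays an opaque host value.
function toWebAssemblyAnyref(v: unknown): WasmAnyRef {
  if (
    typeof v === "number" &&
    (v | 0) === v && // rejects NaN, infinities, fractions, and values outside int32
    v >= I31_MIN &&
    v <= I31_MAX
  ) {
    return { kind: "i31", value: v };
  }
  return { kind: "host", value: v };
}

console.log(toWebAssemblyAnyref(42).kind);      // "i31"
console.log(toWebAssemblyAnyref(1.5).kind);     // "host"
console.log(toWebAssemblyAnyref(1 << 30).kind); // "host"
```

Because the check looks only at the numeric value, it gives the same answer whether the engine stored the number as a Smi or as a boxed double, which is the normalization Option 2 requires.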

lukewagner avatar Feb 07 '20 15:02 lukewagner

> Wasm is independent from JS, so it makes no sense to introduce a JS type into it.

JavaScript numbers are just float64 values encoded in anyref. This could be useful for Wasm even without JavaScript, since they are partially or completely unboxed, especially on engines with NaN-tagging. It's pretty much the only way to benefit from NaN-tagging on engines that happen to have it, and even if not, having Smi semantics when it works and falling back to boxed numbers can still be better than always boxing. Apart from that, being able to inspect well-behaved JavaScript values, especially the primitives, from Wasm could improve the Wasm-JS interop story a lot, since JS code could then return primitive values other than numbers to Wasm and have them interpreted on the Wasm side without any copying. I'd be curious where you draw the red line here: how important is JS interop performance vs. JS independence?

tebbi avatar Feb 10 '20 13:02 tebbi

Yes, but nobody wants float semantics for their integers. ;) So that wouldn't replace a proper integer type or eliminate the issue you pointed out.

Floatrefs may be useful in their own right, but it's gonna be a difficult case to make that they are essential for the MVP.

rossberg avatar Feb 10 '20 13:02 rossberg

I suppose a fourth option, somewhat in the spirit of https://github.com/WebAssembly/meetings/blob/master/2020/presentations/2020-02-rossberg-ref-type.pdf and IIRC as was brought up in the discussion following that presentation, would be to split the current anyref's responsibilities into separate types (modulo bikeshedding the actual names, of course; using hopefully-descriptive labels here):

  • a "foreignref" that's opaque to Wasm code (in particular, can't be inspected or cast to anything else, only passed around).
  • a "gcref" that's the supertype of i31ref and the various struct/array types of the GC proposal.

That way, an engine could pass whatever it wants in whichever internal representation it happens to be using to "foreignref" values, and those implementation details would not be exposed to Wasm. The engine could choose its representation for i31ref values independently, and that may or may not be the same as what it uses for unboxed foreignref values.
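The split can be modeled roughly as two unrelated types. This is only an illustration in TypeScript: `ForeignRef`, `GcRef`, `unwrapForHost`, and `isI31` are hypothetical names, and the tagged-object representation stands in for whatever an engine would actually use.

```typescript
// Opaque to "Wasm" code: the payload is private and there is no cast to GcRef.
class ForeignRef {
  private readonly payload: unknown;
  constructor(payload: unknown) {
    this.payload = payload;
  }
  // Only the host side is meant to call this.
  static unwrapForHost(ref: ForeignRef): unknown {
    return ref.payload;
  }
}

// Supertype of i31ref and the struct/array types; inspectable from "Wasm".
type GcRef =
  | { kind: "i31"; value: number }
  | { kind: "struct"; fields: unknown[] }
  | { kind: "array"; elements: unknown[] };

// The type check is only defined on GcRef, never on ForeignRef.
function isI31(ref: GcRef): boolean {
  return ref.kind === "i31";
}

// A JS number handed over as a foreign reference stays opaque,
// however the engine happens to represent it internally.
const fromJs = new ForeignRef(7);
console.log(ForeignRef.unwrapForHost(fromJs)); // 7
console.log(isI31({ kind: "i31", value: 7 })); // true
```

Since `isI31(fromJs)` does not type-check in this model, the question of whether a JS number "is" an i31ref never arises.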

jakobkummerow avatar Feb 18 '20 11:02 jakobkummerow

This does not seem to be a JS-specific problem; rather, it seems relevant to any interlanguage exchange of integers, since not all language implementations even represent integers the same way. Sounds like yet another challenge for Interface Types, one that in the meantime seems best treated by @jakobkummerow's suggestion of keeping foreign references entirely opaque (for now).

RossTate avatar Feb 19 '20 20:02 RossTate

We solved this by splitting externref into a separate type hierarchy and putting the range check on extern.internalize.

tlively avatar Nov 01 '22 18:11 tlively