
MVP `array.copy`

tlively opened this issue 3 years ago

@manoskouk proposed array.copy in https://github.com/WebAssembly/gc/pull/278#discussion_r810944560, but it was not included in that PR because it deserved separate discussion and a separate PR.

Would anyone object to adding array.copy to the MVP?

tlively avatar Jul 21 '22 17:07 tlively

We should either have a complete story for bulk operators on arrays, analogous to what we did for memories and tables, or defer. Let's avoid a one-off instruction.

As mentioned on the same comment, I find it a bit hard to argue that this is MVP material, so I'd tend to defer. Bulk instructions could be an immediate follow-up proposal, though.

rossberg avatar Jul 22 '22 09:07 rossberg

A full suite of bulk operations sounds good to me. I agree that conceptually they're not "MVP," but they're also straightforward (edit: and useful!) enough that we would have no problem shipping them concurrently with the MVP. The overhead of forking a separate repo for this handful of instructions does not seem worth it to me.

tlively avatar Jul 22 '22 16:07 tlively

If somebody else volunteers to do all the work (especially tests, interpreter, and spec text), then I certainly wouldn't stand in the way. But I do worry that it would delay us quite a bit longer (right now it almost seems realistic that we can finish the MVP this year). Forking a repo is probably a minor effort in comparison. ;)

rossberg avatar Jul 22 '22 17:07 rossberg

I'm ok with leaving out bulk array operations from the MVP.

titzer avatar Jul 22 '22 17:07 titzer

I take issue with "we should have a complete story, or nothing" arguments, they are fundamentally against the idea of an MVP. I'd also like to caution us against completionism: we very intentionally subscribe to an incremental design philosophy. Among other aspects, that means that we very intentionally standardize things that there is demand for, and postpone things that may potentially come in handy in the future, but nobody is asking for currently.

A "one-off instruction" might be mildly undesirable if it is somehow a concept of its own (though even that situation might not be a problem at all, sometimes it's fine/unavoidable to have groups of size 1). But if it is a first representative of a potential group of similar instructions that might follow it later (if and when demand for them emerges), then I think adding feature sets incrementally is not a problem but rather how we should be operating. There is plenty of precedent where we successfully did this, as one random example: the Wasm MVP had a very limited set of constant instructions, and now we have a first follow-on proposal that's intentionally still small, leaving further extensions to additional followups; clearly there's no reasonable argument to be made that "we should either have a complete story for constant instructions, or defer all of them". There have also been examples where we got this wrong; IIRC (might be wrong) it was the bulk-memory proposal where when we wanted to advance it to phase 4, we realized that one or two of the instructions in it that were added "for completeness" were actually not implemented by any toolchain because there was zero practical demand for them.

Long story short, I think the key question is whether array.copy provides enough of a performance advantage to be worth including right away (and I think the answer may well be "yes"; we can try to get updated concrete data). I don't think "we must either add every conceivable instruction right now, or none at all" is a reasonable rule to follow.

jakobkummerow avatar Jul 26 '22 10:07 jakobkummerow

> I take issue with "we should have a complete story, or nothing" arguments, they are fundamentally against the idea of an MVP.

The concepts of MVP and incrementality don't mean that features should be considered at arbitrarily small granularity. That would be a mess.

It’s a bit odd to argue on the basis of MVP policy here when we already established that this case is stretching the definition of MVP. ;)

Wasm was viable for years without memory.copy, and this one should be even less essential. There are many features I would like to have right away, but at this point I'd rather have us focus on finalising than extending the scope of the MVP.

rossberg avatar Jul 28 '22 12:07 rossberg

array.copy is distinct from any other bulk instruction in that it was specifically requested by compiler implementors targeting wasm-gc. To estimate its impact, I implemented array.copy as a loop of array.gets and array.sets in V8, and I am getting a 2.5-3% slowdown overall in real-world benchmarks. I think these two points are strong arguments for including it as a one-off.
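The two strategies being compared can be sketched as follows, with Python lists standing in for Wasm arrays. This is illustrative only, not V8 code; the point is that the loop lowering pays a bounds-checked get and set per element, while the bulk form checks bounds once.

```python
# Hypothetical sketch of the two lowerings compared above. Python lists
# stand in for Wasm arrays; the names are illustrative.

def copy_as_loop(dest, dest_idx, src, src_idx, size):
    # Lowering array.copy to a loop of array.get/array.set:
    # each iteration performs its own bounds-checked read and write.
    for i in range(size):
        dest[dest_idx + i] = src[src_idx + i]

def copy_bulk(dest, dest_idx, src, src_idx, size):
    # A dedicated array.copy: bounds are checked once up front and the
    # copy itself is a single bulk move.
    dest[dest_idx:dest_idx + size] = src[src_idx:src_idx + size]
```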

manoskouk avatar Nov 03 '22 16:11 manoskouk

I want to add a couple of notes that could be useful for this discussion:

  1. The 3% improvement here doesn't even reflect the full potential; just this week we found a couple of places where the user code was not utilizing the batch APIs of the collection (i.e. it was using add instead of addAll) in a critical code path. In those large examples, it helped us achieve 2x the performance of JS (where the earlier state was only slightly faster than JS).

  2. One other obvious use case we have here is fill operations (i.e. a memset equivalent). I didn't bring that up earlier since I didn't have time to collect evidence around it, but if you are looking for more scenarios, that could be a good candidate.

  3. I propose a modification to the current instruction to make it more useful. Instead of having multiple instructions that require types to be passed (e.g. array.copy int8.array int8.array), it would be more beneficial if it took no type parameters and instead just accepted arrayref. The reason is that the API we need to emulate here has to operate on the Java top type (System.arraycopy accepts java.lang.Object and operates on any array type). I believe C# also has a similar API that accepts the top Array type. The way we are working around this at the moment is by introducing a dynamic dispatch: we added a "copy" instance method to our array wrapper object, where each array subtype overrides the method and targets the appropriately typed array.copy instruction. Besides the regular cost of dynamic dispatch, in Wasm there is also additional overhead due to the casts required on the receiver types.
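The dynamic-dispatch workaround described in point 3 can be rendered roughly like this, with Python classes standing in for the compiler's array wrapper objects. All class and method names here are hypothetical illustrations, not the actual J2CL or Wasm output.

```python
# Hypothetical sketch of the workaround: because array.copy is typed,
# a compiler emulating System.arraycopy on the top type routes through
# a per-subtype virtual "copy" method, so each override can use the
# concretely typed copy instruction. Names are illustrative.

class WasmArray:
    def __init__(self, elements):
        self.elements = elements

    def copy_from(self, src, dest_index, src_index, size):
        raise NotImplementedError  # each array subtype overrides this

class IntArray(WasmArray):
    def copy_from(self, src, dest_index, src_index, size):
        # Stands in for the concretely typed array.copy; in real Wasm a
        # cast of `src` to the expected array type is also required here.
        self.elements[dest_index:dest_index + size] = \
            src.elements[src_index:src_index + size]

def array_copy_dispatch(dest, src, dest_index, src_index, size):
    # Emulates a top-type arraycopy via a virtual call on the destination.
    dest.copy_from(src, dest_index, src_index, size)
```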

gkdn avatar Nov 04 '22 03:11 gkdn

@rossberg, do you see any problems with @gkdn's suggestion that array.copy take a pair of arrayref parameters? Internally it would have to check that the source element type is a subtype of the destination element type, but I think being able to save a dynamic dispatch is worth that extra complexity.

tlively avatar Jan 24 '23 19:01 tlively

Yes. This assumes that all arrays will forever pay the cost for carrying complete runtime type information, which is an assumption that I'd absolutely like to avoid.

As a ground principle, I think we should not hide casts in other operations, because of this assumption and since it's already a complex operation with potentially unbounded cost in the future. Please let's not turn Wasm into a dynamic language.

rossberg avatar Jan 25 '23 09:01 rossberg

Below are the (informal) semantics I would propose for the four new instructions. I'll upload a PR adding these to the MVP doc and adding tests for them soon, but I wanted to post them before that in case folks have any comments.

array.copy ht1 ht2 : [ref null ht1, i32, ref null ht2, i32, i32] -> []

    Where ht1 = array mut t1 and ht2 = array mut? t2 and t2 <: t1
      or t1 = t2 = i8
      or t1 = t2 = i16

 - Pop size from stack
 - Pop srcIndex from stack
 - Pop srcRef from stack
 - Pop destIndex from stack
 - Pop destRef from stack
 - If srcRef is null:
   - Trap
 - If destRef is null:
   - Trap
 - If srcIndex + size > srcRef array size:
   - Trap
 - If destIndex + size > destRef array size:
   - Trap
 - Fill destRef[destIndex:destIndex+size] with values from srcRef[srcIndex:srcIndex+size]

array.fill ht : [ref null ht, i32, t, i32] -> []

    Where ht = array t
      or ht = array i8 and t = i32
      or ht = array i16 and t = i32

 - Pop size from stack
 - Pop value from stack
 - Pop index from stack
 - Pop ref from stack
 - If ref is null:
   - Trap
 - If index + size > ref array size:
   - Trap
 - Fill ref[index:index+size] with value
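The array.fill steps admit the same kind of sketch; note the operand order on the stack (ref, index, value, size) matching the pops above.

```python
# Executable sketch of the array.fill steps above.
# Trap is a stand-in for a Wasm trap.

class Trap(Exception):
    pass

def array_fill(ref, index, value, size):
    if ref is None:
        raise Trap("null array reference")
    if index + size > len(ref):
        raise Trap("out of bounds")
    ref[index:index + size] = [value] * size
```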

array.init_data ht data : [ref null ht, i32, i32, i32] -> []

    Where ht = array mut t, where t is numeric, i8, or i16

 - Pop size from stack
 - Pop offset from stack
 - Pop index from stack
 - Pop ref from stack
 - If ref is null:
   - Trap
 - If index + size > ref array size:
   - Trap
 - If offset + size * bytes(t) > data size:
   - Trap
 - If data has been dropped:
   - Trap
 - Fill ref[index:index+size] with values read from data[offset:offset+size*bytes(t)]
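For array.init_data, the data segment is raw bytes, so the fill step involves decoding size elements of width bytes(t) starting at byte offset. A sketch for the i16 case, with a boolean flag standing in for the segment's dropped state and little-endian decoding matching Wasm's memory byte order:

```python
# Sketch of array.init_data specialized to an i16 array: bytes(t) = 2.
# Trap is a stand-in for a Wasm trap; data_dropped models data.drop.

import struct

class Trap(Exception):
    pass

def array_init_data_i16(ref, index, data, offset, size, data_dropped=False):
    ELEM_SIZE = 2  # bytes(i16)
    if ref is None:
        raise Trap("null array reference")
    if index + size > len(ref):
        raise Trap("array out of bounds")
    if offset + size * ELEM_SIZE > len(data):
        raise Trap("data segment out of bounds")
    if data_dropped:
        raise Trap("data segment dropped")
    # Decode `size` little-endian u16 values starting at byte `offset`.
    values = struct.unpack_from("<%dH" % size, data, offset)
    ref[index:index + size] = list(values)
```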

array.init_elem ht elem : [ref null ht, i32, i32, i32] -> []

    Where ht = array mut t, where t is a reftype

 - Pop size from stack
 - Pop offset from stack
 - Pop index from stack
 - Pop ref from stack
 - If ref is null:
   - Trap
 - If index + size > ref array size:
   - Trap
 - If offset + size > elem size:
   - Trap
 - If elem has been dropped:
   - Trap
 - Fill ref[index:index+size] with values read from elem[offset:offset+size]
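array.init_elem follows the same shape as init_data, but the element segment holds references rather than bytes, so no decoding is involved and the bounds check on the segment is in elements, not bytes. A sketch under the same stand-ins as above:

```python
# Sketch of the array.init_elem steps above. Trap is a stand-in for a
# Wasm trap; elem_dropped models elem.drop.

class Trap(Exception):
    pass

def array_init_elem(ref, index, elem, offset, size, elem_dropped=False):
    if ref is None:
        raise Trap("null array reference")
    if index + size > len(ref):
        raise Trap("array out of bounds")
    if offset + size > len(elem):
        raise Trap("element segment out of bounds")
    if elem_dropped:
        raise Trap("element segment dropped")
    ref[index:index + size] = elem[offset:offset + size]
```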

tlively avatar Mar 03 '23 05:03 tlively

LGTM. A couple of notational suggestions:

    Where ht = array t
      or ht = array i8 and t = i32
      or ht = array i16 and t = i32

You can express this as

    Where ht = array mut st and t = unpacked(st)

(Also, needs to be a mutable array.)

    Where ht1 = array mut t1 and ht2 = array mut? t2 and t2 <: t1
      or t1 = t2 = i8
      or t1 = t2 = i16

You can assume the subtyping relation is extended to storage types (was missing from the MVP doc, I just added it). So this could be:

    Where ht1 = array mut st1 and ht2 = array mut? st2 and st2 <: st1
> - Fill ref[index:index+size] with value

In the notation of the spec, you probably mean ref[index:size].

rossberg avatar Mar 03 '23 07:03 rossberg

Great, thanks for the notes!

tlively avatar Mar 03 '23 16:03 tlively

There's almost a duality between the allocating and non-allocating bulk array operations:

| Source | New array | Existing array |
| --- | --- | --- |
| Single value | array.new[_default] | array.fill |
| Data segment | array.new_data | array.init_data |
| Element segment | array.new_elem | array.init_elem |
| Array | - | array.copy |
| Operand stack | array.new_fixed | - |

Do we want to fill one or both of these gaps?

An array.new_copy instruction can definitely be useful for sublist-style operations. It would avoid first default-initializing the array and then doing the copy, which is the only option currently.
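The two-step pattern and the proposed instruction can be sketched like this, again with Python lists as arrays; names are hypothetical since array.new_copy is only being proposed here.

```python
# Sketch of the current two-step pattern (array.new_default followed by
# array.copy) versus a hypothetical array.new_copy that allocates
# directly from the source range.

def new_default_then_copy(src, src_index, size, default=0):
    dest = [default] * size                          # array.new_default
    dest[0:size] = src[src_index:src_index + size]   # array.copy
    return dest

def new_copy(src, src_index, size):
    # array.new_copy: allocate from the source range in one step,
    # skipping the default initialization.
    return src[src_index:src_index + size]
```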

An array.init_fixed instruction (writing multiple contiguous elements in a single instruction) would be more exotic. I have encountered a use case for it, in wanting to write a multi-byte value into a byte array. That use case also calls for a corresponding multi-element read instruction, though. Such instructions could potentially save some bounds checking overhead, but their benefit is admittedly more dubious.

askeksa-google avatar Mar 07 '23 12:03 askeksa-google

Good points about the duality. array.new_copy seems useful, assuming it has performance benefits. Would you be able to gather performance data if we prototyped that instruction?

For array.init_fixed, I wonder if an optimizing engine would be able to get the same benefits by deduplicating bounds checks on a sequence of normal array writes. For the specific use case of reading multi-byte values out of byte arrays, it might make sense to have e.g. i32.array_load, f64.array_load, etc. that load from i8 arrays, but I would want to investigate that post-MVP.

tlively avatar Mar 07 '23 16:03 tlively

> Good points about the duality. array.new_copy seems useful, assuming it has performance benefits. Would you be able to gather performance data if we prototyped that instruction?

Yes, I can write a benchmark to compare the performance of array.new_copy vs array.new_default + array.copy.

askeksa-google avatar Mar 08 '23 10:03 askeksa-google

According to the spec for array.copy, if srcIndex == srcRef array size and size == 0 the operation will not trap (but it will trap if size == 0 and srcIndex is larger). Is this intended?

manoskouk avatar Mar 10 '23 11:03 manoskouk

Yes, this is intended and is the same way the current bulk memory and bulk table instructions work.
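The boundary case falls directly out of the bounds-check condition in the semantics above: with size == 0, srcIndex may equal the array length without tripping `srcIndex + size > srcRef array size`, but any larger srcIndex traps.

```python
# The in-bounds condition from the array.copy steps above, isolated to
# show the zero-length boundary behavior being discussed.

def copy_in_bounds(src_index, size, array_len):
    return src_index + size <= array_len
```

For example, with an array of length 4: `copy_in_bounds(4, 0, 4)` holds (no trap), while `copy_in_bounds(5, 0, 4)` does not (trap).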

tlively avatar Mar 10 '23 16:03 tlively

Just FYI, I have an initial implementation in the reference interpreter of the four instructions described here and I'll make a PR once I've finished writing tests.

Two questions:

  • Do we have candidate opcodes for these instructions? Currently I've just picked random free opcodes for prototyping.
  • Do we also want array.new_copy and array.init_fixed?

conrad-watt avatar Mar 26 '23 16:03 conrad-watt

@jakobkummerow or @manoskouk, does v8 have prototype opcodes selected for these instructions?

@conrad-watt, I'm also writing tests for these, so you can probably get away with landing your implementation with just a bare minimum of testing. Alternatively, if you've already sunk a bunch of time into the tests, then finishing would save me some work :)

tlively avatar Mar 27 '23 00:03 tlively

We use 0xfb18 for array.copy and 0xfb0f for array.fill. For the latter we are more flexible as it is not used by partners yet.

manoskouk avatar Mar 27 '23 06:03 manoskouk

Initial PR is up (https://github.com/WebAssembly/gc/pull/363)

@tlively I think I ended up writing quite a few tests, lmk if they're useful

conrad-watt avatar Mar 27 '23 19:03 conrad-watt

Thanks! The tests look great.

tlively avatar Mar 27 '23 19:03 tlively

The Kotlin folks brought up a use case for copying between GC arrays and memory, which is for applications that need to get the same data into arrays for use from GC languages and into memory for use from e.g. Skia (compiled with Emscripten). We wouldn't want to delay the MVP to get these instructions in, though.

tlively avatar Apr 04 '23 18:04 tlively

#375 landed, so I'll close this as completed. Discussion of further post-MVP extensions can be had in #367.

tlively avatar May 15 '23 17:05 tlively