component-model icon indicating copy to clipboard operation
component-model copied to clipboard

Fixed length arrays

Open LLFourn opened this issue 2 years ago • 24 comments

Rust has fixed length arrays whose type is written as [u8;32] for a 32-byte array. You can achieve this in the current wit specification by doing tuple<u8,u8 ... u8> (with 32 u8s). Could list<u8,32> or something be a first class type to avoid abusing tuples in this way even if it desugars to something similar?

LLFourn avatar Apr 11 '23 22:04 LLFourn

That does seem like a potentially good idea, and could be defined as another "specialization" (just like how tuple itself is a specialization of record). I feel like this was brought up in a previous issue or comment somewhere, but I can't seem to find it at the moment.

lukewagner avatar Apr 12 '23 22:04 lukewagner

A similar issue that came up earlier was IPv6 addresses, which would want to use [u16; 8].

sunfishcode avatar Apr 12 '23 23:04 sunfishcode

Oh right, that was it, thanks. I suppose mostly this is just a question of prioritization and whether we think there are enough use cases to motivate adding this in the MVP time frame.

lukewagner avatar Apr 13 '23 22:04 lukewagner

Just stumbled across this - I'd be very interested in these types for cryptographic hashes, in case there's a need for more motivating examples

cameron1024 avatar Apr 24 '23 18:04 cameron1024

Seconding @cameron1024 use case. I've been trying to understand wit, components, and wit-bindgen using the md5 algorithm. Being able to create md5.buffer: list<u8, 64> would be desireable.

f5yacobucci avatar Apr 26 '23 17:04 f5yacobucci

This actually happened to be my use-case too. I have ids for certain resources that are referenced by host and guest that are the output of cryptographic hashes so always have a fixed length. It's a bit of a PITA having to check the length and copy the bytes into a fixed length array on both sides.

LLFourn avatar Apr 27 '23 09:04 LLFourn

It does seem like there are some good use cases for this and it's not a big addition if this is just a specialization of tuple, so it does seem like a good idea to add. That being said, I think maybe we should hold off on adding this before the Preview 2 milestone since there is already a bunch of other loose ends to wrap up in the milestone. But after that, I'd be happy to propose adding it.

lukewagner avatar Apr 28 '23 19:04 lukewagner

For reference: https://github.com/WebAssembly/interface-types/issues/146. I think that issue was raised before the existence of tuples, though.

badeend avatar May 21 '23 09:05 badeend

I’m working on the Go bindings for WASI Preview 2 and the Component Model. The ipv4-address and ipv6-address types seem to be reasonable justification for fixed-with array types.

    type ipv4-address = tuple<u8, u8, u8, u8>;
    type ipv6-address = tuple<u16, u16, u16, u16, u16, u16, u16, u16>;

Could be:

    type ipv4-address = [8]u8;
    type ipv6-address = [8]u8;

(Go-style syntax)

It seems like a fixed-width array could just be syntactic sugar for tuple<t[, t...]> and despecialize into a record for the ABI.

ydnar avatar Jan 16 '24 21:01 ydnar

Is there any update on supporting this feature? I have several similar use-cases to those pointed out above as well.

arjunr2 avatar Feb 14 '24 18:02 arjunr2

Do you have a specific use-case that tuples can't handle?

badeend avatar Feb 14 '24 18:02 badeend

Tuples handle these use-cases but are rather messy to specify especially for larger fixed sizes. Something similar to @ydnar's solution would be useful.

arjunr2 avatar Feb 14 '24 19:02 arjunr2

Monotypic tuples implemented as Go arrays here:

https://github.com/ydnar/wasm-tools-go/pull/43/commits/be654cd4a90a85d3b8e11e7dbc764a50eae5c5a9

image

ydnar avatar Feb 14 '24 20:02 ydnar

Now that Preview 2 has shipped, I think it makes sense to add this as an emoji-gated feature (to be ungated once widely implemented in Preview 2 runtimes and included in a subgroup-approved 0.2.* release). I can write a PR for this in a bit if noone does it first.

Syntactically in WIT and the C-M, I'm leaning toward list<T, N> (where N is an optional second operand to the list type constructor), since it extends the existing syntax for a list instead of creating a whole new syntactic construct, analogous to how other languages extend their existing list/array syntax with the optional length.

lukewagner avatar Feb 15 '24 16:02 lukewagner

I think the form of list<T, N> might be confusing, implying that a fixed-length array would despecialize into a list<T>, when in fact it has a completely different ABI representation (pass-by-value, no header with ptr and len). Given that the 2nd arg isn’t a type parameter, but a length, perhaps Go-style [n]t ([8]u16) or Rust style [u16;8]?

Other possible benefits of implementing fixed-length arrays:

  • Monotypic tuple<T[,T...]> would despecialize into fixed-length arrays of type T.
  • flags with > 64 flag values would despecialize into fixed-length arrays of u32.

ydnar avatar Feb 15 '24 16:02 ydnar

We already have the Canonical ABI varying significantly for specializations (e.g. flags is a specialization of record) and also for a single type constructor based on the immediate arguments (e.g., the byte-width of variants varies by the concrete number of cases), so I think we're not doing anything new, ABI-wise, by having list<T, N> have a totally different representation from list<T>. At the lifted semantics level, I think it does make sense to think of a fixed-length list as just a list: depending on your language, you may be able to ignore the N and think of the list<T,N> as just a list<T> (e.g., in JS, a list<T,N> would naturally produce a JS array, just like a list<T>).

lukewagner avatar Feb 15 '24 17:02 lukewagner

  • Monotypic tuple<T[,T...]> would despecialize into fixed-length arrays of type T.

That's not what you want. Tuples are accessed with a static index and no runtime overhead, easily verifiable, whereas lists/arrays are accessed with a dynamic index and (usually) a runtime bounds check, at least in typed languages that distinguish the two. Different trade-offs that API designers supposedly pick consciously and an ABI should not mix up.

rossberg avatar Feb 16 '24 06:02 rossberg

  • Monotypic tuple<T[,T...]> would despecialize into fixed-length arrays of type T.

That's not what you want. Tuples are accessed with a static index and no runtime overhead, easily verifiable, whereas lists/arrays are accessed with a dynamic index and (usually) a runtime bounds check, at least in typed languages that distinguish the two. Different trade-offs that API designers supposedly pick consciously and an ABI should not mix up.

I think the gist of the OP (which I agree with) is that a fixed-length array is ABI identical to a tuple with a single type. Which isn’t the same as a list—as you correctly pointed out has a header, dynamic index, overhead, etc.

ydnar avatar Feb 16 '24 17:02 ydnar

No, a list/array with fixed length is still indexed with a dynamic index, like any other list/array, which requires a check, like for any other list/array. The only difference is that you statically know one side of that check. A tuple is a different concept with no dynamic indexing nor checking.

rossberg avatar Feb 16 '24 20:02 rossberg

This thread is about Rust and Go-style fixed-length arrays, which are value types, not a list<t> with a fixed length, which would be represented as a slice or vector in Rust or Go.

Therefore the memory representation of a fixed-length array is identical to a tuple with a single type.

ydnar avatar Feb 16 '24 21:02 ydnar

Ah, I think I understand the disconnect: @ydnar is primarily concerned with the Canonical ABI representation here, specifically that, whatever we call it, our fixed-length-array is given the same "inline" representation as the N-ary tuples that it is replacing (and not a (ptr, length) pair pointing to separately-allocated memory). I didn't mention it above, but I agree this is the representation we want and this is totally doable even if we call it list<T, N> (the Canonical ABI is allowed to have totally different representations based on the particular operands of the type constructor).

lukewagner avatar Feb 16 '24 21:02 lukewagner

PR up in #304, PTAL

lukewagner avatar Feb 19 '24 22:02 lukewagner

I would approciate fix length list (like mac address, tuple<u8, u8, u8, u8, u8, u8> is just messy), however I believe fixed length array and dynamic length array would be very different from the ABI perspective?

like std::array<T, N> and std::vector<T> in C++?

std::array is N consecutive Ts lives on stack, while std::vector is a pointer point to a heap allocated buffer(along with a size & capacity member).

dearfl avatar Apr 09 '24 03:04 dearfl

Yes, the ABIs (as defined in #304) are totally different.

lukewagner avatar Apr 09 '24 23:04 lukewagner