unison icon indicating copy to clipboard operation
unison copied to clipboard

Implement better support for primitive arrays

Open pchiusano opened this issue 7 months ago • 0 comments

Right now an Int array is implemented in terms of the underlying ByteArray type. But because a ByteArray may be sent to another machine with different endian-ness, the primitive functions that read and write an Int or Nat etc can't just read an Int as a single instruction and the current machine's natural endian-ness. Why? Well, imagine an array was written on the Alice node, then sent to Bob. Because Alice and Bob might use different endian-ness, bytes could get reversed if Bob just reads an Int from the byte array, assuming it's big-endian, if Alice wrote the Int in little-endian order.

The current implementation works around this by doing 8 separate ByteArray reads plus bit twiddling to assemble an Int, and vice versa for writes. This ensures the bytes are in a consistent endian-ness wherever the Int is read or written, but is obviously very slow.

(Another thing that kills performance is the use of non-Raw arrays. Since Unison doesn't yet do worker-wrapper optimizations, loops that work with sliced arrays are doing pattern matching on each access, which is not at all what you want.)

The fix is to introduce primitive array types as a first class thing. They can serialize / hash etc in network byte order, but in memory on the machine, they'll exist in whatever endian-ness is native to that machine. Reads and writes are now a single instruction instead of 8 separate operations plus bit twiddling.

Here is a sketch of API, which I would add as new builtins:

-- just showing mutable

builtin type mutable.UnboxedArray a
UnboxedArray.read : UnboxedArray a -> Nat ->{Exception} a
UnboxedArray.write : UnboxedArray a -> Nat -> a ->{Exception} ()
UnboxedArray.toBigEndian : UnboxedArray a -> mutable.ByteArray
UnboxedArray.toLittleEndian : UnboxedArray a -> mutable.ByteArray

UnboxedArray.Ints : mutable.ByteArray -> UnboxedArray Int
UnboxedArray.Floats : mutable.ByteArray -> UnboxedArray Float
UnboxedArray.Nats : mutable.ByteArray -> UnboxedArray Nat

I think that would be sufficient for the needed builtins, though I would use the new work by @dolio to provide more optimized implementations of various derived functions. (Like summing an array, sorting it, backpermuting, and so on)

pchiusano avatar Mar 30 '25 14:03 pchiusano