(de)serialization of (U)Int128

Open bachdavi opened this issue 2 years ago • 0 comments

Hey 👋

(U)Int128 values are currently serialized as an Arrow Int with bit_width=128. That works perfectly fine within Julia, but as soon as the serialized Arrow batches are read from another language, such as Python or JS, the reader raises an error about an unrecognized Int type.

We are working around this by defining custom (de)serialization code, but that very much looks like type piracy to us, since we own neither UInt128 nor the ArrowTypes functions we extend. We were wondering whether, instead, they should by default be sent as e.g. two UInt64s in a struct :) What do you think?

Here is what we are currently doing for UInt128:

# Splits a `UInt128` into its low and high halves,
# i.e. its least and most significant 64 bits.
function _split(i::UInt128)
    l, h = i % UInt64, (i >> 64) % UInt64
    return (low = l, high = h)
end

# Recombines the low and high halves into a `UInt128`.
_merge(l::UInt64, h::UInt64) = UInt128(l) + (UInt128(h) << 64)

ArrowTypes.ArrowType(::Type{UInt128}) = NamedTuple{(:low, :high)}
ArrowTypes.toarrow(i::UInt128) = _split(i)
ArrowTypes.arrowname(::Type{UInt128}) = Symbol("Julia.UInt128")
ArrowTypes.fromarrow(T::Type{<:UInt128}, low, high) = _merge(low, high)
ArrowTypes.JuliaType(::Val{Symbol("Julia.UInt128")}) = UInt128
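
To sanity-check the split/merge logic independently of Arrow, and to see how the signed case could reuse it, here is a minimal sketch in plain Base Julia. The names split128 and merge128 are hypothetical helpers for illustration, not part of Arrow.jl or ArrowTypes, and routing Int128 through reinterpret is an assumption about how we would handle it rather than something we have in place.

```julia
# Hypothetical helpers (not part of Arrow.jl or ArrowTypes); Base Julia only.

# Split a UInt128 into its least and most significant 64 bits.
split128(i::UInt128) = (low = i % UInt64, high = (i >> 64) % UInt64)

# Signed values reuse the unsigned path via a bit-preserving reinterpret.
split128(i::Int128) = split128(reinterpret(UInt128, i))

# Rebuild the original value; the first argument selects signed or unsigned.
merge128(::Type{UInt128}, low::UInt64, high::UInt64) =
    (UInt128(high) << 64) | UInt128(low)
merge128(::Type{Int128}, low::UInt64, high::UInt64) =
    reinterpret(Int128, merge128(UInt128, low, high))

# Round-trip checks over a few edge cases.
for x in (UInt128(0), UInt128(1) << 64, typemax(UInt128))
    @assert merge128(UInt128, split128(x)...) == x
end
for x in (Int128(-1), typemin(Int128), typemax(Int128))
    @assert merge128(Int128, split128(x)...) == x
end
```

Both halves stay UInt64 even for Int128, so the sign lives only in the top bit of the high half and a reader in another language has to reinterpret the combined value accordingly.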

Alternatively, we could split each UInt128 into a tuple ourselves before writing the Arrow data.
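
One reading of that alternative, sketched in plain Base Julia: split the column into two UInt64 columns up front, so nothing about UInt128 needs to be overloaded. The column names id_low and id_high are made up for illustration, and the actual write call is only mentioned in a comment, not run.

```julia
# Hypothetical sketch: pre-split a UInt128 column into two UInt64 columns
# before handing the table to Arrow. Base Julia only; names are illustrative.

ids = UInt128[1, UInt128(1) << 64, typemax(UInt128)]

# A NamedTuple of vectors is a Tables.jl-compatible column table, so this
# should be usable with something like `Arrow.write(path, tbl)` (not run here).
tbl = (
    id_low  = [x % UInt64 for x in ids],          # least significant 64 bits
    id_high = [(x >> 64) % UInt64 for x in ids],  # most significant 64 bits
)

# On the reading side, the two columns are recombined into UInt128 values.
restored = [(UInt128(h) << 64) | UInt128(l) for (l, h) in zip(tbl.id_low, tbl.id_high)]
@assert restored == ids
```

The trade-off is that the split becomes part of the table schema, so every consumer has to know that the two columns belong together.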

bachdavi, Apr 28 '22 09:04