arrow-julia
(de)serialization of (U)Int128
Hey 👋
`(U)Int128` values are currently serialized by setting `bit_width=128`. That works perfectly fine within Julia, but as soon as the serialized Arrow batches are read from another language, such as Python or JS, the reader raises an error about an unrecognized Int type.
We are fixing this by defining custom (de)serialization code, but that very much looks like type piracy to us. We were wondering if instead, we should by default send them as e.g. two `UInt64`s in a struct :) What do you think?
Here is what we are currently doing for `UInt128`:
# Splits a `UInt128` into its low and high 64-bit halves,
# i.e. its least and most significant 64 bits.
function _split(i::UInt128)
    l, h = i % UInt64, (i >> 64) % UInt64
    return (:low => l, :high => h)
end

# Inverse of `_split`: rebuilds the `UInt128` from its two halves.
_merge(l::UInt64, h::UInt64) = UInt128(l) + (UInt128(h) << 64)

# Serialize `UInt128` as a struct with fields `low` and `high`.
ArrowTypes.ArrowType(::Type{UInt128}) = NamedTuple{(:low, :high)}
ArrowTypes.toarrow(i::UInt128) = _split(i)

# Tag the column so Julia readers can restore `UInt128` on deserialization.
ArrowTypes.arrowname(::Type{UInt128}) = Symbol("Julia.UInt128")
ArrowTypes.fromarrow(T::Type{<:UInt128}, low, high) = _merge(low, high)
ArrowTypes.JuliaType(::Val{Symbol("Julia.UInt128")}) = UInt128
Alternatively, we could split each `UInt128` into a tuple ourselves before writing the Arrow data.
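
A minimal sketch of that alternative, using only Base Julia so that no `ArrowTypes` methods are overloaded. The helper names `split128` and `merge128` are hypothetical and not part of Arrow.jl, and the sketch assumes the writer stores a column of `NamedTuple`s as an ordinary Arrow struct with two `UInt64` fields:

```julia
# Hypothetical helpers (not part of Arrow.jl): convert a UInt128 to and from
# a plain NamedTuple holding its two 64-bit halves.
split128(x::UInt128) = (low = x % UInt64, high = (x >> 64) % UInt64)
merge128(nt::NamedTuple{(:low, :high)}) = (UInt128(nt.high) << 64) | UInt128(nt.low)

ids = UInt128[0, 1, typemax(UInt64), UInt128(1) << 64, typemax(UInt128)]

# Column to hand to the Arrow writer instead of `ids`. Its element type is
# the concrete NamedTuple{(:low, :high), Tuple{UInt64, UInt64}}.
split_ids = split128.(ids)

# After reading the column back, rebuild the original values.
restored = merge128.(split_ids)
@assert restored == ids
```

The trade-off is that every writer and reader has to call the conversion explicitly, but nothing is defined on types we do not own.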