Onda.jl
Onda.jl copied to clipboard
Deserializing Arrow tables with nested `Samples` fails on newer Arrow's
ERROR: MethodError: Cannot `convert` an object of type
Samples{Base.ReshapedArray{UInt8, 2, SubArray{UInt8, 1, Arrow.Primitive{UInt8, Vector{UInt8}}, Tuple{UnitRange{Int64}}, true}, Tuple{}}} to an object of type
Samples{Matrix{UInt8}}
Closest candidates are:
convert(::Type{T}, ::Intervals.AnchoredInterval{P, T}) where {P, T}
@ Intervals ~/.julia/packages/Intervals/V53WF/src/anchoredinterval.jl:181
convert(::Type{T}, ::Intervals.Interval{T}) where T
@ Intervals ~/.julia/packages/Intervals/V53WF/src/interval.jl:240
convert(::Type{T}, ::T) where T
@ Base Base.jl:84
...
Stacktrace:
[1] setindex!(A::Vector{Samples{Matrix{…}}}, x::Samples{Base.ReshapedArray{UInt8, 2, SubArray{…}, Tuple{}}}, i1::Int64)
@ Base ./array.jl:1021
[2] copyto_unaliased!(deststyle::IndexLinear, dest::Vector{…}, srcstyle::IndexLinear, src::Arrow.Struct{…})
@ Base ./abstractarray.jl:1088
[3] copyto!(dest::Vector{Samples{…}}, src::Arrow.Struct{Samples{…}, Tuple{…}, (:data, :info, :encoded)})
@ Base ./abstractarray.jl:1068
[4] copymutable(a::Arrow.Struct{Samples{…}, Tuple{…}, (:data, :info, :encoded)})
@ Base ./abstractarray.jl:1202
[5] copy
@ ./abstractarray.jl:1145 [inlined]
[6] _preprocess_column(col::Arrow.Struct{Samples{…}, Tuple{…}, (:data, :info, :encoded)}, len::Int64, copycols::Bool)
@ DataFrames ~/.julia/packages/DataFrames/58MUJ/src/dataframe/dataframe.jl:243
I believe since Arrow started returning these views.
I think it's kinda reasonable to define a convert that forwards to the convert for the inner matrix:
function Base.convert(::Type{Onda.Samples{Matrix{T}}}, obj::Onda.Samples{<:Base.ReshapedArray{T, 2, <:SubArray{T}}}) where {T <: Number}
(; data, info, encoded) = obj
return Samples(convert(Matrix{T}, data), info, encoded)
end
Note Matrix{T}
is indeed the deserialization target: https://github.com/beacon-biosignals/Onda.jl/blob/dbeab57ef71939c134e06a527100a0db383be98d/src/samples.jl#L581
I don't think the convert
method needs to be that specific. IMO a more generally useful method to define would be
function Base.convert(::Type{Samples{T}}, samples::Samples) where {T}
(; data, info, encoded) = samples
info = Legolas.record_merge(info; sample_type=eltype(T))
return Samples(convert(T, data), info, encoded)
end
That allows you to convert the incoming sample data to any target type.
That makes sense to me, except the Legolas.record_merge(info; sample_type=eltype(T))
part. Shouldn't sample_type
in the info
field be about the serialized type, not the deserialized one?
Good question, I don't know, I could see it being useful either way.