Onda.jl icon indicating copy to clipboard operation
Onda.jl copied to clipboard

Deserializing Arrow tables with nested `Samples` fails on newer Arrow's

Open ericphanson opened this issue 11 months ago • 3 comments

ERROR: MethodError: Cannot `convert` an object of type 
  Samples{Base.ReshapedArray{UInt8, 2, SubArray{UInt8, 1, Arrow.Primitive{UInt8, Vector{UInt8}}, Tuple{UnitRange{Int64}}, true}, Tuple{}}} to an object of type 
  Samples{Matrix{UInt8}}

Closest candidates are:
  convert(::Type{T}, ::Intervals.AnchoredInterval{P, T}) where {P, T}
   @ Intervals ~/.julia/packages/Intervals/V53WF/src/anchoredinterval.jl:181
  convert(::Type{T}, ::Intervals.Interval{T}) where T
   @ Intervals ~/.julia/packages/Intervals/V53WF/src/interval.jl:240
  convert(::Type{T}, ::T) where T
   @ Base Base.jl:84
  ...

Stacktrace:
  [1] setindex!(A::Vector{Samples{Matrix{…}}}, x::Samples{Base.ReshapedArray{UInt8, 2, SubArray{…}, Tuple{}}}, i1::Int64)
    @ Base ./array.jl:1021
  [2] copyto_unaliased!(deststyle::IndexLinear, dest::Vector{…}, srcstyle::IndexLinear, src::Arrow.Struct{…})
    @ Base ./abstractarray.jl:1088
  [3] copyto!(dest::Vector{Samples{…}}, src::Arrow.Struct{Samples{…}, Tuple{…}, (:data, :info, :encoded)})
    @ Base ./abstractarray.jl:1068
  [4] copymutable(a::Arrow.Struct{Samples{…}, Tuple{…}, (:data, :info, :encoded)})
    @ Base ./abstractarray.jl:1202
  [5] copy
    @ ./abstractarray.jl:1145 [inlined]
  [6] _preprocess_column(col::Arrow.Struct{Samples{…}, Tuple{…}, (:data, :info, :encoded)}, len::Int64, copycols::Bool)
    @ DataFrames ~/.julia/packages/DataFrames/58MUJ/src/dataframe/dataframe.jl:243

I believe since Arrow started returning these views.

I think it's kinda reasonable to define a convert that forwards to the convert for the inner matrix:

function Base.convert(::Type{Onda.Samples{Matrix{T}}}, obj::Onda.Samples{<:Base.ReshapedArray{T, 2, <:SubArray{T}}}) where {T <: Number}
    (; data, info, encoded) = obj
    return Samples(convert(Matrix{T}, data), info, encoded)
end

Note Matrix{T} is indeed the deserialization target: https://github.com/beacon-biosignals/Onda.jl/blob/dbeab57ef71939c134e06a527100a0db383be98d/src/samples.jl#L581

ericphanson avatar Mar 12 '24 14:03 ericphanson

I don't think the convert method needs to be that specific. IMO a more generally useful method to define would be

function Base.convert(::Type{Samples{T}}, samples::Samples) where {T}
    (; data, info, encoded) = samples
    info = Legolas.record_merge(info; sample_type=eltype(T))
    return Samples(convert(T, data), info, encoded)
end

That allows you to convert the incoming sample data to any target type.

ararslan avatar Apr 05 '24 16:04 ararslan

That makes sense to me, except the Legolas.record_merge(info; sample_type=eltype(T)) part. Shouldn't sample_type in the info field be about the serialized type, not the deserialized one?

ericphanson avatar Apr 05 '24 16:04 ericphanson

Good question, I don't know, I could see it being useful either way.

ararslan avatar Apr 05 '24 21:04 ararslan