arrow-julia
Serializing `Dict{String,Real}` results in garbage values
Serializing a `Dict` which contains `Bool` and `Float64` values results in Arrow generating garbage values:
julia> d = Dict("is_valid" => true,"probability" => 0.53495216)
Dict{String, Real} with 2 entries:
"is_valid" => true
"probability" => 0.534952
julia> t = Arrow.Table(Arrow.tobuffer((; value=[d])))
Arrow.Table with 1 rows, 1 columns, and schema:
:value Dict{String, Float64}
julia> t.value
1-element Arrow.Map{Dict{String, Float64}, Int32, Arrow.Struct{NamedTuple{(:key, :value), Tuple{String, Float64}}, Tuple{Arrow.List{String, Int32, Vector{UInt8}}, Arrow.Primitive{Float64, Vector{Float64}}}}}:
Dict("is_valid" => -6.6622794774424345e159, "probability" => 3.1e-322)
Note that pre-converting the values to `Float64` doesn't result in this behaviour:
julia> d = Dict{String,Float64}("is_valid" => true,"probability" => 0.53495216)
Dict{String, Float64} with 2 entries:
"is_valid" => 1.0
"probability" => 0.534952
julia> t = Arrow.Table(Arrow.tobuffer((; value=[d])))
Arrow.Table with 1 rows, 1 columns, and schema:
:value Dict{String, Float64}
julia> t.value
1-element Arrow.Map{Dict{String, Float64}, Int32, Arrow.Struct{NamedTuple{(:key, :value), Tuple{String, Float64}}, Tuple{Arrow.List{String, Int32, Vector{UInt8}}, Arrow.Primitive{Float64, Vector{Float64}}}}}:
Dict("is_valid" => 1.0, "probability" => 0.53495216)
Yeah, similar to what we do w/ arrays, we should probably try to enforce the Dict valtype with the `concrete_or_concreteunion` machinery in ArrowTypes.jl. Or we at least need a check in map.jl that it's a concrete type/union when serializing.
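For anyone stuck on a version without a fix, the concrete-type check suggested above can be approximated on the caller's side. This is only a rough sketch using Base Julia: `concretize_values` is a hypothetical helper name, not part of Arrow.jl or ArrowTypes.jl, and it is stricter than what the maintainers describe, since it rejects concrete unions as well as abstract types.

```julia
# Hypothetical helper (not an Arrow.jl API): narrow an abstractly typed Dict
# to a concrete value type before handing it to Arrow.
function concretize_values(d::AbstractDict{K,V}) where {K,V}
    # Nothing to do when the declared value type is already concrete.
    isconcretetype(V) && return d
    # Promote over the runtime types of the values, e.g. Bool and Float64 -> Float64.
    T = mapreduce(typeof, promote_type, values(d); init=Union{})
    # Refuse to guess when the values have no common concrete type (or d is empty).
    isconcretetype(T) ||
        throw(ArgumentError("values of type $V do not promote to a concrete type (got $T)"))
    return Dict{K,T}(k => convert(T, v) for (k, v) in d)
end

d = Dict("is_valid" => true, "probability" => 0.53495216)  # Dict{String, Real}
c = concretize_values(d)  # Dict{String, Float64}; `true` is stored as 1.0
```

The result has the same shape as the manually pre-converted `Dict{String,Float64}` from the report, so it should take the same code path that round-trips correctly there.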
I noticed this has been fixed by the PR above. Will it be included in a release any time soon?
The fix from #305 was included in Arrow.jl v2.3+. I'll close this issue then, unless folks think there is more that should be done in this case.