arrow-julia
Additional `Missing` gets injected into Schema
In the following example, the column doesn't contain `Missing`, but after a round trip through Arrow its element type gains `Missing` in the Union:
julia> rnt.vector_variant_int64_string
5-element RNTupleField{Vector{Union{Int64, String}}}:
Union{Int64, String}["one"]
Union{Int64, String}["one", 2]
Union{Int64, String}["one", 2, 3]
Union{Int64, String}["one", 2, 3, 4]
Union{Int64, String}["one", 2, 3, 4, 5]
julia> DataFrame(rnt).vector_variant_int64_string
5-element Vector{Vector{Union{Int64, String}}}:
["one"]
["one", 2]
["one", 2, 3]
["one", 2, 3, 4]
["one", 2, 3, 4, 5]
julia> Arrow.write(a, DataFrame(rnt))
"/tmp/jl_Lk5W1G92XO"
julia> Arrow.Table(a)
Arrow.Table with 5 rows, 13 columns, and schema:
:string String
:vector_int32 Vector{Int32} (alias for Array{Int32, 1})
:array_float Vector{Float32} (alias for Array{Float32, 1})
:vector_vector_int32 Vector{Vector{Int32}} (alias for Array{Array{Int32, 1}, 1})
:vector_string Vector{String} (alias for Array{String, 1})
:vector_vector_string Vector{Vector{String}} (alias for Array{Array{String, 1}, 1})
:variant_int32_string Union{Missing, Int32, String}
:vector_variant_int64_string Vector{Union{Missing, Int64, String}} (alias for Array{Union{Missing, Int64, String}, 1})
:tuple_int32_string NamedTuple{(:_0, :_1), Tuple{Int32, String}}
:pair_int32_string NamedTuple{(:_0, :_1), Tuple{Int32, String}}
:vector_tuple_int32_string Vector{NamedTuple{(:_0, :_1), Tuple{Int32, String}}} (alias for Array{NamedTuple{(:_0, :_1), Tuple{Int32, String}}, 1})
:lorentz_vector NamedTuple{(:pt, :eta, :phi, :mass), NTuple{4, Float32}}
:array_lv Vector{NamedTuple{(:pt, :eta, :phi, :mass), NTuple{4, Float32}}} (alias for Array{NamedTuple{(:pt, :eta, :phi, :mass), NTuple{4, Float32}}, 1})
Hmm, yes. I think I remember that for Union types the Arrow spec makes this hard because it always allows nulls, so we default to including Missing in the Union to account for that. We can/should figure out how to do this more cleanly, though.
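For anyone who wants to reproduce this without the RNTuple file, here is a minimal sketch, assuming Arrow.jl is installed and that a plain NamedTuple of vectors is an acceptable stand-in for the DataFrame above. The column names `variant` and `vector_variant` are made up for illustration. It only prints the element types before and after the round trip; whether `Missing` shows up will depend on the Arrow.jl version.

```julia
using Arrow

# Hypothetical stand-in columns: one plain Union column and one
# vector-of-Union column, neither containing Missing.
T = Union{Int64, String}
tbl_in = (
    variant = T[1, "two", 3],
    vector_variant = Vector{T}[T["one"], T["one", 2], T["one", 2, 3]],
)

# Round trip through an in-memory buffer instead of a temp file.
io = IOBuffer()
Arrow.write(io, tbl_in)
seekstart(io)
tbl_out = Arrow.Table(io)

println("before: ", eltype(tbl_in.variant))
println("after:  ", eltype(tbl_out.variant))
println("before: ", eltype(tbl_in.vector_variant))
println("after:  ", eltype(tbl_out.vector_variant))
```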
I see, but for a column of Vector{Union{T, T2}} you don't need Missing, right? The empty element would just be `Union{T, T2}[]`, for example in :vector_variant_int64_string.
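To make the point above concrete, here is a small Base-only sketch (no Arrow involved). The names `T`, `Widened`, `empty_row`, and `col` are made up for illustration. It shows that an empty row of a vector-of-Union column is just an empty typed vector, that nothing in such a column is `missing`, and that `nonmissingtype` recovers the original element type from the widened one reported in the schema.

```julia
# Element type of the inner vectors in the original column (no Missing).
T = Union{Int64, String}

# Element type reported after the round trip, per the schema above.
Widened = Union{Missing, Int64, String}

# An "empty" row is just an empty vector of T; no Missing is needed.
empty_row = T[]

# A column shaped like :vector_variant_int64_string, plus an empty row.
col = Vector{T}[T["one"], T["one", 2], empty_row]

println(isempty(empty_row))                    # true
println(Missing <: eltype(empty_row))          # false
println(any(row -> any(ismissing, row), col))  # false
println(nonmissingtype(Widened) == T)          # true
```

So, at least on the Julia side, the inner element type can represent every row, including empty ones, without `Missing`.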
Correctness bug, bump?