Nullable fields don't always need Union{Missing, T}
I'm trying to implement the GeoArrow spec, which gives back coordinates in a deeply nested list of a FixedList (a point). Because these lists are theoretically nullable, in Julia we get a deeply nested list with Unions of Missing, even though these vectors contain no missings. An example for a column of LineStrings (there are geometry types that require two more levels of nesting):
2-element Arrow.List{Vector{Union{Missing, Vector{Union{Missing, Tuple{Float64, Float64}}}}}
It's pretty hard to convert these elements to a concrete Vector{Vector{NTuple{2, Float64}}} without allocating. Is there a way to edit the view so that its eltype excludes Missing? An alternative would be to pass all(validitybitmap) in build to juliaeltype, so we only set Missing when there are actual missing values.
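To make the alternative concrete, here is a minimal sketch in plain Julia, without Arrow.jl. The helper `narrowed_eltype` and the alias `Point` are hypothetical names used only for illustration, and a `Bool` vector stands in for the validity bitmap (true meaning the value is present). It is not how `build` or `juliaeltype` are actually implemented.

```julia
# Hypothetical helper, not part of Arrow.jl: pick the element type from the
# validity bits instead of from the schema's nullable flag alone.
function narrowed_eltype(::Type{T}, validity::AbstractVector{Bool}) where {T}
    return all(validity) ? T : Union{Missing, T}
end

# Illustrative alias for a 2D coordinate.
const Point = NTuple{2, Float64}

# Every slot is valid, so Missing can be left out of the eltype.
narrowed_eltype(Point, trues(3))

# One slot is null, so the Union with Missing is still required.
narrowed_eltype(Point, Bool[true, false, true])

# For comparison: narrowing an existing vector after the fact copies the data.
v = Union{Missing, Point}[(0.0, 0.0), (1.0, 1.0)]
w = convert(Vector{Point}, v)  # builds a new Vector{Point}; v is left untouched
```

Note that the type returned by `narrowed_eltype` depends on the data rather than on the schema alone, so two files with the same schema could yield columns with different eltypes.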
I'm happy to make a PR if there's consensus on what to do.
Might be related to #373.
We recently updated the Arrow.List type to return a SubArray into the underlying data array; does that help your overall issue here with the allocations?
Yeah, we could potentially check the validitybitmap to see if there are any missings before building the eltype, but it does make me a tad nervous about unrelated side effects it might introduce.
I'd say let's go for a PR and then we can take a look at how much work this would actually be.
I don't think it's fixed:
julia> col1 = Vector{Union{Int64, String}}[
["one", 2],
["one", 2, 3],
["one", 2, 3, 4],
["one", 2, 3, 4, 5]];
julia> df = DataFrame(;col1)
4×1 DataFrame
Row │ col1
│ Array…
─────┼───────────────────────────────────
1 │ Union{Int64, String}["one", 2]
2 │ Union{Int64, String}["one", 2, 3]
3 │ Union{Int64, String}["one", 2, 3…
4 │ Union{Int64, String}["one", 2, 3…
julia> a = tempname()
"/tmp/jl_IngNyJwngp"
julia> Arrow.write(a, df)
"/tmp/jl_IngNyJwngp"
julia> Arrow.Table(a)
Arrow.Table with 4 rows, 1 columns, and schema:
:col1 … SubArray{Union{Missing, Int64, String}, 1, Arrow.DenseUnion{Union{Missing, Int64, String}, Arrow.UnionT{Arrow.Flatbuf.UnionMode.Dense, nothing, Tuple{Union{Missing, Int64}, String}}, Tuple{Arrow.Primitive{Union{Missing, Int64}, Vector{Int64}}, Arrow.List{String, Int32, Vector{UInt8}}}}, Tuple{UnitRange{Int64}}, true}
julia> Arrow.Table(a).col1[1]
2-element view(::Arrow.DenseUnion{Union{Missing, Int64, String}, Arrow.UnionT{Arrow.Flatbuf.UnionMode.Dense, nothing, Tuple{Union{Missing, Int64}, String}}, Tuple{Arrow.Primitive{Union{Missing, Int64}, Vector{Int64}}, Arrow.List{String, Int32, Vector{UInt8}}}}, 1:2) with eltype Union{Missing, Int64, String}:
"one"
2