DataArrays.jl
DataArrays.jl copied to clipboard
`vcat` between Arrays and DataArrays
The following seems broken:
julia> a = @data([1,NA])
2-element DataArray{Int64,1}:
1
NA
julia> b = [3., 4.]
2-element Array{Float64,1}:
3.0
4.0
julia> [a,b]
4-element DataArray{Float64,1}:
1.0
NA
3.0
4.0
julia> [b,a]
ERROR: `convert` has no method matching convert(::Type{Float64}, ::NAtype)
in setindex! at array.jl:307
in setindex! at array.jl:345
in cat_t at abstractarray.jl:730
in vcat at abstractarray.jl:736
Both forms call vcat defined in abstractarray.jl. Without the NA, both forms work. Maybe the new Nullable approach will fix this.
This is because vcat just picks the type of the first array. I think we'll need to overwrite vcat(::AbstractArray...) to return a DataArray/NullableArray if one of the passed arrays is a DataArray/NullableArray.
This also raises a concern. If you have:
a::NullableVector{Float64}
b::Vector{Float64}
Then is it okay if [b, a][1] returns a Nullable and not a Float64?
Promoting AbstractArrays might get tricky, especially if you include SubArrays, PooledDataArrays, and other yet-to-be-defined AbstractArrays.
I just made vcat(dfs) do basic container type promotion in this commit. (Not something that could reuse vcat(das) because of the filling-in-missing-columns step.) Anyway, it felt like what we'd ideally have was a function in Base that returned the DataType of the just the container returned by similar.
Of possible interest is a proposed update to hcat/vcat in base:
https://github.com/JuliaLang/julia/pull/10155
I can't tell if it includes container promotion.
So far, it doesn't. It still depends on a call to similar for one of the arguments. However, the redesigned code certainly makes it easier to change this behavior, and makes it easier to extend/redefine cat behaviour in packages. However, I am not sure how to proceed with container promotion, in te end you need to be able to infer a concrete type that you can easily instantiate with given size and element type.
I feel like we need a standard function in base, say promote_container_type or promote_array_type, which would give you the type of the container you need to 1) create to combine two containers, or 2) to combine elements of different types together.
Example of 1: vcat(::NullableVector{Float64}, ::Vector{Float64}) should create a NullableArray{Float64}
Example of 2: vcat(::Nullable{Float64}, ::Float64) should create a NullableArray{Float64}
This is not needed only in concatenation functions (hence the need for an exported function in Base): for example, to write a recode function which would accept any AbstractArray, and return another AbstractArray with some values replaced with others according to a series of Pair arguments. If the input is Int[1, 2] and you replace 2 with NA/Nullable{Int}(), the result has to be a DataArray{Int}/NullableArray{Int}.
This looks pretty simple to me: just have as a fallback for AbstractArray like this:
promote_array_type{T1 <: AbstractArray, T2 <: AbstractArray}(x::Type{T1}, y::Type{T2}) = Array{promote_type(eltype(x), eltype(y))}
And then special cases, e.g. for DataArray:
promote_array_type{T1 <: AbstractArray, T2 <: DataArray}(x::Type{T1}, y::Type{T2}) = DataArray{promote_type(eltype(x), eltype(y))}
How does that sound? Am I missing something?