DataArrays.jl
`vcat` between Arrays and DataArrays
The following seems broken:
julia> a = @data([1,NA])
2-element DataArray{Int64,1}:
1
NA
julia> b = [3., 4.]
2-element Array{Float64,1}:
3.0
4.0
julia> [a,b]
4-element DataArray{Float64,1}:
1.0
NA
3.0
4.0
julia> [b,a]
ERROR: `convert` has no method matching convert(::Type{Float64}, ::NAtype)
in setindex! at array.jl:307
in setindex! at array.jl:345
in cat_t at abstractarray.jl:730
in vcat at abstractarray.jl:736
Both forms call `vcat` defined in abstractarray.jl. Without the `NA`, both forms work. Maybe the new `Nullable` approach will fix this.
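This is because `vcat` just picks the container type of the first array; the element type is still promoted, which is why `[a,b]` comes out as a `DataArray{Float64,1}`. I think we'll need to override `vcat(::AbstractArray...)` to return a `DataArray`/`NullableArray` if any of the passed arrays is a `DataArray`/`NullableArray`.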
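A minimal sketch of that override idea in current Julia, assuming a toy `MaskedVector` type (values plus a missingness mask, reporting Julia 1.x's `missing`) as a stand-in for `DataArray`. `MaskedVector`, `valuetype`, `tomasked`, and `promoting_vcat` are hypothetical names, not DataArrays.jl or Base API. The sketch uses a separate function rather than adding methods to `Base.vcat`, which keeps it clear of ambiguities with Base's own `vcat` methods.

```julia
# Hypothetical stand-in for DataArray: raw values plus a missingness mask.
struct MaskedVector{T} <: AbstractVector{Union{T,Missing}}
    values::Vector{T}
    na::BitVector
end
Base.size(m::MaskedVector) = size(m.values)
Base.IndexStyle(::Type{<:MaskedVector}) = IndexLinear()
Base.getindex(m::MaskedVector, i::Int) = m.na[i] ? missing : m.values[i]

# Element type ignoring missingness.
valuetype(::MaskedVector{T}) where {T} = T
valuetype(v::AbstractVector) = eltype(v)

# Re-wrap a vector as a MaskedVector{T}; plain inputs are assumed to hold no missing values.
tomasked(::Type{T}, m::MaskedVector) where {T} =
    MaskedVector{T}(convert(Vector{T}, m.values), copy(m.na))
tomasked(::Type{T}, v::AbstractVector) where {T} =
    MaskedVector{T}(convert(Vector{T}, collect(v)), falses(length(v)))

# Hypothetical promoting vcat: the result is masked if either argument is,
# independent of argument order; element types are promoted as usual.
function promoting_vcat(a::AbstractVector, b::AbstractVector)
    if a isa MaskedVector || b isa MaskedVector
        T = promote_type(valuetype(a), valuetype(b))
        x, y = tomasked(T, a), tomasked(T, b)
        return MaskedVector{T}(vcat(x.values, y.values), vcat(x.na, y.na))
    end
    return vcat(a, b)  # no masked input: defer to Base
end

a = MaskedVector([1, 0], BitVector([false, true]))  # plays the role of @data([1, NA])
b = [3.0, 4.0]
promoting_vcat(b, a)  # MaskedVector{Float64} holding 3.0, 4.0, 1.0, missing
```

Because the check looks at both arguments, `promoting_vcat(a, b)` and `promoting_vcat(b, a)` both return a masked container, which is exactly the asymmetry the `[b,a]` error above exposes.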
This also raises a concern. If you have:
a::NullableVector{Float64}
b::Vector{Float64}
then is it okay if `[b, a][1]` returns a `Nullable` and not a `Float64`?
Promoting AbstractArrays might get tricky, especially if you include SubArrays, PooledDataArrays, and other yet-to-be-defined AbstractArrays.
I just made `vcat(dfs)` do basic container type promotion in this commit. (It couldn't reuse `vcat(das)` because of the filling-in-missing-columns step.) Anyway, it felt like what we'd ideally have was a function in Base that returned just the `DataType` of the container returned by `similar`.
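For illustration, one crude way to get just the type that `similar` would produce is to ask `similar` for a zero-length array of the requested element type. `similar_type` is a hypothetical helper, not a Base function, and it inherits exactly the limitation discussed here: the container is still decided by a single argument.

```julia
# Hypothetical helper: the array type `similar(x, T, dims)` would return,
# found by requesting a zero-length array with the same number of dimensions.
similar_type(x::AbstractArray, ::Type{T}) where {T} =
    typeof(similar(x, T, ntuple(_ -> 0, ndims(x))))

similar_type([1, 2], Float64)             # Vector{Float64}
similar_type(1:3, Float64)                # Vector{Float64}: the AbstractArray fallback of `similar` returns an Array
similar_type(view([1, 2, 3], 1:2), Int)   # Vector{Int64}: a SubArray defers to its parent
```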
Of possible interest is a proposed update to `hcat`/`vcat` in Base:
https://github.com/JuliaLang/julia/pull/10155
I can't tell if it includes container promotion.
So far, it doesn't. It still depends on a call to `similar` for one of the arguments. That said, the redesigned code certainly makes it easier to change this behavior, and to extend or redefine `cat` behavior in packages. However, I am not sure how to proceed with container promotion: in the end, you need to be able to infer a concrete type that you can easily instantiate with a given size and element type.
I feel like we need a standard function in Base, say `promote_container_type` or `promote_array_type`, which would give you the type of the container you need to create to 1) combine two containers, or 2) combine elements of different types.
Example of 1: `vcat(::NullableVector{Float64}, ::Vector{Float64})` should create a `NullableArray{Float64}`.
Example of 2: `vcat(::Nullable{Float64}, ::Float64)` should create a `NullableArray{Float64}`.
This is needed not only in concatenation functions (hence the need for an exported function in Base). For example, take a `recode` function that accepts any `AbstractArray` and returns another `AbstractArray` with some values replaced with others according to a series of `Pair` arguments. If the input is `Int[1, 2]` and you replace `2` with `NA`/`Nullable{Int}()`, the result has to be a `DataArray{Int}`/`NullableArray{Int}`.
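A rough sketch of such a `recode` in current Julia, with `missing` (Julia 1.x) swapped in for `NA`/`Nullable`; there, Base's `promote_type(Int, Missing)` already yields `Union{Missing, Int}`. The function name and its `Pair`-based signature are hypothetical. Note that the container still comes from `similar(x, T)`, so only the element type is widened; the container-type question raised here remains open.

```julia
# Hypothetical recode: replace values according to `old => new` pairs,
# widening the element type so every replacement value fits.
function recode(x::AbstractArray, rules::Pair...)
    T = promote_type(eltype(x), map(r -> typeof(last(r)), rules)...)
    out = similar(x, T)  # container type still chosen by `similar` on `x`
    for i in eachindex(x, out)
        v = x[i]
        for r in rules
            if isequal(v, first(r))
                v = last(r)
                break  # first matching rule wins
            end
        end
        out[i] = v
    end
    return out
end

recode(Int[1, 2], 2 => missing)  # Vector{Union{Missing, Int64}}: [1, missing]
```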
This looks pretty simple to me: just have a fallback for `AbstractArray` like this:
promote_array_type{T1 <: AbstractArray, T2 <: AbstractArray}(x::Type{T1}, y::Type{T2}) = Array{promote_type(eltype(x), eltype(y))}
And then special cases, e.g. for `DataArray`:
promote_array_type{T1 <: AbstractArray, T2 <: DataArray}(x::Type{T1}, y::Type{T2}) = DataArray{promote_type(eltype(x), eltype(y))}
How does that sound? Am I missing something?
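The same proposal written in current Julia `where` syntax, as a sketch assuming a toy `MaskedVector` type in place of `DataArray` (`MaskedVector`, `valuetype`, and `promote_array_type` are hypothetical here, not Base functions). One gap worth noting: with only the `(AbstractArray, DataArray)` special case, a call with the `DataArray` type first would still hit the plain-`Array` fallback, so the sketch adds the mirrored method, plus a `(MaskedVector, MaskedVector)` method to resolve the ambiguity between the two.

```julia
# Hypothetical stand-in for DataArray; only its type is used below.
struct MaskedVector{T} <: AbstractVector{Union{T,Missing}}
    values::Vector{T}
    na::BitVector
end

# Element type ignoring missingness.
valuetype(::Type{<:MaskedVector{T}}) where {T} = T
valuetype(::Type{A}) where {A<:AbstractArray} = eltype(A)

# Fallback: two ordinary arrays promote to a plain Array.
promote_array_type(::Type{A}, ::Type{B}) where {A<:AbstractArray,B<:AbstractArray} =
    Array{promote_type(valuetype(A), valuetype(B))}

# Special cases: a masked argument in either position yields a masked container.
promote_array_type(::Type{A}, ::Type{B}) where {A<:AbstractArray,B<:MaskedVector} =
    MaskedVector{promote_type(valuetype(A), valuetype(B))}
promote_array_type(::Type{A}, ::Type{B}) where {A<:MaskedVector,B<:AbstractArray} =
    MaskedVector{promote_type(valuetype(A), valuetype(B))}
# Needed because both methods above match (MaskedVector, MaskedVector).
promote_array_type(::Type{A}, ::Type{B}) where {A<:MaskedVector,B<:MaskedVector} =
    MaskedVector{promote_type(valuetype(A), valuetype(B))}

promote_array_type(Vector{Float64}, MaskedVector{Int})  # MaskedVector{Float64}
promote_array_type(MaskedVector{Int}, Vector{Float64})  # MaskedVector{Float64}
promote_array_type(Vector{Int}, Vector{Float64})        # Array{Float64}
```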