DataArrays.jl icon indicating copy to clipboard operation
DataArrays.jl copied to clipboard

`vcat` between Arrays and DataArrays

Open tshort opened this issue 10 years ago • 6 comments

The following seems broken:

julia> a = @data([1,NA])
2-element DataArray{Int64,1}:
 1  
  NA

julia> b = [3., 4.]
2-element Array{Float64,1}:
 3.0
 4.0

julia> [a,b]
4-element DataArray{Float64,1}:
 1.0
  NA
 3.0
 4.0

julia> [b,a]
ERROR: `convert` has no method matching convert(::Type{Float64}, ::NAtype)
 in setindex! at array.jl:307
 in setindex! at array.jl:345
 in cat_t at abstractarray.jl:730
 in vcat at abstractarray.jl:736

Both forms call vcat defined in abstractarray.jl. Without the NA, both forms work. Maybe the new Nullable approach will fix this.

tshort avatar Nov 29 '14 16:11 tshort

This is because vcat just picks the type of the first array. I think we'll need to overwrite vcat(::AbstractArray...) to return a DataArray/NullableArray if one of the passed arrays is a DataArray/NullableArray.

This also raises a concern. If you have:

a::NullableVector{Float64}
b::Vector{Float64}

Then is it okay if [b, a][1] returns a Nullable and not a Float64?

simonster avatar Dec 23 '14 01:12 simonster

Promoting AbstractArrays might get tricky, especially if you include SubArrays, PooledDataArrays, and other yet-to-be-defined AbstractArrays.

tshort avatar Dec 23 '14 01:12 tshort

I just made vcat(dfs) do basic container type promotion in this commit. (Not something that could reuse vcat(das) because of the filling-in-missing-columns step.) Anyway, it felt like what we'd ideally have was a function in Base that returned the DataType of the just the container returned by similar.

garborg avatar Dec 23 '14 02:12 garborg

Of possible interest is a proposed update to hcat/vcat in base:

https://github.com/JuliaLang/julia/pull/10155

I can't tell if it includes container promotion.

tshort avatar Feb 10 '15 21:02 tshort

So far, it doesn't. It still depends on a call to similar for one of the arguments. However, the redesigned code certainly makes it easier to change this behavior, and makes it easier to extend/redefine cat behaviour in packages. However, I am not sure how to proceed with container promotion, in te end you need to be able to infer a concrete type that you can easily instantiate with given size and element type.

Jutho avatar Feb 11 '15 16:02 Jutho

I feel like we need a standard function in base, say promote_container_type or promote_array_type, which would give you the type of the container you need to 1) create to combine two containers, or 2) to combine elements of different types together.

Example of 1: vcat(::NullableVector{Float64}, ::Vector{Float64}) should create a NullableArray{Float64} Example of 2: vcat(::Nullable{Float64}, ::Float64) should create a NullableArray{Float64}

This is not needed only in concatenation functions (hence the need for an exported function in Base): for example, to write a recode function which would accept any AbstractArray, and return another AbstractArray with some values replaced with others according to a series of Pair arguments. If the input is Int[1, 2] and you replace 2 with NA/Nullable{Int}(), the result has to be a DataArray{Int}/NullableArray{Int}.

This looks pretty simple to me: just have as a fallback for AbstractArray like this:

promote_array_type{T1 <: AbstractArray, T2 <: AbstractArray}(x::Type{T1}, y::Type{T2}) = Array{promote_type(eltype(x), eltype(y))}

And then special cases, e.g. for DataArray:

promote_array_type{T1 <: AbstractArray, T2 <: DataArray}(x::Type{T1}, y::Type{T2}) = DataArray{promote_type(eltype(x), eltype(y))}

How does that sound? Am I missing something?

nalimilan avatar Mar 16 '15 22:03 nalimilan