DataTables.jl
DataTables.jl copied to clipboard
Overwrites DataFrames describe function
I have a lot of situations where I need both DataFrames and DataTables loaded at the same time, e.g. I start out with:
using DataFrames, DataTables
Right now I always get a warning that DataTables overwrites describe from DataFrames, which is not ideal.
I guess the solution for this is to move the function definition in some common base package, and then both DataFrames and DataTables will add a method? Would that be AbstractTables? If so, could we maybe start with a really bare bones AbstractTables now, that only holds that one definition, and then later more stuff can be added?
Ideally describe would be removed from one or both packages, as it's more of a statistical function than a tabular data function. Maybe that could live in StatsModels at some point?
It's from StatsBase, right? https://github.com/JuliaData/DataTables.jl/blob/20c71d6d40b3b238e902189b8262ba2b2e679b31/src/abstractdatatable/abstractdatatable.jl#L373
Yes, but unless we want a dependency on AbstractTables in StatsBase (which I don't think we should do), we'd still have to define the generic describe method on tables elsewhere. That's why I suggested StatsModels.
I'm confused: why does using DataFrames and DataTables result in one's describe overwriting the other if they're both extending the method from StatsBase?
Ohhhhhhhhhhhhhhhhhhhhhhhh heh, DataFrames and DataTables both @reexport StatsBase. I bet that's it.
Both have this:
StatsBase.describe(nv::AbstractArray) = describe(STDOUT, nv)
That is the first of three overwriting messages I'm getting.
And then there is:
function StatsBase.describe{T<:Number}(io, dv::AbstractArray{T})
function StatsBase.describe{T}(io, dv::AbstractArray{T})
in both. I guess those three methods should just move to StatsBase, right?
Assuming they don't contain code specific to Nullables and/or NA, yes, those methods should live in StatsBase. Good catch!
Well, they actually contain code that is Nullable and DataArray specific :) So I guess they really should dispatch on fewer types?
Maybe replace those abstract array methods with an non-exported method for single columns?
I think StatsBase.describe(nv::AbstractArray) = describe(STDOUT, nv) should just move to StatsBase as is.
A version of function StatsBase.describe{T<:Number}(io, nv::AbstractArray{T}) that doesn't handle missing values should also move to StatsBase. In DataTables there should be function StatsBase.describe{T<:Number}(io, nv::NullableArray{T}), and in DataFrames function StatsBase.describe{T<:Number}(io, nv::DataArray{T}).
For function StatsBase.describe{T}(io, nv::AbstractArray{T}) similar story.