DataTables.jl

Overwrites DataFrames describe function

Open davidanthoff opened this issue 8 years ago • 11 comments

I have a lot of situations where I need both DataFrames and DataTables loaded at the same time, e.g. I start out with:

using DataFrames, DataTables

Right now I always get a warning that DataTables overwrites describe from DataFrames, which is not ideal.

I guess the solution for this is to move the function definition into some common base package, and then both DataFrames and DataTables will add a method? Would that be AbstractTables? If so, could we maybe start with a really bare-bones AbstractTables now that only holds that one definition, and add more stuff later?

davidanthoff, Mar 15 '17 18:03

Ideally describe would be removed from one or both packages, as it's more of a statistical function than a tabular data function. Maybe that could live in StatsModels at some point?

ararslan, Mar 15 '17 18:03

It's from StatsBase, right? https://github.com/JuliaData/DataTables.jl/blob/20c71d6d40b3b238e902189b8262ba2b2e679b31/src/abstractdatatable/abstractdatatable.jl#L373

kleinschmidt, Mar 15 '17 19:03

Yes, but unless we want a dependency on AbstractTables in StatsBase (which I don't think we should do), we'd still have to define the generic describe method on tables elsewhere. That's why I suggested StatsModels.

ararslan, Mar 15 '17 19:03

I'm confused: why does using DataFrames and DataTables result in one's describe overwriting the other if they're both extending the method from StatsBase?

kleinschmidt, Mar 15 '17 19:03

Ohhhhhhhhhhhhhhhhhhhhhhhh heh, DataFrames and DataTables both @reexport StatsBase. I bet that's it.

ararslan, Mar 15 '17 19:03

Both have this:

StatsBase.describe(nv::AbstractArray) = describe(STDOUT, nv)

That is the first of three overwriting messages I'm getting.

davidanthoff, Mar 15 '17 19:03

And then there is:

function StatsBase.describe{T<:Number}(io, dv::AbstractArray{T})
function StatsBase.describe{T}(io, dv::AbstractArray{T})

in both. I guess those three methods should just move to StatsBase, right?

davidanthoff, Mar 15 '17 20:03
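Editorial sketch of why the warning appears: when two packages each add a method with the same signature to a function owned by a third package, the second definition replaces the first. The module names below (MiniStatsBase, MiniFrames, MiniTables) are hypothetical stand-ins for StatsBase, DataFrames and DataTables, and the code uses Julia 1.x syntax rather than the 0.5-era syntax quoted in the thread.

```julia
# Hypothetical stand-in for StatsBase: it owns the generic function.
module MiniStatsBase
    export describe
    function describe end   # generic function with no methods yet
end

# Hypothetical stand-in for DataFrames.
module MiniFrames
    import ..MiniStatsBase
    # Method on AbstractArray, a type this module does not own.
    MiniStatsBase.describe(v::AbstractArray) = "MiniFrames"
end

# Hypothetical stand-in for DataTables.
module MiniTables
    import ..MiniStatsBase
    # Same signature as above, so this replaces the MiniFrames method
    # instead of adding a second one.
    MiniStatsBase.describe(v::AbstractArray) = "MiniTables"
end

using .MiniStatsBase

# Only one method survives, and it is the one defined last.
println(length(methods(describe)))
println(describe([1, 2, 3]))
```

Both stand-in packages extend the same function, which is fine in itself; the conflict comes from both claiming the identical AbstractArray signature.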

Assuming they don't contain code specific to Nullables and/or NA, yes, those methods should live in StatsBase. Good catch!

ararslan, Mar 15 '17 20:03

Well, they actually contain code that is Nullable and DataArray specific :) So I guess they really should dispatch on fewer types?

davidanthoff, Mar 15 '17 20:03

Maybe replace those abstract array methods with a non-exported method for single columns?

kleinschmidt, Mar 15 '17 20:03

I think StatsBase.describe(nv::AbstractArray) = describe(STDOUT, nv) should just move to StatsBase as is.

A version of function StatsBase.describe{T<:Number}(io, nv::AbstractArray{T}) that doesn't handle missing values should also move to StatsBase. In DataTables there should be function StatsBase.describe{T<:Number}(io, nv::NullableArray{T}), and in DataFrames function StatsBase.describe{T<:Number}(io, nv::DataArray{T}).

The same applies to function StatsBase.describe{T}(io, nv::AbstractArray{T}).

davidanthoff, Mar 15 '17 20:03
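Editorial sketch of the split proposed in the last comment, under stated assumptions: the generic AbstractArray methods live in the package that owns describe, and a table package only adds a method for its own container type. MiniStatsBase, MiniTables and MaybeArray are hypothetical names (MaybeArray stands in for NullableArray or DataArray), the printed summary is deliberately minimal, and the syntax is Julia 1.x.

```julia
# Hypothetical stand-in for StatsBase: owns describe and its generic methods.
module MiniStatsBase
    export describe
    # Convenience method: print to standard output.
    describe(v::AbstractArray) = describe(stdout, v)
    # Generic numeric method; it does not handle missing values.
    function describe(io::IO, v::AbstractArray{T}) where {T<:Number}
        println(io, "Mean: ", sum(v) / length(v))
    end
end

# Hypothetical stand-in for DataTables (or DataFrames).
module MiniTables
    import ..MiniStatsBase

    # Container type owned by this package; may hold missing values.
    struct MaybeArray{T} <: AbstractVector{Union{T,Missing}}
        data::Vector{Union{T,Missing}}
    end
    Base.size(a::MaybeArray) = size(a.data)
    Base.getindex(a::MaybeArray, i::Int) = a.data[i]

    # Method on the package's own type only, so it cannot collide with a
    # method that another package defines for its own container.
    function MiniStatsBase.describe(io::IO, v::MaybeArray{T}) where {T<:Number}
        println(io, "Missing: ", count(ismissing, v.data))
        # Delegate the rest to the generic method on the non-missing values.
        MiniStatsBase.describe(io, collect(skipmissing(v.data)))
    end
end

using .MiniStatsBase
describe(MiniTables.MaybeArray{Int}([1, missing, 3]))
```

With this layout a second table package could add its own describe method for its own array type, and loading both packages together would not overwrite anything.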