remove mean/var/min/max

Open matthieugomez opened this issue 3 years ago • 10 comments

Solves https://github.com/JuliaStats/StatsModels.jl/issues/222

matthieugomez avatar Apr 13 '21 15:04 matthieugomez

Seems potentially quite breaking, as I imagine that some packages rely on being able to access the precomputed summary statistics for continuous terms.
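
For context, here is a minimal sketch of the kind of access that would break. It assumes a StatsModels release in which `ContinuousTerm` still carries the `mean`, `var`, `min`, and `max` fields; the toy data is made up for illustration.

```julia
using StatsModels

# Hypothetical toy column table (a NamedTuple of vectors)
data = (y = [1.0, 2.0, 3.0, 4.0], x = [0.5, 1.5, 2.5, 3.5])

# Turn the placeholder term for `x` into a concrete term.
# A numeric column with no hint becomes a ContinuousTerm.
t = concrete_term(term(:x), data)

# Summary statistics computed when the concrete term is created;
# these are the fields this PR proposes to remove.
stats = (mean = t.mean, var = t.var, min = t.min, max = t.max)
```

Any downstream code that reads these fields would need to compute the statistics itself after the removal.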

ararslan avatar Apr 13 '21 15:04 ararslan

Right, this should go into the 0.7 release. That being said, do you have any example of an external package relying on these summary stats?

matthieugomez avatar Apr 13 '21 16:04 matthieugomez

Nope. GLM or MixedModels might, but I don't know for certain, which is why I said "potentially." Perhaps nothing is using it and it would be safe to remove, but we should be aware of the potential for breakage.

ararslan avatar Apr 13 '21 18:04 ararslan

I have some internal code that uses this. It might not be strictly necessary, but ideally I'd like to make sure there's some kind of mechanism for custom term types to request that summary stats be extracted from the data table at schema creation time. That's currently only possible via the hints Dict, which is fine for cases where you've manually specified the special handling but wouldn't work for things like splines implemented as functions in the formula (although those don't work very well with the current system either, since ideally you need more than just mean/var/min/max for that).

It's good to know that the schema is a performance bottleneck though.
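
As an illustration of the hints mechanism mentioned above, here is a minimal sketch using only the documented `concrete_term` and `schema` entry points. The column names and data are hypothetical, and this shows the existing hint behavior, not the custom-term mechanism being asked for.

```julia
using StatsModels

# Hypothetical toy column table; `g` is an integer-coded group label
data = (y = [1.0, 2.0, 3.0, 4.0], g = [1, 2, 1, 2])

# Without a hint, a numeric column is treated as continuous
default_term = concrete_term(term(:g), data.g)

# With an explicit hint, the same column is treated as categorical
hinted_term = concrete_term(term(:g), data.g, CategoricalTerm)

# The same hint supplied for a whole table through the hints Dict
sch = schema(data, Dict(:g => CategoricalTerm))
```

The hint has to be supplied by the caller for each column, which is why it does not help a term that appears only as a function call inside the formula.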

kleinschmidt avatar Apr 13 '21 23:04 kleinschmidt

Actually, another strategy, which is already used in #183, is to replace ContinuousTerm in some cases with plain Terms, which in that PR simply pass the underlying values through unchanged. In the vast majority of cases this means that ContinuousTerm can just be eliminated.

kleinschmidt avatar Apr 15 '21 17:04 kleinschmidt