StatsBase.jl
API for VIF
Variance Inflation Factor -- if we ever move StatisticalModel and RegressionModel to StatsModels, then this should go there, but until then, I'm guessing this is the best place. Next step would be implementation for GLM.jl and MixedModels.jl.
This is just a sketch, if anybody has a better idea, let me know.
@nalimilan I've tagged you as a reviewer since this is mostly an API design thing and you seem to have a really good (and tough :wink:) perspective on such things.
It would be good to prepare at least one implementation in a package before merging this, just to ensure the result suits our needs before committing to this API forever.
Yeah, I wanted to do this in GLM.jl and MixedModels.jl, but it's much easier to have methods in those packages be deprecated than here if we don't like the API.
@nalimilan I have a pretty general implementation over in MixedModelsExtras (https://github.com/palday/MixedModelsExtras.jl/pull/11) that I want to use as an API testing ground. Let me know if that API works for you. If so, we could migrate most of the code and method-name ownership to StatsBase/StatsAPI for vif and to StatsModels for gvif (which depends on having the model formula and hence needs TableRegressionModel).
@ararslan You needed this?
Yes?
> @nalimilan I have a pretty general implementation over in MixedModelsExtras (palday/MixedModelsExtras.jl#11) that I want to use as an API testing ground. Let me know if that API works for you. If so, we could migrate most of the code and method-name ownership to StatsBase/StatsAPI for vif and to StatsModels for gvif (which depends on having the model formula and hence needs TableRegressionModel).
Cool. Unfortunately this cannot live in StatsAPI as it calls StatsBase.cov2cor!. But we have moved almost all model-related functions out of StatsBase now, so vif would be kind of lonely there. Would it be OK for you to put it in StatsModels? Otherwise, we could put an empty definition in StatsAPI and have each package define a method for its custom type, even if that's suboptimal.
@nalimilan I would say put the [g]vif stub wherever RegressionModel now lives and a general implementation in StatsModels.jl that other packages can optionally overload. Mostly I just want it to be as accessible as possible, and since it was possible to implement vif using only functionality from StatsBase for a type owned by StatsBase, I thought that was a good place for it. I think you have a better view of the entire ecosystem, so tell me where to put PRs and I'll open them. From my perspective, we have (noting that the stubs and default definitions don't necessarily need to appear in the same spots):
- vif stub -- StatsAPI?
- vif(::RegressionModel) default definition -- StatsBase or StatsModels?
- gvif stub -- unsure, but probably same place as vif stub
- gvif(::RegressionModel; scale=false) default definition. Technically requires a formula method to be defined, which TableRegressionModel and MixedModel both have, but which isn't AFAIK defined for general RegressionModel. So then StatsModels?
- termnames: should IMHO definitely be in StatsModels though @kleinschmidt may have thoughts and ideas about how to better define it / alternatives.
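For readers skimming the thread, the stub / default-definition split in the list above can be sketched roughly as follows. This is only an illustration under stated assumptions: ToyRegressionModel, ToyModel, toy_vcov and toy_vif are hypothetical stand-ins, not the actual StatsAPI, StatsBase or StatsModels definitions, the intercept is assumed to be the first coefficient, and the covariance-to-correlation step is written out by hand in place of cov2cor!.

```julia
using LinearAlgebra

# Hypothetical stand-in for the abstract model type owned by the API package.
abstract type ToyRegressionModel end

# "Stub": a function with no methods, so that several packages can extend one name.
function toy_vif end

# Minimal concrete model that stores only the coefficient covariance matrix.
# Assumption: the intercept is the first coefficient.
struct ToyModel <: ToyRegressionModel
    vcov::Matrix{Float64}
end

toy_vcov(m::ToyModel) = m.vcov

# "Default definition": works for any subtype that provides toy_vcov.
function toy_vif(m::ToyRegressionModel)
    V = toy_vcov(m)[2:end, 2:end]   # drop the intercept row and column
    s = sqrt.(diag(V))              # standard errors of the remaining coefficients
    R = V ./ (s * s')               # covariance -> correlation (the role cov2cor! plays)
    return diag(inv(R))             # one VIF per non-intercept coefficient
end

# Usage: the coefficient covariance is proportional to inv(X'X); the scale
# factor cancels once the matrix is converted to a correlation matrix.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.5]
X = hcat(ones(6), x1, x2)
model = ToyModel(inv(X' * X))
toy_vif(model)
```

With two predictors, both values should agree with the textbook 1 / (1 - r^2), where r is the correlation between x1 and x2. A package with its own model type would either rely on the default or add a more specific method.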
> Yes?
Nothing, I just remembered that you were looking for something like this, I think, so I thought I'd point it out.
Thanks, I appreciate it :smile:
> Unfortunately this cannot live in StatsAPI as it calls StatsBase.cov2cor!.
cov2cor! is actually from Statistics (Statistics.cov2cor! === StatsBase.cov2cor!), so AFAICT it should be fine in StatsAPI.
> cov2cor! is actually from Statistics (Statistics.cov2cor! === StatsBase.cov2cor!), so AFAICT it should be fine in StatsAPI.
Ah yes good catch. Though we kind of have the plan to move Statistics out of the stdlib and merge it with StatsBase so soon the problem would be the same. Also, the definition of gvif is not completely trivial so it wouldn't be appropriate in StatsAPI.
@palday Your plan sounds fine. Let's put stubs in StatsAPI, and discuss with @kleinschmidt whether it would be appropriate to put definitions in StatsModels.
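To illustrate why gvif is less trivial than vif, here is a minimal sketch of a generalized VIF computed from a coefficient correlation matrix. It is not the code from MixedModelsExtras or any JuliaStats package: toy_gvif and its terms argument are hypothetical, the determinant-ratio form follows Fox and Monette's definition, and the meaning given to scale (raising each value to 1/(2*df)) is an assumption about what the scale=false keyword in the thread refers to. The mapping from terms to coefficient indices is the part that needs the model formula.

```julia
using LinearAlgebra

# Hypothetical helper. `R` is the correlation matrix of the non-intercept
# coefficients; `terms` lists, for each model term, the coefficient indices
# that the term spans (this is what the formula would supply).
function toy_gvif(R::AbstractMatrix, terms::Vector{<:AbstractVector{Int}}; scale::Bool=false)
    detR = det(R)
    return map(terms) do idx
        rest = setdiff(axes(R, 1), idx)                    # coefficients of all other terms
        g = det(R[idx, idx]) * det(R[rest, rest]) / detR   # determinant-ratio GVIF
        # Assumed meaning of `scale`: GVIF^(1 / (2 * df)), with df = length(idx)
        scale ? g^(1 / (2 * length(idx))) : g
    end
end

# Usage: one single-coefficient term and one term spanning two coefficients.
R = [1.0 0.3 0.2;
     0.3 1.0 0.5;
     0.2 0.5 1.0]
terms = [[1], [2, 3]]
toy_gvif(R, terms)
toy_gvif(R, terms; scale=true)
```

For a term with a single coefficient, the unscaled value reduces to the ordinary VIF, that is, the matching diagonal entry of inv(R).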
@palday Any plans to implement what you proposed? :-)
EDIT: most recent discussion happened at https://github.com/JuliaStats/GLM.jl/issues/428, let's continue there
closing in favor of https://github.com/JuliaStats/StatsAPI.jl/pull/26