StatsBase.jl icon indicating copy to clipboard operation
StatsBase.jl copied to clipboard

API for VIF

Open palday opened this issue 4 years ago • 10 comments

Variance Inflation Factor -- if we ever move StatisticalModel and RegressionModel to StatsModels, then this should go there, but until then, I'm guessing this is the best place. Next step would be implementation for GLM.jl and MixedModels.jl.

This is just a sketch, if anybody has a better idea, let me know.

@nalimilan I've tagged you as a reviewer since this is mostly an API design thing and you seem to have a really good (and tough :wink:) perspective on such things.

palday avatar Jul 13 '21 22:07 palday

It would be good to prepare at least one implementation in a package before merging this, just to ensure the result suits our needs before committing to this API forever.

Yeah, I wanted to do this in GLM.jl and MixedModels.jl, but it's much easier to have methods in those packages be deprecated than here if we don't like the API.

palday avatar Jul 21 '21 17:07 palday

@nalimilan I have a pretty general implementation over in MixedModelsExtras (https://github.com/palday/MixedModelsExtras.jl/pull/11) that I want to use as an API testing ground. Let me know if that API works for you. If so, we could migrate most of the code and method-name ownership to StatsBase/StatsAPI for vif and to StatsModels for gvif (which depends on having the model formula and hence needs TableRegressionModel).

palday avatar Feb 18 '22 05:02 palday

@ararslan You needed this?

mschauer avatar Feb 18 '22 08:02 mschauer

Yes?

ararslan avatar Feb 18 '22 17:02 ararslan

@nalimilan I have a pretty general implementation over in MixedModelsExtras (palday/MixedModelsExtras.jl#11) that I want to use as an API testing ground. Let me know if that API works for you. If so, we could migrate most of the code and method-name ownership to StatsBase/StatsAPI for vif and to StatsModels for gvif (which depends on having the model formula and hence needs TableRegressionModel).

Cool. Unfortunately this cannot live in StatsAPI as it calls StatsBase.cov2cor!. But we have moved almost all model-related functions out of StatsBase now, so vif would be kind of lonely there. Would it be OK for you to put it in StatsModels? Otherwise, we could put an empty definition in StatsAPI and have each package define a method for its custom type, even if that's suboptimal.

nalimilan avatar Feb 18 '22 21:02 nalimilan

@nalimilan I would say put the [g]vif stub in wherever RegressionModel now lives and a general implementation in StatsModels.jl that other packages can optionally overload. Mostly I just want it to be as accessible as possible and since it was possible to implement vif using only functionality from StatsBase for a type owned by StatsBase, I thought that was good place for it. I think you have a better view of the entire ecosystem, so tell me where to put PRs and I'll open them. From my perspective, we have (noting that the stubs and default definitions don't necessarily need to appear in the same spots):

  1. vif stub -- StatsAPI?
  2. vif(::RegressionModel) default definition -- StatsBase or StatsModels?
  3. gvif stub -- unsure, but probably same place as vif stub
  4. gvif(::RegressionModel; scale=false) default definition. Technically requires a formula method to be defined, which TableRegressionModel and MixedModel both have, but which isn't AFAIK defined for general RegressionModel. So then StatsModels?
  5. termnames: should IMHO definitely be in StatsModels though @kleinschmidt may have thoughts and ideas about how to better define it / alternatives.

palday avatar Feb 18 '22 21:02 palday

Yes?

Nothing, I just remembered that you were looking for something like this is think, so I thought I point it out.

mschauer avatar Feb 18 '22 22:02 mschauer

Thanks, I appreciate it :smile:

ararslan avatar Feb 18 '22 22:02 ararslan

Unfortunately this cannot live in StatsAPI as it calls StatsBase.cov2cor!.

cov2cor! is actually from Statistics (Statistics.cov2cor! === StatsBase.cov2cor!), so AFAICT it should be fine in StatsAPI.

ararslan avatar Feb 18 '22 22:02 ararslan

cov2cor! is actually from Statistics (Statistics.cov2cor! === StatsBase.cov2cor!), so AFAICT it should be fine in StatsAPI.

Ah yes good catch. Though we kind of have the plan to move Statistics out of the stdlib and merge it with StatsBase so soon the problem would be the same. Also, the definition of gvif is not completely trivial so it wouldn't be appropriate in StatsAPI.

@palday Your plan sounds fine. Let's put stubs in StatsAPI, and discuss with @kleinschmidt whether it would be appropriate to put definitions in StatsModels.

nalimilan avatar Feb 19 '22 10:02 nalimilan

@palday Any plans to implement what you proposed? :-)

EDIT: most recent discussion happened at https://github.com/JuliaStats/GLM.jl/issues/428, let's continue there

nalimilan avatar Sep 01 '23 05:09 nalimilan

closing in favor of https://github.com/JuliaStats/StatsAPI.jl/pull/26

palday avatar Sep 05 '23 16:09 palday