StatsModels.jl

Indexable model

Open pdeffebach opened this issue 5 years ago • 10 comments

This is a very short PR that serves as a test for a feature that I think would be nice in StatsModels. It allows you to index a model by a Term.

It is motivated by Stata functionality like _b[`var'], which lets you get the beta coefficient for the column represented by var. This is really useful when making tables and graphs programmatically.

My approach is to take in a model and an AbstractTerm, then check whether the AbstractTerm roughly matches something in the model. If it does, it returns a NamedTuple with the coefficient name, the coefficient, and the standard error.

julia> using GLM, StatsModels
julia> t = (y = rand(100), x = rand(100), b = rand(Bool, 100));
julia> m = lm(@formula(y ~ x + x & b), t);
julia> getparams(m, Term(:x))

Note that in the last line Term(:x) is not a ContinuousTerm or CategoricalTerm; I just match on the x.sym field.
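
To make the matching idea concrete, here is a minimal, self-contained sketch that does not depend on StatsModels or on the code in this PR. ToyModel and ToyTerm are hypothetical stand-ins (with a real fitted model the three vectors would come from coefnames, coef, and stderror), and the exact-name matching rule is an assumption chosen for illustration, not necessarily what the PR does.

```julia
# Hypothetical stand-in for a fitted model: coefficient names,
# estimates, and standard errors stored side by side.
struct ToyModel
    names::Vector{String}
    coefs::Vector{Float64}
    stderrs::Vector{Float64}
end

# Hypothetical stand-in for a Term: it only carries the variable symbol.
struct ToyTerm
    sym::Symbol
end

# Look up the coefficient whose name equals the term's symbol exactly
# (an assumed rule), so `x` does not also pick up the interaction `x & b`.
function getparams(m::ToyModel, t::ToyTerm)
    i = findfirst(==(String(t.sym)), m.names)
    i === nothing && throw(KeyError(t.sym))
    return (name = m.names[i], coef = m.coefs[i], stderror = m.stderrs[i])
end

# Made-up numbers, purely for illustration.
m = ToyModel(["(Intercept)", "x", "x & b"], [0.5, 0.1, -0.2], [0.05, 0.08, 0.11])
p = getparams(m, ToyTerm(:x))
```

Returning a NamedTuple keeps the result easy to destructure in table- and plot-building code, for example p.coef and p.stderror.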

pdeffebach, Jan 04 '20 17:01

Codecov Report

Merging #167 into master will decrease coverage by 2.5%. The diff coverage is 0%.

@@            Coverage Diff             @@
##           master     #167      +/-   ##
==========================================
- Coverage   84.78%   82.28%   -2.51%     
==========================================
  Files           9        9              
  Lines         493      508      +15     
==========================================
  Hits          418      418              
- Misses         75       90      +15
Impacted Files      Coverage Δ
src/statsmodel.jl   68.91% <0%> (-17.53%)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update 0e59b8e...1364d83.

codecov-io, Jan 04 '20 17:01

I think this might begin to address a larger problem we have. Although we have the ability to go from a formula to a model, we don't have a way to do the reverse. I know this is difficult to ensure because it's up to packages like GLM.jl and MixedModels.jl to support this sort of behavior, but if we want to be able to plot the results of lm(@formula(y ~ x + x & b), t) then we need to be able to parse out the relevant terms as intercepts, slopes, etc.

Tokazama, Jan 06 '20 02:01

I like this idea generally, and it's related to other discussions about overhauling the modeling API (e.g., requiring that StatisticalModels keep the formula themselves, instead of relying on the wrapper type like we currently do). I think #32 is the relevant issue...

kleinschmidt, Jan 06 '20 17:01

Presumably this would be a fallback that returns a NamedTuple with just the statistics that models are required to provide, if any, right? Most packages would have to write their own getparams function.

pdeffebach, Jan 06 '20 17:01

After looking at #32, it seems like we want to move towards the formula interface and rely less on TableModels. Would it make sense to have a getparams for formulas so that it could be model agnostic? This wouldn't necessarily preclude the current PR, but would instead make it easier to extend to new types of terms.

Tokazama, Jan 08 '20 15:01

A related issue is https://github.com/JuliaStats/StatsModels.jl/issues/111, which would be extremely useful.

Tokazama, Jan 16 '20 14:01

I'm reviving this.

David, do you have anything in mind for how exactly this should work? The inter-linking of PRs and issues here suggests this problem is likely in a chicken-or-the-egg state.

Is there concrete groundwork that really has to be done before we can make more progress on this feature?

pdeffebach, Mar 13 '20 17:03

What would this look like for FormulaTerms? Random effects in MixedModels.jl are initially parsed as FormulaTerms.

palday, Mar 15 '20 15:03

> What would this look like for FormulaTerms? Random effects in MixedModels.jl are initially parsed as FormulaTerms.

I don't know! Probably something like

julia> getparams(m, Term(RandomEffect, :x))

But my knowledge of all of this is pretty weak.

@Tokazama I agree that #111 would be super useful here. It would make it a lot easier to index into a model.

pdeffebach, Apr 11 '20 17:04

I just realized that R's outputs are indexable thanks to its heavy use of named arrays:

r$> confint(m_ols)
                 2.5 %    97.5 %
(Intercept)  3.8836484 5.5022774
exprop       0.3584959 0.6164465
latitude    -0.2919431 2.3197287

r$> confint(m_ols)["exprop", "2.5 %"]
[1] 0.3584959
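
For comparison, the same kind of name-based lookup can be sketched in plain Julia by overloading getindex on a small wrapper type. NamedTable is a hypothetical type made up for this example and is not part of StatsModels or any other package; the numbers are copied from the R output above.

```julia
# Hypothetical wrapper: a matrix plus row and column labels.
struct NamedTable
    rows::Vector{String}
    cols::Vector{String}
    values::Matrix{Float64}
end

# Allow indexing by (row name, column name), mirroring R's named arrays.
function Base.getindex(t::NamedTable, row::AbstractString, col::AbstractString)
    i = findfirst(==(row), t.rows)
    j = findfirst(==(col), t.cols)
    (i === nothing || j === nothing) && throw(KeyError((row, col)))
    return t.values[i, j]
end

# Confidence intervals taken from the confint(m_ols) output above.
ci = NamedTable(
    ["(Intercept)", "exprop", "latitude"],
    ["2.5 %", "97.5 %"],
    [3.8836484 5.5022774; 0.3584959 0.6164465; -0.2919431 2.3197287],
)

lower = ci["exprop", "2.5 %"]
```

Here lower holds the same value that R prints for confint(m_ols)["exprop", "2.5 %"].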

I regret having let this languish. Hopefully I can pick it up soon.

pdeffebach, Nov 17 '20 19:11