StatsModels.jl
Indexable model
This is a very short PR that serves as a test for a feature that I think would be nice in StatsModels. It allows you to index a model by a Term.
It is motivated by Stata functionality like _b[`var'], which lets you get the beta coefficient for the column represented by var. This is really useful when making tables and graphs programmatically.
My approach is to take in a model and an AbstractTerm, then check whether the AbstractTerm roughly matches something in the model. If it does match, it returns a NamedTuple with the coefficient name, the coefficient, and the standard error.
julia> using GLM, StatsModels
julia> t = (y = rand(100), x = rand(100), b = rand(Bool, 100));
julia> m = lm(@formula(y ~ x + x & b), t);
julia> getparams(m, Term(:x))
Note that in the last line Term(:x) is not a ContinuousTerm or CategoricalTerm; I just match on the term's sym field.
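The matching step described above could look roughly like the following. This is a minimal sketch and not the PR's actual implementation: `getparams_sketch` is a hypothetical name, and it assumes the model supports `coefnames`, `coef`, and `stderror` and that a coefficient name equals the term's symbol exactly.

```julia
using GLM, StatsModels

# Hypothetical helper (not the code in this PR): look up one coefficient
# by the symbol stored in a Term, using exact name matching only.
function getparams_sketch(model, t::Term)
    cnames = coefnames(model)
    i = findfirst(==(string(t.sym)), cnames)
    i === nothing && throw(ArgumentError("no coefficient named $(t.sym)"))
    return (name = cnames[i], coef = coef(model)[i], stderror = stderror(model)[i])
end

t = (y = rand(100), x = rand(100), b = rand(Bool, 100))
m = lm(@formula(y ~ x + x & b), t)
p = getparams_sketch(m, Term(:x))  # NamedTuple with fields name, coef, stderror
```

Exact matching means the interaction `x & b` is not returned for `Term(:x)`; handling interaction or categorical terms would need a looser match than this sketch attempts.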
Codecov Report
Merging #167 into master will decrease coverage by 2.5%. The diff coverage is 0%.
@@ Coverage Diff @@
## master #167 +/- ##
==========================================
- Coverage 84.78% 82.28% -2.51%
==========================================
Files 9 9
Lines 493 508 +15
==========================================
Hits 418 418
- Misses 75 90 +15
| Impacted Files | Coverage Δ | |
|---|---|---|
| src/statsmodel.jl | 68.91% <0%> (-17.53%) | :arrow_down: |
Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update 0e59b8e...1364d83.
I think this might begin to address a larger problem we have. Although we have the ability to go from a formula to a model, we don't have a way to do the reverse. I know this is difficult to ensure because it's up to packages like GLM.jl and MixedModels.jl to support this sort of behavior, but if we want to be able to plot the results of lm(@formula(y ~ x + x & b), t) then we need to be able to parse out the relevant terms as intercepts, slopes, etc.
I like this idea generally, and it's related to other discussions about overhauling the modeling API (e.g., requiring that StatisticalModels keep the formula themselves, instead of relying on the wrapper type like we currently do). #32 I think is the relevant issue...
Presumably this would be a fallback, returning a NamedTuple with just the statistics that models are required to have, if any, right? Most packages would have to write their own getparams function.
After looking at #32 it seems like we want to move towards the formula interface and less so TableModels. Would it make sense to have a getparams for formulas so that it could be model agnostic? This wouldn't necessarily preclude the current PR, but would instead make it easier to extend to new types of terms.
Related issue is https://github.com/JuliaStats/StatsModels.jl/issues/111, which would be extremely useful.
I'm reviving this.
David, do you have anything in mind for how exactly this should work? The inter-linking of PRs and issues here suggests this problem is likely in a chicken-or-the-egg state.
Is there concrete groundwork that really has to be done before we can implement this feature more?
What would this look like for FormulaTerms? Random effects in MixedModels.jl are initially parsed as FormulaTerms.
> What would this look like for FormulaTerms? Random effects in MixedModels.jl are initially parsed as FormulaTerms.
I don't know! Probably something like
julia> getparams(m, Term(RandomEffect, :x))
But my knowledge of all of this is pretty weak.
@Tokazama I agree that #111 would be super useful here. It would make it a lot easier to index into a model.
Just realized that R's outputs are indexable via heavy use of named arrays
r$> confint(m_ols)
                 2.5 %    97.5 %
(Intercept)  3.8836484 5.5022774
exprop       0.3584959 0.6164465
latitude    -0.2919431 2.3197287
r$> confint(m_ols)["exprop", "2.5 %"]
[1] 0.3584959
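For comparison, a similar name-based lookup can be approximated in plain Julia without a named-array package. This is only a sketch: `named_confint` is a hypothetical helper, and the names and bounds are copied from the R output above rather than computed from a fitted model.

```julia
# Hypothetical helper: pair each coefficient name with its interval bounds.
function named_confint(coef_names::AbstractVector{<:AbstractString},
                       ci::AbstractMatrix{<:Real})
    size(ci) == (length(coef_names), 2) ||
        throw(DimensionMismatch("expected one (lower, upper) row per name"))
    return Dict(name => (lower = ci[i, 1], upper = ci[i, 2])
                for (i, name) in enumerate(coef_names))
end

# Values taken from the R session above.
coef_names = ["(Intercept)", "exprop", "latitude"]
ci = [ 3.8836484 5.5022774;
       0.3584959 0.6164465;
      -0.2919431 2.3197287]

ci_by_name = named_confint(coef_names, ci)
ci_by_name["exprop"].lower  # lower bound for exprop
```

With a fitted model, the two inputs would presumably come from `coefnames(m)` and `confint(m)`, assuming `confint` returns one row of two bounds per coefficient.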
I regret having let this languish. Hopefully I can pick it up soon.