formulaic
Add support for nested formulae (useful e.g. in IV contexts).
Hi @bashtage ,
Pursuant to #24, I put together a quick draft of additional support for IV-like formulae in formulaic (in addition to the multi-part formulae that were already implemented). There are some bugs and rough edges, but would you mind taking a look and suggesting improvements? I'm also not sure whether this should be a plugin or part of the default stack, so your thoughts there would be helpful too. All naming etc. is in draft status, so feel free to suggest improvements there as well.
Suppose you wanted to model some data using IV. With these patches you could write:
>>> from formulaic import Formula
>>> Formula("y ~ x1 + x2 + [ x3 + x4 ~ z1 + z2]")
.lhs:
y
.rhs:
root:
1 + x1 + x2 + x3_hat + x4_hat
.deps:
[0]:
.lhs:
x3 + x4
.rhs:
1 + z1 + z2
The resulting formula could then be parsed by the consumer of the formula to do the right things.
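As an illustration of what "the right things" could mean for a consumer, here is a minimal two-stage least squares sketch for the simplest case, `y ~ [x ~ z]`. It does not use formulaic at all: the `ols` helper, the variable names, and the toy data are assumptions made up for this example, and a real consumer would build model matrices from the parsed `.deps` and `.rhs` instead.

```python
from typing import List, Tuple


def ols(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Return (intercept, slope) of the least-squares fit ys ~ 1 + xs."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Toy data (made up): u is an unobserved confounder that is
# uncorrelated with the instrument z in this sample.
z = [-2.0, -1.0, 0.0, 1.0, 2.0]
u = [1.0, -2.0, 2.0, -2.0, 1.0]
x = [1 + 2 * zi + ui for zi, ui in zip(z, u)]
y = [3 + 0.5 * xi + 2 * ui for xi, ui in zip(x, u)]

# First stage (the dependency `x ~ 1 + z`): fit x on z, keep fitted values.
a, b = ols(z, x)
x_hat = [a + b * zi for zi in z]

# Second stage (the root `y ~ 1 + x_hat`): fit y on the fitted values.
intercept_iv, slope_iv = ols(x_hat, y)

# Naive OLS of y on x for comparison; it is biased by the confounder.
_, slope_ols = ols(x, y)

print(f"2SLS slope: {slope_iv:.4f}, naive OLS slope: {slope_ols:.4f}")
```

In this constructed sample the two-stage estimate recovers the true coefficient of 0.5 on `x`, while the naive fit overstates it.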
If the nested formula contains an interaction term, or is itself later interacted with another term, formulaic still does the right thing.
>>> Formula("y ~ x0 + [ x1:x2 ~ z1 + z2 ] : x3")
.lhs:
y
.rhs:
root:
1 + x0 + x1:x2_hat:x3
.deps:
[0]:
.lhs:
x1:x2
.rhs:
1 + z1 + z2
The x1:x2_hat term is considered one factor, and is looked up by name.
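To make the "looked up by name" point concrete, here is a small sketch of how a consumer might evaluate the root term `x1:x2_hat:x3` once the first stage has been fitted. The `evaluate_term` helper and the data values are hypothetical, and the sketch assumes numeric columns, for which an interaction is the element-wise product of its factors.

```python
from math import prod
from typing import Dict, List, Sequence


def evaluate_term(factors: Sequence[str], data: Dict[str, List[float]]) -> List[float]:
    """Multiply the named factor columns element-wise (numeric factors only)."""
    columns = [data[name] for name in factors]  # raises KeyError if a factor is missing
    return [prod(values) for values in zip(*columns)]


# The first-stage fitted values are stored under the single key "x1:x2_hat";
# the name is not split on ":" into separate "x1" and "x2_hat" factors.
data = {
    "x1:x2_hat": [0.5, 1.5, 2.0],  # made-up fitted values
    "x3": [2.0, 2.0, 3.0],
}

interaction = evaluate_term(["x1:x2_hat", "x3"], data)
print(interaction)
```

The consumer is therefore responsible for writing the fitted column back into the data context under exactly that name before the root formula is evaluated.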
Note that, with a small amount of effort, this could also be used for double ML (if we add a delta transform/operator), and for more general things like:
>>> Formula("y ~ x1 + x2 + [ x2 + x3 ~ z1 + z2 ] + [ x4 ~ z3 + [z4 ~ a1 + a2 ] ]")
.lhs:
y
.rhs:
root:
1 + x1 + x2 + x2_hat + x3_hat + x4_hat
.deps:
[0]:
.lhs:
x2 + x3
.rhs:
1 + z1 + z2
[1]:
.lhs:
x4
.rhs:
root:
1 + z3 + z4_hat
.deps:
[0]:
.lhs:
z4
.rhs:
1 + a1 + a2
Though this does strain credulity a bit.
Lastly, I plan to add some utility methods to formulaic to allow easy recursive iteration over the formula, to assist with evaluating dependencies and updating the dataframe as you go up the tree. This might even be integrated into the high-level tooling, if so desired, with the user passing a dep_data_resolver hook of some description.
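A rough sketch of what such a recursive traversal might look like is given below. None of this is formulaic API: the `Stage` dataclass merely mirrors the printed `.lhs` / `.rhs` / `.deps` structure, and the signature of `dep_data_resolver` (taking a stage and its data, returning new `*_hat` columns) is an assumption made for illustration. The placeholder resolver returns zeros where a real one would fit the stage.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Data = Dict[str, List[float]]


@dataclass
class Stage:
    """Hypothetical stand-in for one (possibly nested) formula."""
    lhs: List[str]
    rhs: List[str]
    deps: List["Stage"] = field(default_factory=list)


def resolve(stage: Stage, data: Data,
            dep_data_resolver: Callable[[Stage, Data], Data]) -> Data:
    """Resolve dependencies depth-first and return data extended with their outputs."""
    data = dict(data)  # shallow copy, so the caller's mapping is left untouched
    for dep in stage.deps:
        dep_data = resolve(dep, data, dep_data_resolver)  # inner deps first
        data.update(dep_data_resolver(dep, dep_data))     # then this dep itself
    return data


visited: List[str] = []


def placeholder_resolver(stage: Stage, data: Data) -> Data:
    """Check inputs, record the visit, and return zero-filled `*_hat` columns."""
    missing = [t for t in stage.rhs if t != "1" and t not in data]
    if missing:
        raise KeyError(f"unresolved terms: {missing}")
    visited.append(" + ".join(stage.lhs))
    n = len(next(iter(data.values())))
    return {f"{name}_hat": [0.0] * n for name in stage.lhs}


# y ~ x1 + x2 + [ x2 + x3 ~ z1 + z2 ] + [ x4 ~ z3 + [ z4 ~ a1 + a2 ] ]
root = Stage(
    lhs=["y"],
    rhs=["1", "x1", "x2", "x2_hat", "x3_hat", "x4_hat"],
    deps=[
        Stage(lhs=["x2", "x3"], rhs=["1", "z1", "z2"]),
        Stage(lhs=["x4"], rhs=["1", "z3", "z4_hat"],
              deps=[Stage(lhs=["z4"], rhs=["1", "a1", "a2"])]),
    ],
)

names = ["y", "x1", "x2", "x3", "x4", "z1", "z2", "z3", "z4", "a1", "a2"]
raw = {name: [0.0, 1.0] for name in names}

resolved = resolve(root, raw, placeholder_resolver)
print(visited)
```

In this sketch the columns produced for inner dependencies (here `z4_hat`) stay local to the stage that needs them, and only the direct dependencies' outputs are merged into the parent's data. Whether that scoping is desirable is a design choice left open here.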
closes: #24
Codecov Report
Attention: Patch coverage is 45.45455%, with 6 lines in your changes missing coverage. Please review.
Project coverage is 99.75%. Comparing base (c064ed3) to head (891c31a). Report is 3 commits behind head on main.
:exclamation: Current head 891c31a differs from pull request most recent head 5b88650. Consider uploading reports for the commit 5b88650 to get more accurate results
Files | Patch % | Lines |
---|---|---|
formulaic/parser/parser.py | 14.28% | 6 Missing :warning: |
Additional details and impacted files
@@ Coverage Diff @@
## main #108 +/- ##
===========================================
- Coverage 100.00% 99.75% -0.25%
===========================================
Files 53 39 -14
Lines 2850 2425 -425
===========================================
- Hits 2850 2419 -431
- Misses 0 6 +6
Flag | Coverage Δ | |
---|---|---|
unittests | 99.75% <45.45%> (-0.25%) | :arrow_down: |
@bashtage Any thoughts on this before it gets merged?
@s3alfisc: I just saw your project @ https://github.com/s3alfisc/pyfixest to implement fixest for Python. That looks awesome. I had some internal work that did IV based on this PR, but I was wondering whether you would be interested in having this support too?
Hi Matthew - yes, I'd definitely be interested in that! Right now I do a lot of string parsing to get the two formulas for the first and second stage, and call 'model_matrix' twice. Likely not very efficient and clearly not too elegant, but it works =) Please let me know if I can be of any help in testing & debugging this PR!
The syntax looks good to me. I will definitely switch from my own so-so parser to this.
Thanks for buying in @bashtage and @s3alfisc . It's about time I got this in. I'll rebase it on the latest code-base and let you know when it is ready for you to test.