formulaic Add support for nested formulae (useful e.g. in IV contexts).

Hi @bashtage ,

Pursuant to #24, I did a quick draft of additional support for IV-like formulae in formulaic (in addition to the multi-part formulae that were already implemented). There are some bugs and rough edges, but would you mind taking a look and adding any suggestions? I'm also not sure whether this should be a plugin or part of the default stack, so your thoughts there would be helpful too. All naming/etc. is in draft status, so feel free to suggest improvements there.

Suppose you wanted to model some data using IV. With these patches you could write:

>>> from formulaic import Formula
>>> Formula("y ~ x1 + x2 + [ x3 + x4 ~ z1 + z2]")
.lhs:
    y
.rhs:
    root:
        1 + x1 + x2 + x3_hat + x4_hat
    .deps:
        [0]:
            .lhs:
                x3 + x4
            .rhs:
                1 + z1 + z2

The consumer of the formula could then walk the resulting structure and do the right thing, e.g. evaluate each nested formula under .deps before evaluating the root.

If you end up using an interaction term inside the nested formula, or later multiply the nested formula by another term, formulaic still does the right thing:

>>> Formula("y ~ x0 + [ x1:x2 ~ z1 + z2 ] : x3")
.lhs:
    y
.rhs:
    root:
        1 + x0 + x1:x2_hat:x3
    .deps:
        [0]:
            .lhs:
                x1:x2
            .rhs:
                1 + z1 + z2

The x1:x2_hat is considered one factor, and looked up by name.
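
As a rough illustration of what "looked up by name" could mean for a consumer (this is only a sketch, not formulaic's implementation): the fitted first-stage values are assumed to be stored in the data under the single key "x1:x2_hat", and the term x1:x2_hat:x3 is then the element-wise product of that column with x3. The `evaluate_term` helper and the plain-dict data layout are hypothetical.

```python
from typing import Dict, List, Sequence


def evaluate_term(factors: Sequence[str], data: Dict[str, List[float]]) -> List[float]:
    """Multiply the named numeric columns element-wise.

    Each factor is looked up by its full name, so "x1:x2_hat" is one key,
    not an interaction of "x1" and "x2_hat".
    """
    columns = [data[name] for name in factors]  # KeyError if a fitted column is missing
    result = [1.0] * len(columns[0])
    for column in columns:
        result = [r * v for r, v in zip(result, column)]
    return result


data = {
    "x3": [2.0, 2.0],
    # Assumption: the consumer's first stage stored its fitted values under this one name.
    "x1:x2_hat": [3.0, 8.0],
}

# The root term "x1:x2_hat:x3" has exactly two factors.
term_values = evaluate_term(["x1:x2_hat", "x3"], data)
print(term_values)  # [6.0, 16.0]
```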

Note that, with a small amount of effort, this could also be used for double ML (if we add a delta transform/operator), and for more general things like:

>>> Formula("y ~ x1 + x2 + [ x2 + x3 ~ z1 + z2 ] + [ x4 ~ z3 + [z4 ~ a1 + a2 ] ]")
.lhs:
    y
.rhs:
    root:
        1 + x1 + x2 + x2_hat + x3_hat + x4_hat
    .deps:
        [0]:
            .lhs:
                x2 + x3
            .rhs:
                1 + z1 + z2
        [1]:
            .lhs:
                x4
            .rhs:
                root:
                    1 + z3 + z4_hat
                .deps:
                    [0]:
                        .lhs:
                            z4
                        .rhs:
                            1 + a1 + a2

Though this does strain credulity a bit.

Lastly, I plan to add some utility methods to formulaic to allow easy recursive iteration over the formula, to assist with evaluating dependencies and updating the dataframe as you go up the tree. This could perhaps even be integrated into the high-level tooling, if so desired, with the user passing a dep_data_resolver hook of some description.
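
To make the idea concrete, here is a minimal sketch of such a recursive traversal. Nothing in it is formulaic API: the `NestedFormula` dataclass is a hypothetical stand-in for the printed structure above (lhs, root rhs terms, deps), data is a plain dict of columns rather than a dataframe, and the resolver signature is only one guess at what a dep_data_resolver hook might look like.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Columns = Dict[str, List[float]]


@dataclass
class NestedFormula:
    """Hypothetical stand-in for the nested structure shown above."""
    lhs: List[str]
    rhs: List[str]  # root terms, e.g. ["1", "z3", "z4_hat"]
    deps: List["NestedFormula"] = field(default_factory=list)


# Assumed hook shape: given one stage and the data so far, return the new "<name>_hat" columns.
Resolver = Callable[[NestedFormula, Columns], Columns]


def resolve_deps(formula: NestedFormula, data: Columns, resolver: Resolver) -> Columns:
    """Depth-first walk: inner stages are resolved first, so their *_hat
    columns exist before the enclosing stage is handed to the resolver."""
    data = dict(data)  # shallow copy; the caller's dict is left untouched
    for dep in formula.deps:
        data = resolve_deps(dep, data, resolver)
        data.update(resolver(dep, data))
    return data


def mean_resolver(stage: NestedFormula, data: Columns) -> Columns:
    """Placeholder 'first stage': predicts every lhs column by its mean.
    A real consumer would fit lhs ~ rhs here instead."""
    fitted = {}
    for name in stage.lhs:
        column = data[name]
        fitted[f"{name}_hat"] = [sum(column) / len(column)] * len(column)
    return fitted


# Mirrors the [ x4 ~ z3 + [ z4 ~ a1 + a2 ] ] part of the last example.
inner = NestedFormula(lhs=["z4"], rhs=["1", "a1", "a2"])
stage = NestedFormula(lhs=["x4"], rhs=["1", "z3", "z4_hat"], deps=[inner])
outer = NestedFormula(lhs=["y"], rhs=["1", "x4_hat"], deps=[stage])

data = {
    "y": [1.0, 2.0], "x4": [3.0, 5.0], "z3": [0.0, 1.0],
    "z4": [2.0, 4.0], "a1": [1.0, 0.0], "a2": [0.0, 1.0],
}
resolved = resolve_deps(outer, data, mean_resolver)
```

After the walk, `resolved` holds the original columns plus z4_hat and x4_hat, added in that order, and the root formula can be evaluated against it.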

closes: #24

Sep 25 '22 21:09 matthewwardrop

Codecov Report

Attention: Patch coverage is 45.45455%, with 6 lines in your changes missing coverage. Please review.

Project coverage is 99.75%. Comparing base (c064ed3) to head (891c31a). Report is 3 commits behind head on main.

:exclamation: Current head 891c31a differs from pull request most recent head 5b88650. Consider uploading reports for the commit 5b88650 to get more accurate results

| Files | Patch % | Lines |
|---|---|---|
| formulaic/parser/parser.py | 14.28% | 6 Missing :warning: |

Additional details and impacted files

@@             Coverage Diff             @@
##              main     #108      +/-   ##
===========================================
- Coverage   100.00%   99.75%   -0.25%     
===========================================
  Files           53       39      -14     
  Lines         2850     2425     -425     
===========================================
- Hits          2850     2419     -431     
- Misses           0        6       +6

| Flag | Coverage Δ | |
|---|---|---|
| unittests | `99.75% <45.45%> (-0.25%)` | :arrow_down: |


Oct 05 '22 03:10 codecov[bot]

@bashtage Any thoughts on this before it gets merged?

Oct 05 '22 03:10 matthewwardrop

@s3alfisc: I just saw your project @ https://github.com/s3alfisc/pyfixest to implement fixest for Python. That looks awesome. I had some internal work that did IV based on this PR, but I was wondering whether you would be interested in having this support too?

Oct 12 '23 23:10 matthewwardrop

Hi Matthew - yes, I'd definitely be interested in that! Right now I do a lot of string parsing to get the two formulas for the first and second stage and call 'model_matrix' twice. Likely not very efficient and clearly not too elegant, but it works =) Please let me know if I can be of any help in testing & debugging this PR!

Oct 13 '23 16:10 s3alfisc

The syntax looks good to me. I will definitely switch from my own so-so parser to this.

Oct 14 '23 12:10 bashtage

Thanks for buying in, @bashtage and @s3alfisc. It's about time I got this in. I'll rebase it on the latest codebase and let you know when it is ready for you to test.

Mar 08 '24 06:03 matthewwardrop
