
Bart

Open · caioguirado opened this pull request · 4 comments

Proposed changes

This PR proposes adding Bayesian Additive Regression Trees (BART) as an additional method in the package. The implementation allows BART to be used both in a classic ML problem setting and for uplift modeling. The reason for also including the classic ML setting was to allow easier validation of the method with synthetic data.

Currently the method supports regression and binary classification response types, with a binary treatment.
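As a rough illustration of the Hill (2011)-style single-surface recipe that BART-based uplift modeling typically follows (fit a flexible regressor on the covariates plus the treatment indicator, then contrast predictions under treatment and control), here is a minimal sketch. The BART class added in this PR is not named here; a scikit-learn gradient boosting model stands in for it, and `synthetic_data` is causalml's existing synthetic-data helper.

```python
# Minimal sketch of the Hill (2011) recipe for uplift estimation.
# NOTE: GradientBoostingRegressor is only a stand-in for the BART model
# proposed in this PR; the actual class name and API may differ.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from causalml.dataset import synthetic_data

# Synthetic data: outcome y, covariates X, binary treatment, true CATE tau
y, X, treatment, tau, b, e = synthetic_data(mode=1, n=2000, p=5, sigma=1.0)

# Fit a single response surface on (X, treatment)
X_aug = np.column_stack([X, treatment])
model = GradientBoostingRegressor().fit(X_aug, y)

# Estimated CATE: predicted outcome under treatment minus under control
X_treated = np.column_stack([X, np.ones(len(X))])
X_control = np.column_stack([X, np.zeros(len(X))])
cate_hat = model.predict(X_treated) - model.predict(X_control)

print("Corr(estimated CATE, true CATE):", np.corrcoef(cate_hat, tau)[0, 1])
```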

References:
[1] Chipman et al. (2010)
[2] Hill (2011)
[3] Kapelner and Bleich (2014)
[4] Tan and Roy (2019)
[5] BartPy

Types of changes

What types of changes does your code introduce to CausalML? Put an x in the boxes that apply

  • [ ] Bugfix (non-breaking change which fixes an issue)
  • [x] New feature (non-breaking change which adds functionality)
  • [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • [x] Documentation Update (if none of the other choices apply)

Checklist

Put an x in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your code.

  • [x] I have read the CONTRIBUTING doc
  • [x] I have signed the CLA
  • [x] Lint and unit tests pass locally with my changes
  • [x] I have added tests that prove my fix is effective or that my feature works
  • [x] I have added necessary documentation (if appropriate)
  • [ ] Any dependent changes have been merged and published in downstream modules

Further comments

Some next steps are proposed for improvement:

  • Add parallelization: in the example notebook added, there's a cProfile analysis of the methods that take the most time to execute inside the fit method. Both the computation of the individual tree residuals and the prediction step are the top opportunities for improvement, and they share very similar logic. Pratola et al. proposed a way of parallelizing this (a rough sketch follows this list).

  • Add non-binary treatment support.

  • Add multi-class classification support.

  • Add MCMC statistics report and confidence intervals for BART predictions
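A hedged, illustrative sketch of one simple direction for the prediction step is shown below; it uses joblib rather than the MPI-based scheme of Pratola et al., and `trees` / `tree.predict` are hypothetical placeholders for whatever tree objects the implementation exposes internally.

```python
# Hypothetical sketch: parallelizing the per-tree prediction step with joblib.
# `trees` and `tree.predict(X)` are placeholders, not the PR's actual API.
import numpy as np
from joblib import Parallel, delayed


def predict_sum(trees, X, n_jobs=-1):
    """BART predictions are sums over trees, so each tree can be scored independently."""
    per_tree = Parallel(n_jobs=n_jobs)(delayed(tree.predict)(X) for tree in trees)
    return np.sum(per_tree, axis=0)
```

Note that the per-tree residual updates inside the backfitting Gibbs sampler are sequential within an iteration by construction, so speeding up the fit step would need more care than the prediction step.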

caioguirado · Aug 24, 2022

Thanks for the PR for BART, @caioguirado!

I will take a further look in detail, but have two comments for the moment:

  • [ ] Currently, it's failing the lint test. Could you please run black to reformat the code?
  • [ ] The test data set used in the example notebook looks too small. Could you please use more features, e.g. ~10, with a mix of informative and non-informative features (a rough sketch of such a dataset follows below)?
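One way to construct such a dataset is sketched here; the feature count, functional forms, and coefficients are illustrative choices, not something prescribed in this thread.

```python
# Sketch: synthetic uplift data with 10 features, only some of them informative.
import numpy as np

rng = np.random.default_rng(42)
n, p = 5000, 10
X = rng.normal(size=(n, p))
treatment = rng.binomial(1, 0.5, size=n)

# Only the first four features drive the baseline outcome and treatment effect;
# the remaining six are pure noise (non-informative).
baseline = 1.0 * X[:, 0] + 0.5 * X[:, 1] ** 2
tau = 0.8 * X[:, 2] + 0.4 * (X[:, 3] > 0)
y = baseline + treatment * tau + rng.normal(scale=1.0, size=n)
```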

Thanks!

jeongyoonlee · Aug 26, 2022

Dropped some comments, but overall it looks great to me. The example notebook fails to render for me at the moment, but I'll take another look if/when you've added more predictors as Jeong suggested.

t-tte · Aug 29, 2022

One of BART's features is support for continuous treatments; it would be an excellent addition to the package if this were supported. I'm wondering how much effort would be needed to support a continuous treatment.

Other than that, it would be great to compare BART's performance with some other models and show its advantages and value, or perhaps point out which scenarios BART can handle that cannot be addressed by the other models in the current implementation.

zhenyuz0500 · Sep 9, 2022

@jeongyoonlee happy to take a look at this PR again too

ras44 · Nov 15, 2023