fklearn
Causal Effect Bin Partitioners
Status
READY
Todo list
- [x] Documentation
- [x] Tests added and passed
Background context
To calculate causal effects by segments, a quantile-based approach is currently used to create the segments. However, we've seen that this is usually not ideal, since there are methods that create segments that are more distinguishable from one another.
One such example is the Fisher-Jenks algorithm. A user could create their own partitioner with this algorithm like this:
import pandas as pd
from typing import List

from jenkspy import jenks_breaks
from toolz import curry


@curry
def fisher_jenks_partitioner(series: pd.Series, segments: int) -> List:
    bins = jenks_breaks(series, n_classes=segments)
    bins[0] = -float("inf")
    bins[-1] = float("inf")
    return bins
And use it in `effect_by_segment`:

from fklearn.causal.effects import linear_effect
from fklearn.causal.validation.curves import effect_by_segment

df = pd.DataFrame(dict(
    t=[1, 1, 1, 2, 2, 2, 3, 3, 3],
    x=[1, 2, 3, 1, 2, 3, 1, 2, 3],
    y=[1, 1, 1, 2, 3, 4, 3, 5, 7],
))

result = effect_by_segment(
    df,
    prediction="x",
    outcome="y",
    treatment="t",
    segments=3,
    effect_fn=linear_effect,
    partition_fn=fisher_jenks_partitioner,
)
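To build intuition for the result: a linear effect is commonly computed as the OLS slope of the outcome on the treatment within each segment (an assumption here about `linear_effect`, not something stated in this PR). For the toy data above, the segment where `x == 3` can be checked by hand:

```python
import numpy as np

# Rows of the toy DataFrame where x == 3 (assuming the Fisher-Jenks
# bins put each distinct x value in its own segment).
t = np.array([1, 2, 3])
y = np.array([1, 4, 7])

# Sketch: a linear effect as the OLS slope of y on t,
# i.e. cov(t, y) / var(t).
effect = np.cov(t, y)[0, 1] / np.var(t, ddof=1)
print(effect)  # 3.0
```

The toy data is constructed so that each `x` segment has a different slope of `y` on `t` (1, 2, and 3 respectively), which makes the segments easy to tell apart.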
Or use another custom partitioner such as:
@curry
def bin_partitioner(series: pd.Series, segments: int = 1) -> List:
    # Ignores the inputs and always returns fixed bin edges.
    return [1, 4, 5]
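For context, the bin edges returned by a partitioner are presumably used to slice the prediction column into intervals, e.g. with something like `pd.cut` (an illustrative sketch, not fklearn's actual implementation):

```python
import pandas as pd

# Hypothetical illustration of how the bin edges returned by a
# partitioner could be turned into segments. The bin_partitioner
# above always returns the fixed edges [1, 4, 5].
series = pd.Series([1, 2, 3, 4, 5])
bins = [1, 4, 5]

# pd.cut assigns each value to an interval delimited by the edges;
# include_lowest keeps the value 1 inside the first interval.
segments = pd.cut(series, bins=bins, include_lowest=True)
print(segments.value_counts().sort_index())  # 4 values in (1, 4], 1 in (4, 5]
```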
Description of the changes proposed in the pull request
We're adding:
- an argument to the `effect_by_segment` function so a user can define the way the segments are created.
- the `quantile_partitioner`, so the default behavior of `effect_by_segment` is maintained.
- a new `PartitionFnType` type.
- tests for `quantile_partitioner`.
- documentation for the new `fklearn.causal.partitioners` module.
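For reference, a quantile-based partitioner matching the interface shown above could be sketched roughly like this (an assumption about what the default `quantile_partitioner` does, not the actual fklearn code; the ±inf edge replacement mirrors the Fisher-Jenks example):

```python
import numpy as np
import pandas as pd
from typing import List

from toolz import curry


@curry
def quantile_partitioner_sketch(series: pd.Series, segments: int) -> List:
    # Compute equally spaced quantile edges, then open the outer
    # edges so every value falls inside some bin.
    quantiles = np.linspace(0, 1, segments + 1)
    bins = list(np.quantile(series, quantiles))
    bins[0] = -float("inf")
    bins[-1] = float("inf")
    return bins


# Example: three segments over the values 1..9 put the inner
# edges at the terciles.
edges = quantile_partitioner_sketch(pd.Series(range(1, 10)), segments=3)
```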
Related PRs
NA
Where should the reviewer start?
At the modifications we made to `effect_by_segment`, and then at the `quantile_partitioner` definition.
Remaining problems or questions
We are not adding partitioners beyond the default one, because that would require more complex definitions or imports of new libraries (such as for the Fisher-Jenks algorithm).
Codecov Report
Merging #216 (60760e0) into master (3cd7bec) will decrease coverage by 0.32%. The diff coverage is 93.39%.
@@ Coverage Diff @@
## master #216 +/- ##
==========================================
- Coverage 94.69% 94.36% -0.33%
==========================================
Files 25 35 +10
Lines 1507 2131 +624
Branches 203 280 +77
==========================================
+ Hits 1427 2011 +584
- Misses 48 83 +35
- Partials 32 37 +5
| Impacted Files | Coverage Δ | |
|---|---|---|
| src/fklearn/causal/validation/cate.py | 0.00% <0.00%> (ø) | |
| src/fklearn/data/datasets.py | 100.00% <ø> (ø) | |
| src/fklearn/tuning/parameter_tuners.py | 79.48% <ø> (ø) | |
| src/fklearn/tuning/selectors.py | 90.47% <ø> (ø) | |
| src/fklearn/validation/validator.py | 91.20% <88.88%> (-3.08%) | :arrow_down: |
| src/fklearn/preprocessing/splitting.py | 95.00% <92.59%> (-0.84%) | :arrow_down: |
| src/fklearn/causal/cate_learning/meta_learners.py | 93.02% <93.02%> (ø) | |
| src/fklearn/training/calibration.py | 96.36% <94.73%> (-3.64%) | :arrow_down: |
| src/fklearn/training/transformation.py | 93.95% <95.12%> (+0.02%) | :arrow_up: |
| src/fklearn/validation/evaluators.py | 93.95% <96.82%> (+4.32%) | :arrow_up: |
| ... and 20 more | | |