causalml
CausalRandomForestRegressor with causal_mse predicts inf on data with nuisance
Describe the bug
After training the CausalRandomForestRegressor with criterion causal_mse on data with nuisance, many of the predicted ITE values are inf.
To Reproduce
I changed the "causal trees with synthetic data" notebook to use data generated by simulate_nuisance_and_easy_treatment:
# y, X, w, tau, b, e = synthetic_data(mode=5, n=10000, p=20, sigma=5.0)
from causalml.dataset import simulate_nuisance_and_easy_treatment
y, X, w, tau, b, e = simulate_nuisance_and_easy_treatment(n=10000, p=20, sigma=5.0)
After training the CausalRandomForestRegressor with criterion causal_mse, using the same code as in the notebook:
rforest2 = CausalRandomForestRegressor(
    criterion="causal_mse",
    min_samples_leaf=200,
    control_name=0,
    n_estimators=50,
    n_jobs=4,
)
rforest2.fit(
    X=df_train[feature_names].values,
    treatment=df_train['treatment'].values,
    y=df_train['outcome'].values,
)
many of the predicted ITE values are inf:
rf2_ite_pred = rforest2.predict(df_test[feature_names].values)
rf2_ite_pred[:100]
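To quantify how many predictions are affected rather than eyeballing the first 100, a small check like the following may help. This is a minimal sketch using only the standard library; `summarize_predictions` is a hypothetical helper (not part of causalml), and it assumes the predictions have been flattened to a 1-D sequence of floats.

```python
import math
from collections import Counter


def summarize_predictions(preds):
    """Count finite, inf, and nan entries in a flat sequence of floats.

    Hypothetical helper for illustration, not part of causalml.
    """
    counts = Counter(finite=0, inf=0, nan=0)
    for value in preds:
        value = float(value)
        if math.isnan(value):
            counts["nan"] += 1
        elif math.isinf(value):
            counts["inf"] += 1
        else:
            counts["finite"] += 1
    return dict(counts)


# Stand-in values for illustration. With the forest above one would pass
# rf2_ite_pred.ravel() instead (assuming predict returns a NumPy array).
example = [0.5, float("inf"), float("nan"), -1.2, float("-inf")]
print(summarize_predictions(example))
```

Reporting the three counts separately makes it easy to tell the inf case apart from a nan case.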
This is the case even if I change the nuisance to something simpler:
#b = (
# np.sin(np.pi * X[:, 0] * X[:, 1])
# + 2 * (X[:, 2] - 0.5) ** 2
# + X[:, 3]
# + 0.5 * X[:, 4]
#)
b = X[:, 3] + 2 * X[:, 4] + 3 * X[:, 1]
Expected behavior
The model should predict valid (finite) values.
Environment (please complete the following information):
- OS: macOS
- Python Version: 3.9.16
- Versions of Major Dependencies: pandas==1.5.2, scikit-learn==1.0.2
Note: CausalRandomForestRegressor with standard_mse predicts fine on the same data.
More debug info: one of the trained trees seems to be bad:
print(rforest2.estimators_[10].feature_importances_)
=> [ 0. nan 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0.]
After these trees are removed, the predictions are no longer inf, but some may still be extreme negative values (e.g. -3e+13).
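The check above was done by hand on one tree; scanning every tree in the forest could look roughly like this. It is a sketch only: `find_bad_trees` is a hypothetical helper (not part of causalml), and the stand-in objects below merely mimic the `feature_importances_` attribute so the example runs without a fitted model.

```python
import math
from types import SimpleNamespace


def find_bad_trees(estimators):
    """Return indices of trees whose feature importances contain nan or inf.

    Hypothetical helper for illustration, not part of causalml.
    """
    bad = []
    for idx, tree in enumerate(estimators):
        if any(not math.isfinite(float(v)) for v in tree.feature_importances_):
            bad.append(idx)
    return bad


# Stand-in objects so the sketch runs without a fitted forest. With the
# model above one would call find_bad_trees(rforest2.estimators_).
fake_trees = [
    SimpleNamespace(feature_importances_=[0.7, 0.3, 0.0]),
    SimpleNamespace(feature_importances_=[0.0, float("nan"), 0.0]),
]
print(find_bad_trees(fake_trees))
```

The returned indices identify which trees to inspect or exclude when narrowing down the problem.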
Hi, thanks for the report. The issue has been fixed recently in https://github.com/uber/causalml/pull/583.
Please reinstall the package from source.
You can also generate the desired type of synthetic data by changing the mode parameter:
y, X, w, tau, b, e = synthetic_data(mode=1, n=10000, p=20, sigma=5.0)
In causal_trees_with_synthetic_data.ipynb you will get the following result:
Thanks. Reinstalling from source fixes the problem!
This still happens with my real-world data. Some predictions come out as nan (rather than inf). Maybe there is still an issue?
Hi. Could you please plot each tree from your fitted CausalRandomForestRegressor using plot_causal_tree in causalml.inference.tree.plot and attach the images?
You can also attach a small dataset that reproduces the nan issue.