dowhy
Problem interpreting 95.0% confidence interval in backdoor.linear_regression
Hello,
I am trying to estimate the effect of El Niño on the incidence of leishmaniasis. I used the method "backdoor.linear_regression" with test_significance=True and confidence_intervals=True. However, the reported confidence interval, [[1.02988048 2.0855936 ]], does not contain the mean value of the estimate (2.8158204337251664). I am confused, because I expected the confidence interval to include the mean value of the estimate.
Can anyone help me to understand what is happening?
I appreciate your help.
Here is my dataset: data.csv
And here is my code:
import os, warnings, random
import dowhy
import econml
from dowhy import CausalModel
import pandas as pd
import numpy as np

#####El Nino vs Neutral
data_nino = pd.read_csv("data")
data_nino = data_nino.dropna()

data_leish_nino = data_nino.drop(['Codigo.DANE.periodo', 'Codigo.DANE', 'consensoENSO'], axis=1)
data_leish_nino.head()
data_leish_nino = data_leish_nino.astype({"TF_consenso": 'bool'}, copy=False)

###colombia
colombia_nino = data_leish_nino

#Step 1: Modeling the causal mechanism
model_leish = CausalModel(
    data=colombia_nino,
    treatment=['TF_consenso'],
    outcome='incidencia100k',
    common_causes=['SST3.4'],
    effect_modifiers=['bosques'],
    frontdoor=['Temperature', 'Rainfall'],
    graph="digraph {SST3.4->TF_consenso;SST3.4->incidencia100k;SST3.4->Temperature;SST3.4->Rainfall;TF_consenso->Temperature;TF_consenso->Rainfall;TF_consenso->incidencia100k;Temperature->incidencia100k;Rainfall->incidencia100k;bosques->incidencia100k;}"
)

#view model
model_leish.view_model()

#Step 2: Identifying effects
identified_estimand = model_leish.identify_effect(proceed_when_unidentifiable=True)
print(identified_estimand)

#Step 3: Estimation of the effect
####ate, significance and confidence interval
estimate_bd = model_leish.estimate_effect(identified_estimand,
                                          method_name="backdoor.linear_regression",
                                          test_significance=True,
                                          confidence_intervals=True)

print(estimate_bd)
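One possible explanation, not confirmed anywhere in this thread, is that setting `effect_modifiers` adds treatment × modifier interaction terms to the regression. The reported estimate would then average the effect over the modifier, while the reported interval might cover only the coefficient on the treatment itself (the effect at modifier = 0). Whether DoWhy's `backdoor.linear_regression` behaves this way in the version used here is an assumption worth checking against its source. The sketch below uses plain NumPy OLS on hypothetical synthetic data (not DoWhy's code) to show how those two quantities can disagree, and how an interval for the averaged effect can be built from the same fit.

```python
import numpy as np

# Hypothetical synthetic data: binary treatment t, effect modifier m, confounder w.
rng = np.random.default_rng(0)
n = 5000
w = rng.normal(size=n)
m = rng.normal(loc=3.0, scale=1.0, size=n)       # modifier with a non-zero mean
t = (w + rng.normal(size=n) > 0).astype(float)   # treatment depends on the confounder
y = 1.0 * t + 0.5 * t * m + 2.0 * w + rng.normal(size=n)  # true average effect = 1 + 0.5 * 3

# Design matrix: intercept, treatment, confounder, modifier, treatment x modifier.
X = np.column_stack([np.ones(n), t, w, m, t * m])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# Classical OLS covariance matrix of the coefficients.
resid = y - X @ beta
sigma2 = resid @ resid / (n - X.shape[1])
cov = sigma2 * np.linalg.inv(X.T @ X)

# 95% interval for the treatment *coefficient* only (effect at m = 0).
se_coef = np.sqrt(cov[1, 1])
coef_ci = (beta[1] - 1.96 * se_coef, beta[1] + 1.96 * se_coef)

# Average effect of switching t from 0 to 1, averaged over the observed modifier.
a = np.zeros(X.shape[1])
a[1], a[4] = 1.0, m.mean()
ate = a @ beta

# 95% interval for that linear combination (treats m.mean() as fixed).
ate_se = np.sqrt(a @ cov @ a)
ate_ci = (ate - 1.96 * ate_se, ate + 1.96 * ate_se)

print(f"treatment coefficient {beta[1]:.3f}, CI ({coef_ci[0]:.3f}, {coef_ci[1]:.3f})")
print(f"average effect        {ate:.3f}, CI ({ate_ci[0]:.3f}, {ate_ci[1]:.3f})")
```

The averaged effect falls outside the coefficient's interval but inside its own interval, which is the same pattern as in the original report.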
This is odd. I can try to look at this, but it may take some time.
@amit-sharma Thanks for your help.
I just ran into something similar with my own data. If you use the get_confidence_intervals method of the CausalEstimate class with the argument method="bootstrap", it might return more sensible values. It did for me.
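For reference, a percentile bootstrap resamples the rows with replacement, re-runs the estimator on each resample, and reports quantiles of the resulting estimates. The sketch below implements that idea for a plain OLS treatment effect on hypothetical synthetic data; it is not DoWhy's implementation, whose resampling details (for example `num_simulations` and `sample_size_fraction`) may differ. The helper names are hypothetical. Note that percentile intervals computed from only a handful of resamples are themselves noisy.

```python
import numpy as np


def ols_treatment_effect(t, w, y):
    """Hypothetical helper: coefficient on t in an OLS regression of y on [1, t, w]."""
    X = np.column_stack([np.ones_like(t), t, w])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1]


def percentile_bootstrap_ci(t, w, y, n_boot=1000, level=0.95, seed=0):
    """Resample rows with replacement, re-estimate, and take percentiles."""
    rng = np.random.default_rng(seed)
    n = len(y)
    draws = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # bootstrap sample of row indices
        draws[b] = ols_treatment_effect(t[idx], w[idx], y[idx])
    alpha = 1.0 - level
    lo, hi = np.quantile(draws, [alpha / 2, 1 - alpha / 2])
    return lo, hi


# Hypothetical synthetic data with a true effect of 2.0 and one confounder w.
rng = np.random.default_rng(1)
n = 2000
w = rng.normal(size=n)
t = (w + rng.normal(size=n) > 0).astype(float)
y = 2.0 * t + 1.5 * w + rng.normal(size=n)

estimate = ols_treatment_effect(t, w, y)
lo, hi = percentile_bootstrap_ci(t, w, y)
print(f"estimate={estimate:.3f}, 95% CI=({lo:.3f}, {hi:.3f})")
```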
@jmafoster1 Great!!! Thanks for the tip.
@jmafoster1 I followed your advice, but with a new dataset I ran into the same problem: the interval (0.1192, 0.2268) does not contain the mean value of the estimate (9.689e-17). Could the problem be caused by the very small mean value?
I am using this code to estimate the CI:
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LassoCV
from econml.inference import BootstrapInference

dml_estimate_soiltemp = model_leish.estimate_effect(
    identified_estimand_soiltemp,
    target_units="ate",
    #test_significance=True,
    #confidence_intervals=True,
    method_name="backdoor.econml.dml.DML",
    method_params={
        'init_params': {'model_y': GradientBoostingRegressor(),
                        'model_t': GradientBoostingRegressor(),
                        'featurizer': PolynomialFeatures(degree=1, include_bias=True),
                        'model_final': LassoCV(fit_intercept=False),
                        'random_state': 123},
        'fit_params': {'inference': BootstrapInference(n_bootstrap_samples=25, n_jobs=-1)},
    })

##confidence interval with bootstrap, soiltemp
ci_Colombia_boost_soiltemp = dml_estimate_soiltemp.get_confidence_intervals(
    method="bootstrap", confidence_level=0.95, num_simulations=10, sample_size_fraction=0.7)
print(ci_Colombia_boost_soiltemp)
I'm afraid I don't know how the confidence intervals code works, but it looks like you're using EconML as your estimator. I think they have their own methods to calculate confidence intervals. See https://microsoft.github.io/dowhy/example_notebooks/dowhy-conditional-treatment-effects.html#CATE-Object-and-Confidence-Intervals for details.
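Whatever library is used, a Wald interval computed from the same fit as the point estimate (estimate ± 1.96 × standard error) contains that estimate by construction. An interval that excludes the reported estimate therefore suggests the two were computed from different fits or for different quantities. A percentile bootstrap interval, by contrast, comes from re-fitted resamples and need not contain the full-sample estimate, for example when the estimator shrinks toward zero. The sketch below illustrates the Wald case with a cross-fitted residual-on-residual ("partialling-out") estimator in plain NumPy, the idea behind double machine learning. It uses linear nuisance models as a stand-in for `GradientBoostingRegressor`, runs on hypothetical synthetic data, and is not EconML's implementation; the function names are hypothetical.

```python
import numpy as np


def fit_predict_ols(X_train, y_train, X_test):
    """Linear nuisance model (stand-in for e.g. GradientBoostingRegressor)."""
    A_train = np.column_stack([np.ones(len(X_train)), X_train])
    A_test = np.column_stack([np.ones(len(X_test)), X_test])
    coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
    return A_test @ coef


def partialling_out_effect(W, t, y, n_folds=2, seed=0):
    """Cross-fitted residual-on-residual estimate with a Wald 95% interval."""
    n = len(y)
    rng = np.random.default_rng(seed)
    folds = rng.permutation(n) % n_folds  # balanced random fold labels
    t_res = np.empty(n)
    y_res = np.empty(n)
    for k in range(n_folds):
        train, test = folds != k, folds == k
        # Residualize treatment and outcome using models fit on the other folds.
        t_res[test] = t[test] - fit_predict_ols(W[train], t[train], W[test])
        y_res[test] = y[test] - fit_predict_ols(W[train], y[train], W[test])
    theta = (t_res @ y_res) / (t_res @ t_res)
    # Heteroskedasticity-robust (HC0) standard error of the final-stage slope.
    eps = y_res - theta * t_res
    se = np.sqrt(np.sum((t_res * eps) ** 2)) / (t_res @ t_res)
    return theta, (theta - 1.96 * se, theta + 1.96 * se)


# Hypothetical synthetic data: continuous treatment, two confounders, true effect 0.7.
rng = np.random.default_rng(2)
n = 2000
W = rng.normal(size=(n, 2))
t = W @ np.array([1.0, -0.5]) + rng.normal(size=n)
y = 0.7 * t + W @ np.array([2.0, 1.0]) + rng.normal(size=n)

theta, (lo, hi) = partialling_out_effect(W, t, y)
print(f"effect={theta:.3f}, 95% CI=({lo:.3f}, {hi:.3f})")
```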