Problem interpreting 95.0% confidence interval in backdoor.linear_regression

Open juandavidgutier opened this issue 4 years ago • 6 comments

Hello,

I am trying to estimate the effect of El Niño on the incidence of leishmaniasis. I used the method "backdoor.linear_regression" with test_significance=True and confidence_intervals=True. However, the reported confidence interval, [[1.02988048 2.0855936 ]], does not contain the point estimate (2.8158204337251664). I am confused by this, because I expected the confidence interval to include the estimate.

Can anyone help me to understand what is happening?

I appreciate the cooperation

Here is my dataset: data.csv

And here is my code:

import os, warnings, random
import dowhy
import econml
from dowhy import CausalModel
import pandas as pd
import numpy as np

##### El Nino vs Neutral
data_nino = pd.read_csv("data.csv")
data_nino = data_nino.dropna()

data_leish_nino = data_nino.drop(['Codigo.DANE.periodo', 'Codigo.DANE', 'consensoENSO'], axis=1)
data_leish_nino.head()
data_leish_nino = data_leish_nino.astype({"TF_consenso": 'bool'}, copy=False)

### colombia
colombia_nino = data_leish_nino

# Step 1: Modeling the causal mechanism
model_leish = CausalModel(
    data=colombia_nino,
    treatment=['TF_consenso'],
    outcome='incidencia100k',
    common_causes=['SST3.4'],
    effect_modifiers=['bosques'],
    frontdoor=['Temperature', 'Rainfall'],
    graph="digraph {SST3.4->TF_consenso;SST3.4->incidencia100k;SST3.4->Temperature;SST3.4->Rainfall;TF_consenso->Temperature;TF_consenso->Rainfall;TF_consenso->incidencia100k;Temperature->incidencia100k;Rainfall->incidencia100k;bosques->incidencia100k;}"
)

# view model
model_leish.view_model()

# Step 2: Identifying effects
identified_estimand = model_leish.identify_effect(proceed_when_unidentifiable=True)
print(identified_estimand)

# Step 3: Estimation of the effect
#### ate, significance and confidence interval
estimate_bd = model_leish.estimate_effect(
    identified_estimand,
    method_name="backdoor.linear_regression",
    test_significance=True,
    confidence_intervals=True)

print(estimate_bd)
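
One thing that may be worth checking (this is a guess, not something confirmed anywhere in this thread): if the regression includes an interaction between the treatment and an effect modifier such as 'bosques', then the average effect and the coefficient on the treatment alone are different quantities, and an interval for one need not contain the other. The sketch below illustrates only that general point on made-up data. It does not call DoWhy, the variable names and the data-generating model are hypothetical, and whether DoWhy's interval actually refers to the treatment coefficient is an assumption to verify against the library source.

```python
import random
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def coefficient_vs_ate(n=2000, seed=0):
    """Compare the treatment coefficient with the average effect
    in a regression y ~ t + m + t:m on synthetic data."""
    rng = random.Random(seed)
    # Hypothetical effect modifier (plays the role of something like 'bosques').
    m = [rng.uniform(0.0, 10.0) for _ in range(n)]
    # Hypothetical binary treatment, assigned at random.
    t = [rng.random() < 0.5 for _ in range(n)]
    # Assumed true model: y = 1 + 1.5*t + 0.2*m + 0.3*t*m + noise
    y = [1.0 + 1.5 * ti + 0.2 * mi + 0.3 * ti * mi + rng.gauss(0.0, 1.0)
         for ti, mi in zip(t, m)]

    # With a binary treatment, the fully interacted regression
    # y ~ t + m + t:m gives the same fit as regressing y on m
    # separately within each treatment arm.
    a0, b0 = fit_line([mi for mi, ti in zip(m, t) if not ti],
                      [yi for yi, ti in zip(y, t) if not ti])
    a1, b1 = fit_line([mi for mi, ti in zip(m, t) if ti],
                      [yi for yi, ti in zip(y, t) if ti])

    treatment_coef = a1 - a0    # coefficient on t alone
    interaction_coef = b1 - b0  # coefficient on t*m
    # Average effect over the sample: main effect plus the
    # interaction evaluated at the mean of the modifier.
    ate = treatment_coef + interaction_coef * mean(m)
    return treatment_coef, interaction_coef, ate


coef, inter, ate = coefficient_vs_ate()
print("treatment coefficient:", coef)
print("interaction coefficient:", inter)
print("average effect:", ate)
```

In this toy setup the average effect is well above the treatment coefficient, so an interval built around the coefficient would exclude it. Dropping effect_modifiers from the model and re-running the estimate would be one way to see whether the same thing is happening with the real data.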

juandavidgutier, Oct 20 '21 16:10

This is odd. I can try to look at this, but it may take some time.

amit-sharma, Oct 21 '21 12:10

@amit-sharma Thanks for the cooperation

juandavidgutier, Oct 21 '21 14:10

I just had a similar thing with my own data. If you use the get_confidence_intervals method of the CausalEstimate class with argument method="bootstrap", that might return more sensible values. It did for me.
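
For readers unfamiliar with the idea, here is a minimal sketch of a percentile bootstrap interval on synthetic data. It is not DoWhy's implementation: the helper names (make_rows, diff_in_means, percentile_bootstrap_ci) and the data are hypothetical, and only the parameter names num_simulations and confidence_level are borrowed from the get_confidence_intervals call for readability. The estimator here is a plain difference in means for a binary treatment.

```python
import random
from statistics import mean


def make_rows(n=500, seed=1):
    """Synthetic (treated, outcome) pairs with an assumed true effect of 2.0."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        treated = rng.random() < 0.5
        rows.append((treated, 2.0 * treated + rng.gauss(0.0, 1.0)))
    return rows


def diff_in_means(rows):
    """Point estimate: mean outcome of treated minus mean outcome of controls."""
    treated = [y for t, y in rows if t]
    control = [y for t, y in rows if not t]
    return mean(treated) - mean(control)


def percentile_bootstrap_ci(rows, stat, num_simulations=1000,
                            confidence_level=0.95, seed=0):
    """Resample rows with replacement, recompute the statistic each time,
    and take the empirical quantiles of the resampled estimates."""
    rng = random.Random(seed)
    n = len(rows)
    estimates = sorted(
        stat([rows[rng.randrange(n)] for _ in range(n)])
        for _ in range(num_simulations)
    )
    alpha = 1.0 - confidence_level
    lower = estimates[int((alpha / 2) * (num_simulations - 1))]
    upper = estimates[int((1 - alpha / 2) * (num_simulations - 1))]
    return lower, upper


rows = make_rows()
point = diff_in_means(rows)
lower, upper = percentile_bootstrap_ci(rows, diff_in_means)
print("estimate:", point)
print("95% bootstrap interval:", (lower, upper))
```

Because the interval is built from re-estimates of the same quantity, it normally brackets the point estimate. If a bootstrap interval does not, the interval and the estimate are probably not measuring the same thing, or the number of resamples is too small to be stable.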

jmafoster1, Oct 25 '21 13:10

@jmafoster1 Great!!! Thanks for the tip.

juandavidgutier, Oct 25 '21 15:10

@jmafoster1 I followed your advice, but with a new dataset I ran into the same problem: the interval (0.1192, 0.2268) does not contain the point estimate (9.689e-17). Could the very small value of the estimate be causing the difficulty?

I am using this code to estimate the CI:

from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import PolynomialFeatures
from econml.inference import BootstrapInference

dml_estimate_soiltemp = model_leish.estimate_effect(
    identified_estimand_soiltemp,
    target_units="ate",
    # test_significance=True,
    # confidence_intervals=True,
    method_name="backdoor.econml.dml.DML",
    method_params={
        'init_params': {
            'model_y': GradientBoostingRegressor(),
            'model_t': GradientBoostingRegressor(),
            'featurizer': PolynomialFeatures(degree=1, include_bias=True),
            'model_final': LassoCV(fit_intercept=False),
            'random_state': 123},
        'fit_params': {
            'inference': BootstrapInference(n_bootstrap_samples=25, n_jobs=-1),
        }
    })

## confidence interval with bootstrap soiltemp
ci_Colombia_boost_soiltemp = dml_estimate_soiltemp.get_confidence_intervals(
    method="bootstrap",
    confidence_level=0.95,
    num_simulations=10,
    sample_size_fraction=0.7)
print(ci_Colombia_boost_soiltemp)

juandavidgutier, Nov 08 '21 12:11

I'm afraid I don't know how the confidence intervals code works, but it looks like you're using EconML as your estimator. I think they have their own methods to calculate confidence intervals. See https://microsoft.github.io/dowhy/example_notebooks/dowhy-conditional-treatment-effects.html#CATE-Object-and-Confidence-Intervals for details.

jmafoster1, Nov 09 '21 08:11