dowhy
backdoor.distance_matching: Problem with exact matching
I think there is a bug in the distance matching estimator.
When the 'exact_match_cols' argument is passed, some controls are still matched to treated units even though their values differ on the 'exact' variables.
The sample created by the match is not fully balanced on the exact-match variable, and in many cases a treated unit's matches include controls from the other group.
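The expected behaviour can be sketched in plain Python, independent of DoWhy's internals. The function name `match_within_strata` and the toy records are hypothetical and only illustrate the idea: a control is eligible for a treated unit only when the exact-match column is equal, and the nearest eligible controls are then chosen by absolute distance on a single covariate.

```python
def match_within_strata(units, exact_key, distance_key, num_matches=2):
    """Hypothetical sketch: for each treated unit, return the ids of the
    nearest controls that share the same value of `exact_key`."""
    controls = [u for u in units if not u["treated"]]
    matches = {}
    for t in (u for u in units if u["treated"]):
        # Only controls in the same stratum are eligible.
        eligible = [c for c in controls if c[exact_key] == t[exact_key]]
        # Rank eligible controls by absolute distance on the matching covariate.
        eligible.sort(key=lambda c: abs(c[distance_key] - t[distance_key]))
        matches[t["id"]] = [c["id"] for c in eligible[:num_matches]]
    return matches


# Toy data (made up): unit 1 is the closest control but has a different W3.
units = [
    {"id": 0, "treated": True, "W3": 1, "W0": 0.50},
    {"id": 1, "treated": False, "W3": 0, "W0": 0.51},
    {"id": 2, "treated": False, "W3": 1, "W0": 0.80},
    {"id": 3, "treated": False, "W3": 1, "W0": 0.10},
]
matches = match_within_strata(units, exact_key="W3", distance_key="W0")
print(matches)
```

With exact matching on W3, unit 1 should never be selected for unit 0, however close it is on the other covariate.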
- Code:
from dowhy import CausalModel
import dowhy.datasets
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
data = dowhy.datasets.linear_dataset(
    beta=10,
    num_common_causes=5,
    num_instruments=0,
    num_samples=10000,
    treatment_is_binary=True)
#### change W3 to 1/0 for exact matching:
data["df"] = data["df"].assign(W3 = lambda x: np.where(x.W3>0,1,0))
model = CausalModel(
    data=data["df"],
    treatment=data["treatment_name"],
    outcome=data["outcome_name"],
    graph=data["gml_graph"])
model.view_model()
identified_estimand = model.identify_effect()
estimate = model.estimate_effect(
    identified_estimand,
    target_units='att',
    method_name="backdoor.distance_matching",
    method_params={
        'exact_match_cols': ['W3'],
        'num_matches_per_unit': 2
    })
print(estimate)
# Collect each treated unit together with its matched controls.
df_m = []
for i in range(data["df"].shape[0]):
    if data["df"].iloc[i].v0 == True:
        df_m.append(data["df"].iloc[[i] + estimate.estimator.matched_indices_att[i]])
df_matched = pd.concat(df_m)
#### Compare covariate balance before and after matching:
# groupby('v0') sorts its keys, so the transposed columns come out as False (controls), then True (treated).
balance = (
    pd.concat(
        [
            df_matched.groupby('v0').mean().T.drop(True, axis=1),
            data["df"].groupby('v0').mean().T,
        ],
        axis=1,
    )
    .query("index != 'y'")
)
balance.columns = ['matched controls', 'unmatched controls', 'treatment']
balance.plot()
plt.show()
res = []
for i in range(data["df"].shape[0]):
    if data["df"].iloc[i].v0 == True:
        matched_rows = data["df"].iloc[[i] + estimate.estimator.matched_indices_att[i]]
        # Number of distinct W3 values among the treated unit and its matches.
        res.append(matched_rows.W3.value_counts().shape[0])
#### if > 1 then there is more than one W3 value inside some 'case' groups of matches:
print(np.array(res).mean())
1.176697874869966
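The same check can be written as a small standalone helper that counts how many treated units have at least one match from a different stratum. This is only a sketch: the name `count_exact_match_violations` and the toy inputs are hypothetical, and it assumes the matches are available as a dict mapping each treated unit's id to a list of control ids.

```python
def count_exact_match_violations(matched_indices, exact_values):
    """Count treated units with at least one matched control whose
    exact-match value differs from their own.

    matched_indices: {treated_id: [control_id, ...]}
    exact_values:    {unit_id: value of the exact-match column}
    """
    violations = 0
    for treated_id, control_ids in matched_indices.items():
        if any(exact_values[c] != exact_values[treated_id] for c in control_ids):
            violations += 1
    return violations


# Toy input (made up): unit 4 has W3 == 0 but is matched to unit 2 with W3 == 1.
matched = {0: [2, 3], 4: [1, 2]}
w3 = {0: 1, 1: 0, 2: 1, 3: 1, 4: 0}
n_violations = count_exact_match_violations(matched, w3)
print(n_violations)
```

Assuming `matched_indices_att` is keyed by the DataFrame's index labels, it could be called as `count_exact_match_violations(estimate.estimator.matched_indices_att, data["df"]["W3"].to_dict())`; with correct exact matching the result should be 0.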

- Version: DoWhy 0.9.1
I opened a pull request with a correction: https://github.com/py-why/dowhy/pull/819