
backdoor.distance_matching: Problem with exact matching

Open • Avichaicha opened this issue 2 years ago • 1 comment

I think there is a bug in the matching function.

When an exact-matching argument ('exact_match_cols') is passed, some controls are still matched to cases even though they do not share the cases' values on the 'exact' variables.

As a result, the sample created after matching is not fully balanced, and in many cases the matched controls come from a different group than their case.
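
For reference, the behavior expected from exact matching can be sketched on toy data: controls are searched only within the stratum that shares the treated unit's values on the exact-match columns. This is a minimal illustration, not DoWhy's implementation; the helper `match_within_strata`, the toy values, and the use of plain Euclidean distance are assumptions made for the example.

```python
import numpy as np
import pandas as pd


def match_within_strata(df, treatment, exact_cols, distance_cols, k=1):
    """Hypothetical helper: for each treated row, return the index labels of
    the k nearest controls that share its values on exact_cols."""
    matches = {}
    for _, stratum in df.groupby(exact_cols):
        treated = stratum[stratum[treatment] == 1]
        controls = stratum[stratum[treatment] == 0]
        if controls.empty:
            continue  # no admissible control in this stratum
        control_vals = controls[distance_cols].to_numpy(dtype=float)
        for idx, row in treated.iterrows():
            treated_vals = row[distance_cols].to_numpy(dtype=float)
            # Euclidean distance from this treated unit to every control in the stratum
            dist = np.linalg.norm(control_vals - treated_vals, axis=1)
            nearest = np.argsort(dist)[:k]
            matches[idx] = controls.index[nearest].tolist()
    return matches


# Toy data (made up): row 1 is closest to row 2 on W0, but row 2 has a different W3.
toy = pd.DataFrame({
    "v0": [1, 1, 0, 0, 0, 0],
    "W3": [0, 1, 0, 0, 1, 1],
    "W0": [0.0, 5.0, 4.9, 0.2, 0.1, 4.0],
})
matches = match_within_strata(toy, "v0", ["W3"], ["W0"], k=1)
print(matches)
```

With exact matching on W3, the treated row 1 should be paired with a control from its own W3 group even though a closer control exists in the other group.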

  • Code:
from dowhy import CausalModel
import dowhy.datasets
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

data = dowhy.datasets.linear_dataset(
    beta=10,
    num_common_causes=5,
    num_instruments=0,
    num_samples=10000,
    treatment_is_binary=True)

# Binarize W3 (1 if W3 > 0, else 0) so it can be used for exact matching:
data["df"] = data["df"].assign(W3 = lambda x: np.where(x.W3>0,1,0))

model = CausalModel(
    data=data["df"],
    treatment=data["treatment_name"],
    outcome=data["outcome_name"],
    graph=data["gml_graph"])

model.view_model()

identified_estimand = model.identify_effect()

estimate = model.estimate_effect(identified_estimand,
                                 target_units = 'att',
                                 method_name="backdoor.distance_matching",
                                 method_params={
                                 'exact_match_cols':['W3'],
                                 'num_matches_per_unit':2
                                 })

print(estimate)

# Stack each treated unit together with its matched controls
df_m = []
for i in range(data["df"].shape[0]):
    if data["df"].iloc[i].v0 == True:
        df_m.append(data["df"].iloc[[i] + estimate.estimator.matched_indices_att[i]])
df_matched = pd.concat(df_m)

# Compare covariate balance before and after matching:
balance = (pd.concat([
    df_matched.groupby('v0').mean().T.drop(True,axis =1 ),
    data["df"].groupby('v0').mean().T],axis = 1 )
    #.reset_index()
    .query("index!='y'")
           )

balance.columns = ['matched controls','treatment','unmatched controls']
balance.plot()
plt.show()


# Count the distinct W3 values within each treated unit's group of matches
res = []
for i in range(data["df"].shape[0]):
    if data["df"].iloc[i].v0 == True:
        res.append(data["df"].iloc[[i] + estimate.estimator.matched_indices_att[i]].W3.value_counts().shape[0])

# A mean above 1 means some 'case' groups contain more than one W3 value:
print(np.array(res).mean())
# Output: 1.176697874869966
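
A more direct check is to count the matched controls that disagree with their case on the exact-match columns. The sketch below runs on toy data; `exact_match_violations` is a hypothetical helper, and it assumes the matches are given as a mapping from a treated row's index label to a list of control index labels, which is how the loops above use `estimate.estimator.matched_indices_att`.

```python
import pandas as pd


def exact_match_violations(df, matched_indices, exact_cols):
    """Hypothetical helper: return (violations, total), where violations is the
    number of matched controls that differ from their treated unit on any of
    exact_cols and total is the number of matched controls overall."""
    violations = 0
    total = 0
    for treated_idx, control_idxs in matched_indices.items():
        treated_vals = df.loc[treated_idx, exact_cols].to_numpy()
        control_vals = df.loc[control_idxs, exact_cols].to_numpy()
        # A control violates exact matching if any exact column differs
        mismatched = (control_vals != treated_vals).any(axis=1)
        violations += int(mismatched.sum())
        total += len(control_idxs)
    return violations, total


# Toy data (made up): rows 0 and 1 are treated, the rest are controls.
toy = pd.DataFrame({"W3": [0, 1, 0, 1, 1]})
toy_matches = {0: [2, 3], 1: [3, 4]}  # row 3 has W3=1 but is matched to row 0 (W3=0)
result = exact_match_violations(toy, toy_matches, ["W3"])
print(result)
```

On the real data this could be called with `data["df"]`, `estimate.estimator.matched_indices_att` and `['W3']`; with working exact matching the first number should be 0.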


  • DoWhy 0.9.1

Avichaicha • Jan 08 '23 12:01

I opened a pull request with a fix: https://github.com/py-why/dowhy/pull/819

Avichaicha • Jan 11 '23 09:01