Natural Direct Effect estimation does not support multiple mediators
Hi everybody! I'm quite new to this library and, while testing it on a toy example, I think I found an error in the identification module.
In particular, I'm following the natural direct effect tutorial (https://www.pywhy.org/dowhy/v0.9.1/example_notebooks/dowhy_mediation_analysis.html#) on the simple graph attached, to estimate the effect of Z on Y only through the direct path Z->Y.
So I use the standard procedure with the gcm (define the model, identify, estimate), but identification goes wrong: it returns E[d(Y|A1)/d(Z)] as the estimand, whereas I expected E[d(Y|A1, A2)/d(Z)], because there are two indirect paths to block (Z->A1->Y and Z->A2->Y) to isolate the direct effect.
I've been told this isn't supported yet, so it's probably worth raising an issue to let users know.
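To make the "two paths to block" point concrete, here is a small dependency-free sketch that lists every directed path from Z to Y. The edge list is an assumption read off the data-generating equations in the reproduction code (the attached graph image is not part of this text), and `directed_paths` is a hypothetical helper, not a DoWhy function.

```python
# Dependency-free sketch: list every directed path from Z to Y in the toy graph.
# The edge list is an assumption inferred from the data-generating equations.
edges = [("Z", "A1"), ("Z", "A2"), ("A1", "Y"), ("A2", "Y"), ("Z", "Y")]

# Build a parent -> children adjacency map.
children = {}
for source, target in edges:
    children.setdefault(source, []).append(target)


def directed_paths(node, goal, prefix=()):
    """Yield each directed path from node to goal as a tuple of node names."""
    prefix = prefix + (node,)
    if node == goal:
        yield prefix
        return
    for child in children.get(node, []):
        if child not in prefix:  # guard against revisiting a node
            yield from directed_paths(child, goal, prefix)


paths = list(directed_paths("Z", "Y"))
# Mediators are the interior nodes of the indirect paths.
mediators = sorted({node for path in paths for node in path[1:-1]})

print(paths)
print(mediators)
```

With this edge list both A1 and A2 show up as mediators, so an estimand that conditions only on A1 leaves Z->A2->Y open. With networkx already imported, `nx.all_simple_paths(graph, "Z", "Y")` should give the same paths.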
Here's the code to reproduce the issue:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import math
import dowhy
from dowhy import CausalModel
import dowhy.datasets, dowhy.plotter
import networkx as nx
# creating the toy dataset
n_samples = 10000
z = np.random.normal(0, 1, n_samples)
a1 = .5 * z + .2 * np.random.normal(0, 1, n_samples) +.3
a2 = .2 * z + .3 * np.random.normal(0, 1, n_samples) -.2
y = .7 * a1 + .6 * a2 -.4 * z + .2 * np.random.normal(0, 1, n_samples)
z = 1*(z>0)
a1 = 1*(a1>0)
a2 = 1*(a2>0)
y = 1*(y>0)
df = pd.DataFrame({'Z':z, 'A1':a1, 'A2':a2, 'Y':y})
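# creating the gcm
# NOTE: causal_graph was never defined in the snippet as posted; the edges
# below are inferred from the data-generating equations above
causal_graph = nx.DiGraph([("Z", "A1"), ("Z", "A2"), ("A1", "Y"), ("A2", "Y"), ("Z", "Y")])
# serialize the graph as a GML string for CausalModel
s = "graph[directed 1"
for node in causal_graph.nodes:
    s += "node[ id \"" + node + "\" label \"" + node + "\"]"
for edge in causal_graph.edges:
    s += "edge[ source \"" + edge[0] + "\" target \"" + edge[1] + "\"]"
s += "]"
s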
model = CausalModel(df, "Z", "Y", s,
                    missing_nodes_as_confounders=False)
model.view_model()
from IPython.display import Image, display
display(Image(filename="causal_model.png"))
# finding the estimand
# Natural direct effect (nde)
identified_estimand_nde = model.identify_effect(estimand_type="nonparametric-nde", proceed_when_unidentifiable=False, optimize_backdoor = True)
print(identified_estimand_nde)
Output
Estimand type: EstimandType.NONPARAMETRIC_NDE
### Estimand : 1
Estimand name: mediation
Estimand expression:
⎡ d ⎤
E⎢────(Y|A1)⎥
⎣d[Z] ⎦
Estimand assumption 1, Mediation: A1 intercepts (blocks) all directed paths from Z to Y except the path {Z}→{Y}.
Estimand assumption 2, First-stage-unconfoundedness: If U→{Z} and U→{A1} then P(A1|Z,U) = P(A1|Z)
Estimand assumption 3, Second-stage-unconfoundedness: If U→{A1} and U→Y then P(Y|A1, Z, U) = P(Y|A1, Z)
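As a rough sanity check of why the missing A2 matters, the sketch below regresses Y on Z with and without A2 as a control. It makes several assumptions that are not in the original post: it uses the continuous variables before the `> 0` thresholding, a fixed seed, and plain least squares via NumPy instead of any DoWhy estimator. `z_coefficient` is a hypothetical helper.

```python
import numpy as np

# Same linear data-generating process as the toy example, but kept continuous
# (assumption: no thresholding) and seeded so the run is repeatable.
rng = np.random.default_rng(0)
n_samples = 10000
z = rng.normal(0, 1, n_samples)
a1 = .5 * z + .2 * rng.normal(0, 1, n_samples) + .3
a2 = .2 * z + .3 * rng.normal(0, 1, n_samples) - .2
y = .7 * a1 + .6 * a2 - .4 * z + .2 * rng.normal(0, 1, n_samples)


def z_coefficient(*controls):
    """OLS coefficient of z when regressing y on an intercept, z and the controls."""
    design = np.column_stack([np.ones(n_samples), z, *controls])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta[1]  # column 0 is the intercept, column 1 is z


coef_a1_only = z_coefficient(a1)   # mirrors conditioning on A1 alone
coef_both = z_coefficient(a1, a2)  # conditions on both mediators

print("controlling for A1 only:  ", coef_a1_only)
print("controlling for A1 and A2:", coef_both)
```

In this linear setting the direct effect is the structural coefficient -0.4, which the second regression should recover up to sampling noise. Controlling for A1 alone leaves the Z->A2->Y path in the estimate, so the coefficient should come out near -0.4 + 0.6 * 0.2 = -0.28 instead. The binarized data from the post would give different numbers, but the same path stays unblocked.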
Thanks for raising this, @DarioSimonato. There is an issue with supporting multi-variable mediators. We will look into this and raise an error if needed.
I am running into the same issue. Would very much appreciate an update. Thank you in advance!
Any update on this?