Interpreting treatment interactions with DoubleML

Open chelsealee14 opened this issue 11 months ago • 0 comments

Hi,

This is related to a previous issue. I would like to know how to use DoubleML with more than one treatment.

Suppose we have two separate treatments, each estimated with its own DoubleML model, using the same set of confounders, W, and the same variable to compute CATEs on, X. Both W and X occur prior to treatment. For the sake of an example, the treatments and outcome below use data from the Multi-investment Attribution Software Company notebook:

  • Treatment A (T1): Major Flag (y/n)
  • Treatment B (T2): Tech Support (y/n)
  • Outcome (Y): Revenue ($)

The data prep (mostly from notebook):

import pandas as pd
from econml.dml import LinearDML

file_url = "https://msalicedatapublic.z5.web.core.windows.net/datasets/ROI/multi_attribution_sample.csv"
multi_data = pd.read_csv(file_url)

# Define estimator inputs
T1 = multi_data["Major Flag"]
T2 = multi_data["Tech Support"]
Y = multi_data["Revenue"]  # amount of product purchased, or outcome
X = multi_data[["Size"]]  # heterogeneity feature
W = multi_data.drop(
    columns=["Tech Support", "Major Flag", "Revenue", "Size"]
)  # controls

The individual models and their ATEs are shown below. For Major Flag (T1):

model = LinearDML(discrete_treatment=True)

# Fit the model (final-stage inference is left at its default)
model.fit(Y=Y, T=T1, X=X, W=W)
print(f"ATE for major flag is: {model.ate(X)} with CI {model.ate_interval(X)}")

ATE for major flag is: 2364.232844526994 with CI (1930.2699916809565, 2798.1956973730316)

For Tech Support (T2):

model = LinearDML(discrete_treatment=True)

# Fit the model (final-stage inference is left at its default)
model.fit(Y=Y, T=T2, X=X, W=W)
print(f"ATE for tech support is: {model.ate(X)} with CI {model.ate_interval(X)}")

ATE for tech support is: 7156.214315710862 with CI (6952.461324721255, 7359.96730670047)

Questions:

Let's say that the ATE for receiving tech support (T2), when modeled separately, seems stronger than we'd expect, and we hypothesize that it has been overestimated because its true impact might depend on the company being a major corporation (T1).
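
To make the concern concrete, here is a minimal sketch on synthetic data. Everything in it is an assumption for illustration: the variables `t1` and `t2`, the assignment probabilities, and the effect sizes are made up and are not taken from the real dataset. It also uses plain least squares rather than DML, only to show how an estimate for T2 alone can be inflated when T2 is correlated with T1 and the two interact.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Hypothetical data-generating process (NOT the real dataset):
# t1 plays the role of Major Flag, t2 the role of Tech Support.
# Major companies are assumed to receive tech support more often.
t1 = rng.binomial(1, 0.3, size=n)
t2 = rng.binomial(1, np.where(t1 == 1, 0.8, 0.3))

# Assumed true effects: t1 main = 2000, t2 main = 5000, interaction = 3000.
y = 2000 * t1 + 5000 * t2 + 3000 * t1 * t2 + rng.normal(0, 1000, size=n)


def ols(columns, outcome):
    """Least-squares coefficients for an intercept plus the given columns."""
    design = np.column_stack([np.ones(len(outcome)), *columns])
    coef, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    return coef


# t2 modeled on its own: absorbs part of t1's effect and the interaction.
separate_t2 = ols([t2], y)[1]

# t1, t2, and their interaction modeled together.
joint = ols([t1, t2, t1 * t2], y)

print(f"t2 alone:            {separate_t2:.0f}")
print(f"t2 main (joint):     {joint[2]:.0f}")
print(f"t1*t2 interaction:   {joint[3]:.0f}")
```

In this toy setup the stand-alone coefficient for `t2` comes out well above the assumed main effect of 5000, while the joint fit separates the main effect from the interaction. Whether the real data behaves this way is exactly the open question.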

  1. Can DoubleML model the sequential effect of T1 and T2 to test our hypothesis that Major Flag precedes Tech Support, assuming we have timestamps showing whether T1 occurs before T2? What would the input look like?
  2. I noticed that DoubleML can take multiple treatments, but it looks like this setup is for treatments that occur concurrently. Is this true? The concatenation example in the docs (see "Single Outcome, Multiple Treatments") also doesn't seem to produce the right dimensions:
est = LinearDML()
est.fit(y, np.concatenate((T0, T1), axis=1), X=X, W=W)
  3. How would the interpretation of the ATE change if DoubleML can handle multiple treatments? For example, instead of 'Having tech support increases product revenue, on average, by $7,156', what would it be? Are there other nuances to be aware of?
  4. We do not actually know whether T1 and T2 should be modeled separately or together to test the question in 1). How do we know which hypothesis is 'correct'?
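
On the dimension issue with the concatenation example, here is a minimal sketch of what seems to go wrong, assuming the treatments are 1-D arrays (as a pandas Series converts to). The arrays `t_major` and `t_support` are small hypothetical stand-ins, not the real columns, and the interaction column at the end is only one possible way to encode "T2 depends on T1".

```python
import numpy as np

# Hypothetical stand-ins for two binary treatment columns (1-D, length n).
t_major = np.array([0, 1, 1, 0, 1])
t_support = np.array([1, 1, 0, 0, 1])

# np.concatenate(..., axis=1) needs 2-D inputs; on 1-D arrays it raises
# an AxisError, which is a subclass of ValueError.
try:
    np.concatenate((t_major, t_support), axis=1)
    concat_failed = False
except ValueError:
    concat_failed = True

# Stacking the 1-D arrays as columns gives one row per unit and
# one column per treatment.
T_joint = np.column_stack((t_major, t_support))

# Optional: an extra column for the interaction, if it is to be treated
# as a third "treatment" (an assumption, not a documented recipe).
T_with_interaction = np.column_stack((t_major, t_support, t_major * t_support))

print(concat_failed)             # True
print(T_joint.shape)             # (5, 2)
print(T_with_interaction.shape)  # (5, 3)
```

Presumably an (n, 2) array like `T_joint` is what the multi-treatment fit expects as `T`, but that is an assumption rather than something confirmed here, as is whether `discrete_treatment=True` can be kept with more than one treatment column.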

Thank you!

chelsealee14, Jan 07 '25 01:01