EconML Error of crossfit folds splits with DynamicDML

Hi,

I am estimating the effect of high levels of particulate matter (PM2.5) on excess deaths, using daily panel data for 25 municipalities. My treatment is a binary variable: T=1 when the level of PM2.5 is high, and T=0 when it is low. The outcome is also a binary variable: Y=0 for non-excess deaths, and Y=1 for excess deaths.

I am using the DynamicDML class to fit my model, but I get this error message: "AttributeError: Provided crossfit folds contain training splits that don't contain all treatments". However, 50% of the observations have T=1, which I think should be enough to obtain balanced crossfit folds.

Here is my code, with econml version 0.15 and dowhy version 0.10.1.

Dataset: dataset_pm_deaths.csv

```python
import dowhy
import econml
from dowhy import CausalModel
import pandas as pd
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LassoCV
import scipy.stats as stats
from itertools import product
from econml.utilities import WeightedModelWrapper
from sklearn.model_selection import train_test_split
from econml.panel.dml import DynamicDML

data_all = pd.read_csv("D:/dataset_pm_deaths.csv")
data = data_all[data_all['Year'] >= 2009]

median_pm25 = data['PM25'].median()
data['PM25'] = (data['PM25'] >= median_pm25).astype(int)

data.BC = stats.zscore(data.BC, nan_policy='omit')
data.DMS = stats.zscore(data.DMS, nan_policy='omit')
data.PM = stats.zscore(data.PM, nan_policy='omit')
data.OC = stats.zscore(data.OC, nan_policy='omit')
data.SO2 = stats.zscore(data.SO2, nan_policy='omit')
data.SO4 = stats.zscore(data.SO4, nan_policy='omit')

data0 = data[['excess', 'PM25', 'cod_munici', 'BC', 'DMS', 'PM', 'OC', 'SO2', 'SO4', 'Temperature', 'lead1_PM25']]
data0 = data0.dropna()
Y = data0.excess.to_numpy()
T = data0.PM25.to_numpy()
percentage_high_PM25 = np.mean(T == 1) * 100
W = data0[['BC', 'DMS', 'PM', 'OC', 'SO2', 'SO4', 'Temperature']].to_numpy().reshape(-1, 7)
X = data0[['Temperature', 'lead1_PM25']].to_numpy().reshape(-1, 2)
groups = data0.cod_munici.to_numpy()

estimate0 = DynamicDML(discrete_treatment=True, featurizer=PolynomialFeatures(degree=3),
                       linear_first_stages=False, cv=3, random_state=123)
estimate0.fit(Y=Y, T=T, X=X, W=W, inference='auto', groups=groups)  # HERE IS THE ERROR
```

Jul 16 '24 17:07 juandavidgutier

Have you tried passing a StratifiedKFold object or creating your own cv splitter? That could help you out in the meantime.

Aug 20 '24 08:08 TimCosemans

Hi @TimCosemans

Thanks for your suggestions!

Aug 21 '24 12:08 juandavidgutier
