causalml
Meta-Learner Reproducibility
Description
On the first attempt, the meta-learners perform well: the lift and gain curves look reasonable and I can identify groups with stark heterogeneous effects. Whenever the notebook is run again, I cannot replicate the results, even after trying several random_state and seed combinations. If these heterogeneous-effect groups really exist in the data, why is it so hard to identify them again?
Code
from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor
from causalml.inference.meta import BaseRRegressor

X = df_reg[features_fullrank]
y = df_reg[target_var]
treatment = df_reg[treatment]
X_train, X_test, y_train, y_test, treatment_train, treatment_test = train_test_split(X, y, treatment, test_size=0.5, random_state=42)
learner_x = BaseRRegressor(learner=XGBRegressor(random_state=10))
learner_x.fit(X=X_train, treatment=treatment_train, y=y_train)
Screenshots
First attempt: [screenshot not preserved]
Any further attempt: [screenshot not preserved]
Environment:
- OS: Linux-4.14.174-x86_64-with-debian-8.11
- Python Version: 3.6
- pandas==0.25.3
- scikit-learn==0.22
- causalml==0.9.0
- xgboost==0.81
Additional context
I learned that reproducibility can be ensured by setting np.random.seed(seed), but I am still unable to replicate my initial results.
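Setting np.random.seed(seed) makes later runs repeat each other, but it cannot bring back a run made before any seed was set. A minimal NumPy-only sketch, assuming the randomness in question flows through NumPy's global generator (which scikit-learn falls back to when a random_state argument is left as None); shuffled_fold_ids is a hypothetical helper for illustration, not part of causalml:

```python
import numpy as np


def shuffled_fold_ids(n_samples, n_fold, seed=None):
    """Hypothetical helper: randomly assign each row to one of n_fold folds.

    If seed is given, NumPy's global generator is reseeded first, mirroring a
    np.random.seed(seed) call in a notebook; otherwise the current global
    state is used, as in an unseeded first run.
    """
    if seed is not None:
        np.random.seed(seed)
    # Shuffle row indices, then deal them round-robin into folds.
    order = np.random.permutation(n_samples)
    fold_ids = np.empty(n_samples, dtype=int)
    fold_ids[order] = np.arange(n_samples) % n_fold
    return fold_ids


run_1 = shuffled_fold_ids(100, 5, seed=10)
run_2 = shuffled_fold_ids(100, 5, seed=10)  # same seed as run_1
run_3 = shuffled_fold_ids(100, 5, seed=11)  # different seed

print(np.array_equal(run_1, run_2))  # True: reseeding repeats the draw exactly
print(np.array_equal(run_1, run_3))  # a different seed gives a different split
```

Because an unseeded first run starts from a generator state that was never recorded, no seed chosen afterwards is guaranteed to reproduce it.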
Hi @imaginarymuffin, the R-learner itself also uses cross-validation, as described here. I see you already fixed the random_state in BaseRRegressor(learner=XGBRegressor(random_state=10)), but the R-learner needs a similar input as well (see code here), so you could try BaseRRegressor(learner=XGBRegressor(random_state=10), random_state=10).
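Why the second random_state matters can be illustrated with a NumPy-only sketch of cross-fitting, where each row's prediction comes from a model trained on the other folds. This is not causalml's implementation: out_of_fold_predictions is a hypothetical helper, and ordinary least squares stands in for XGBoost. Even with a fully deterministic base model, the fold split has its own random_state, and changing it changes the out-of-fold predictions (and therefore any residuals computed from them).

```python
import numpy as np


def out_of_fold_predictions(X, y, n_fold=5, random_state=None):
    """Hypothetical helper: K-fold cross-fitted predictions of y from X.

    Each row is predicted by a least-squares model fit on the other folds.
    random_state only controls how rows are shuffled into folds.
    """
    rng = np.random.RandomState(random_state)
    n = len(y)
    fold_ids = rng.permutation(n) % n_fold  # balanced random fold label per row
    X1 = np.column_stack([np.ones(n), X])   # add an intercept column
    preds = np.empty(n)
    for k in range(n_fold):
        test = fold_ids == k
        # Fit on the other folds, predict on the held-out fold.
        coef, *_ = np.linalg.lstsq(X1[~test], y[~test], rcond=None)
        preds[test] = X1[test] @ coef
    return preds


# Synthetic data with a fixed seed, so only the fold split varies below.
data_rng = np.random.RandomState(0)
X = data_rng.normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + data_rng.normal(size=200)

a = out_of_fold_predictions(X, y, random_state=10)
b = out_of_fold_predictions(X, y, random_state=10)
c = out_of_fold_predictions(X, y, random_state=11)

print(np.array_equal(a, b))   # True: same folds and same fits
print(np.abs(a - c).max())    # nonzero: new folds shift the predictions
```

In the same spirit, fixing only XGBRegressor(random_state=10) pins the base model but leaves the fold split free, which is why the answer suggests passing random_state to BaseRRegressor as well.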