causalml
Meta-Learner Reproducibility
Description
On the first attempt, the meta-learners perform great: the lift and gain curves look reasonable, and I can identify groups with stark heterogeneous effects. Whenever the notebook is run again, I cannot replicate the results, even after trying several random_state and seed combinations. If these heterogeneous-effect groups do exist in the data, why is it so hard to identify them again?
Code
X = df_reg[features_fullrank]
y = df_reg[target_var]
treatment = df_reg[treatment]
X_train, X_test, y_train, y_test, treatment_train, treatment_test = train_test_split(X, y, treatment, test_size=0.5, random_state=42)
learner_x = BaseRRegressor(learner=XGBRegressor(random_state=10))
learner_x.fit(X=X_train, treatment=treatment_train, y=y_train)
Screenshots
First Attempt
[Screen Shot 2022-07-14 at 8 55 57 AM]
[Screen Shot 2022-07-14 at 8 56 02 AM]
[Screen Shot 2022-07-14 at 8 56 19 AM]
Any further attempt
Environment:
- OS: Linux-4.14.174-x86_64-with-debian-8.11
- Python Version: 3.6
- pandas==0.25.3, scikit-learn==0.22, causalml==0.9.0, xgboost==0.81
Additional context
I did learn that reproducibility can be ensured by setting np.random.seed(seed), but I am still unable to replicate my initial results.
Hi @imaginarymuffin, the R-learner itself uses cross-validation, as described here. I see you already fixed the random_state of the base learner in BaseRRegressor(learner=XGBRegressor(random_state=10)), but the R-learner needs a similar input of its own (see code here), so you could try BaseRRegressor(learner=XGBRegressor(random_state=10), random_state=10).
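To illustrate why the learner-level seed matters, here is a minimal sketch that uses scikit-learn only. It assumes, following the reply above, that the R-learner shuffles rows into cross-validation folds internally. The helper name `fold_assignments` and the sample count are made up for illustration and are not part of causalml.

```python
import numpy as np
from sklearn.model_selection import KFold


def fold_assignments(n_samples, random_state=None, n_splits=5):
    """Return, for each sample, the index of the test fold it lands in."""
    cv = KFold(n_splits=n_splits, shuffle=True, random_state=random_state)
    folds = np.empty(n_samples, dtype=int)
    # KFold.split only needs the number of rows, so a dummy array is enough.
    for fold_idx, (_, test_idx) in enumerate(cv.split(np.zeros((n_samples, 1)))):
        folds[test_idx] = fold_idx
    return folds


# Same seed: the fold assignment is identical across runs.
same_a = fold_assignments(1000, random_state=10)
same_b = fold_assignments(1000, random_state=10)
print("seeded runs identical:", np.array_equal(same_a, same_b))

# No seed: each call draws a fresh shuffle, so the folds usually differ,
# and so do any cross-fitted predictions built on top of them.
unseeded_a = fold_assignments(1000)
unseeded_b = fold_assignments(1000)
print("unseeded runs identical:", np.array_equal(unseeded_a, unseeded_b))
```

With random_state=None, scikit-learn falls back to NumPy's global random state. That is why np.random.seed(seed) can also make runs repeatable, but only if exactly the same sequence of random calls follows it, which is easy to break when notebook cells are re-run in a different order.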