
Meta-Learner Reproducibility

Open imaginarymuffin opened this issue 2 years ago • 1 comments

Description On the first attempt, the meta-learners perform well: the lift and gain curves look reasonable and I can identify groups with stark heterogeneous effects. Whenever the notebook is run again, I cannot replicate the results, even after trying several random_state and seed combinations. If these heterogeneous-effect groups do exist in the data, why is it so hard to identify them again?

Code

X = df_reg[features_fullrank]
y = df_reg[target_var]
treatment = df_reg[treatment]

X_train, X_test, y_train, y_test, treatment_train, treatment_test = train_test_split(X, y, treatment, test_size=0.5, random_state=42)

learner_x = BaseRRegressor(learner=XGBRegressor(random_state=10))

learner_x.fit(X=X_train, treatment=treatment_train, y=y_train)

Screenshots

First attempt: [3 screenshots, 2022-07-14, 8:55:57 to 8:56:19 AM]

Any further attempt: [3 screenshots, 2022-07-14, 8:57:39 to 8:58:00 AM]

Environment:

  • OS: Linux-4.14.174-x86_64-with-debian-8.11
  • Python Version: 3.6
  • pandas==0.25.3, scikit-learn==0.22, causalml==0.9.0, xgboost==0.81

Additional context I did learn that reproducibility can supposedly be ensured by setting np.random.seed(seed), but I am still unable to replicate my initial results.

imaginarymuffin, Jul 14 '22 18:07

Hi @imaginarymuffin, the R-learner itself uses cross-validation, as described here. I see you already fixed the random_state of the base learner in BaseRRegressor(learner=XGBRegressor(random_state=10)), but the R-learner also needs a similar input for its own cross-validation splits (see code here), so you could try BaseRRegressor(learner=XGBRegressor(random_state=10), random_state=10).

ppstacy, Jul 15 '22 04:07
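
To illustrate the point made in the reply, here is a minimal standard-library sketch of why seeding only the base learner may not be enough when a method also shuffles data into cross-validation folds. Note that `shuffled_kfold` is a hypothetical helper written for this illustration; it is not part of causalml or scikit-learn and only mimics a shuffled K-fold split.

```python
import random


def shuffled_kfold(n_samples, n_fold=5, seed=None):
    """Split range(n_samples) into n_fold shuffled folds (hypothetical helper)."""
    # With seed=None the generator is seeded from OS entropy or the clock,
    # so every call produces a different shuffle.
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)
    # Deal the shuffled indices round-robin into n_fold folds.
    return [sorted(indices[i::n_fold]) for i in range(n_fold)]


# Same seed, same folds: any model fitted per fold sees identical splits.
seeded_a = shuffled_kfold(20, seed=10)
seeded_b = shuffled_kfold(20, seed=10)
print(seeded_a == seeded_b)  # True

# No seed: the folds will almost certainly differ between calls,
# no matter how the model fitted on each fold is seeded.
unseeded_a = shuffled_kfold(20)
unseeded_b = shuffled_kfold(20)
print(unseeded_a == unseeded_b)
```

In the issue's code, the analogous change is the one suggested in the reply: pass random_state to BaseRRegressor itself in addition to the XGBRegressor it wraps.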