
skip-save-model leads to different metrics

Open MarselScheer opened this issue 1 year ago • 2 comments

Describe the bug: Adding the --skip_save_model flag to ludwig train changes the metrics reported at the end of training.

To Reproduce: with the following config saved as ludwig.yaml, start the container and train:

model_type: gbm
input_features:
    - name: age
      type: number
output_features:
    - name: income
      type: binary

docker run --gpus all -i -t --rm ludwigai/ludwig-ray-gpu:0.8.4 /bin/bash
ludwig train --config ludwig.yaml --dataset ludwig://adult_census_income --random_seed 1

yields

╒═════════════════════════════════╤════════════════════╕
│ Validation feature              │ income             │
├─────────────────────────────────┼────────────────────┤
│ Validation metric               │ roc_auc            │
├─────────────────────────────────┼────────────────────┤
│ Best model step                 │ 50                 │
├─────────────────────────────────┼────────────────────┤
│ Best model epoch                │ 1                  │
├─────────────────────────────────┼────────────────────┤
│ Best model's validation roc_auc │ 0.7122949361801147 │
├─────────────────────────────────┼────────────────────┤
│ Best model's test roc_auc       │ 0.7052067518234253 │
╘═════════════════════════════════╧════════════════════╛

and

ludwig train --config ludwig.yaml --dataset ludwig://adult_census_income --random_seed 1 --skip_save_model

yields

╒═════════════════════════════════╤════════════════════╕
│ Validation feature              │ income             │
├─────────────────────────────────┼────────────────────┤
│ Validation metric               │ roc_auc            │
├─────────────────────────────────┼────────────────────┤
│ Best model step                 │ 250                │
├─────────────────────────────────┼────────────────────┤
│ Best model epoch                │ 5                  │
├─────────────────────────────────┼────────────────────┤
│ Best model's validation roc_auc │ 0.7126029133796692 │
├─────────────────────────────────┼────────────────────┤
│ Best model's test roc_auc       │ 0.7061160802841187 │
╘═════════════════════════════════╧════════════════════╛

Expected behavior: The AUC should not change when --skip_save_model is added.
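
To put a number on the discrepancy, the two result tables above can be compared directly. The following is a minimal sketch, not part of the original report: the dictionary keys and the helper name metric_diffs are hypothetical, and the values are copied by hand from the tables.

```python
# Metrics copied by hand from the two result tables in this report.
# The key names are illustrative, not Ludwig output field names.
with_save = {
    "validation_roc_auc": 0.7122949361801147,
    "test_roc_auc": 0.7052067518234253,
}
skip_save = {
    "validation_roc_auc": 0.7126029133796692,
    "test_roc_auc": 0.7061160802841187,
}


def metric_diffs(baseline, other):
    """Return other - baseline for every metric present in baseline."""
    return {name: other[name] - baseline[name] for name in baseline}


diffs = metric_diffs(with_save, skip_save)
for name, delta in diffs.items():
    print(f"{name}: {delta:+.6f}")
```

Both differences are below 0.001 in absolute terms, but the best model step also moves from 50 to 250, so the two runs do not follow the same training trajectory.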

Environment (please complete the following information):

  • ludwigai/ludwig-ray-gpu:0.8.4

MarselScheer, Oct 18 '23 18:10

Hi @MarselScheer, thank you for raising this! I'll see about tracking this down.

jeffkinnison, Oct 20 '23 21:10

Hi @MarselScheer, I was able to repro this. LightGBM GPU training seems to be unstable regardless of whether skip_save_model is used. I'm looking into whether pinning an earlier version of lightgbm will resolve this issue.

jeffkinnison, Nov 27 '23 18:11
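
Since the last comment raises the lightgbm version as a possible factor, it may help to record the installed versions alongside each run when comparing results. This is a minimal sketch using only the Python standard library; the helper name installed_version is hypothetical, and nothing here is taken from the Ludwig codebase.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Print the versions most relevant to this issue; run inside the container.
for pkg in ("ludwig", "lightgbm"):
    print(pkg, installed_version(pkg) or "not installed")
```

Running this inside the container before and after pinning a different lightgbm release makes it easy to confirm which version produced a given set of metrics.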