--skip_save_model leads to different metrics
Describe the bug
Adding the --skip_save_model flag to ludwig train changes the metrics reported at the end of training.
To Reproduce
Steps to reproduce the behavior. Save the following config as ludwig.yaml:
model_type: gbm
input_features:
- name: age
type: number
output_features:
- name: income
type: binary
Then start the container and train, first without the flag:
docker run --gpus all -i -t --rm ludwigai/ludwig-ray-gpu:0.8.4 /bin/bash
ludwig train --config ludwig.yaml --dataset ludwig://adult_census_income --random_seed 1
yields
╒═════════════════════════════════╤════════════════════╕
│ Validation feature │ income │
├─────────────────────────────────┼────────────────────┤
│ Validation metric │ roc_auc │
├─────────────────────────────────┼────────────────────┤
│ Best model step │ 50 │
├─────────────────────────────────┼────────────────────┤
│ Best model epoch │ 1 │
├─────────────────────────────────┼────────────────────┤
│ Best model's validation roc_auc │ 0.7122949361801147 │
├─────────────────────────────────┼────────────────────┤
│ Best model's test roc_auc │ 0.7052067518234253 │
╘═════════════════════════════════╧════════════════════╛
and
ludwig train --config ludwig.yaml --dataset ludwig://adult_census_income --random_seed 1 --skip_save_model
yields
╒═════════════════════════════════╤════════════════════╕
│ Validation feature │ income │
├─────────────────────────────────┼────────────────────┤
│ Validation metric │ roc_auc │
├─────────────────────────────────┼────────────────────┤
│ Best model step │ 250 │
├─────────────────────────────────┼────────────────────┤
│ Best model epoch │ 5 │
├─────────────────────────────────┼────────────────────┤
│ Best model's validation roc_auc │ 0.7126029133796692 │
├─────────────────────────────────┼────────────────────┤
│ Best model's test roc_auc │ 0.7061160802841187 │
╘═════════════════════════════════╧════════════════════╛
Expected behavior
The reported ROC AUC (and the best model step/epoch) should be identical across the two runs; --skip_save_model should not affect training itself.
Environment:
- Docker image: ludwigai/ludwig-ray-gpu:0.8.4
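A quick way to check whether the drift comes from the flag itself or from general run-to-run nondeterminism is to repeat the exact same command twice and compare the summaries (the log file names below are arbitrary):

# Inside the container, run the identical command twice and capture the output.
ludwig train --config ludwig.yaml --dataset ludwig://adult_census_income --random_seed 1 | tee run1.log
ludwig train --config ludwig.yaml --dataset ludwig://adult_census_income --random_seed 1 | tee run2.log
# If these rows differ, training is nondeterministic even with a fixed seed.
grep "roc_auc" run1.log run2.log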
Hi @MarselScheer, thank you for raising this! I'll see about tracking this down.
Hi @MarselScheer, I was able to repro this. LightGBM GPU training seems to be unstable regardless of whether --skip_save_model is used. I'm looking into whether pinning an earlier version of lightgbm will resolve this issue.
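For anyone who wants to test this themselves, here is a sketch of pinning an older lightgbm inside the container before rerunning both commands; 3.3.5 is just a candidate version, not yet confirmed to resolve the instability:

# Inside the ludwigai/ludwig-ray-gpu:0.8.4 container:
pip install "lightgbm==3.3.5"  # assumed candidate pin; the exact fixing version is unknown
ludwig train --config ludwig.yaml --dataset ludwig://adult_census_income --random_seed 1
ludwig train --config ludwig.yaml --dataset ludwig://adult_census_income --random_seed 1 --skip_save_model
# Compare the "Best model's validation roc_auc" rows across the two runs.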