Verbose option during training
Training time can be quite long, particularly when inner bagging is used. It would be helpful to have a verbose option to show where the EBM model is in training and how it is performing on the validation set.
Hi @onacrame,
This is a reasonable suggestion, and one we've been thinking about developing on our end too. The main reason we haven't implemented it yet is the EBM's parallelization model -- unfortunately it's a bit trickier for us to provide reasonable verbose output than in the "standard" boosting algorithm setting.
The primary complexities are due to 1) bagging (as you've pointed out), and 2) early stopping. The default EBM builds 8 mini-EBM models in parallel, and assuming your machine has enough cores, they tend to all finish at approximately the same time. This has a few implications:
- A progress bar/verbose output just reporting which bag has finished (like sklearn's Random Forest) may not be that helpful by default, as all 8 bags tend to finish at the same time but might individually take a very long time to run. Of course, this would be helpful in the setting where # bags >> # cores.
- It's difficult to provide true realtime validation metrics, because the final model is an average of all the mini-models once they finish training.
- It's also hard to show a progress bar "per bag" due to early stopping -- it's very difficult to know a priori how early the model will exit. So we may show a progress bar from 1 -> 5000, but have the model early stop at iteration 800, leading to misleading estimates of true runtime.
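To make the averaging point above concrete, here is a minimal, purely illustrative sketch of the bagging structure (this is not interpret's internal code; `train_one_bag` and the toy model inside it are hypothetical stand-ins):

```python
# Illustrative only: mimics the "train several mini-EBMs in parallel,
# then average them" structure that makes live validation metrics hard.
import numpy as np
from joblib import Parallel, delayed

def train_one_bag(seed, X, y):
    # Hypothetical stand-in for boosting one mini-EBM on a bootstrap sample.
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, len(X), size=len(X))
    coef = np.linalg.lstsq(X[idx], y[idx], rcond=None)[0]  # toy "model"
    return coef

X, y = np.random.rand(200, 3), np.random.rand(200)

# All bags run concurrently; no per-round scores are visible from the outside,
# and the "final model" only exists once every bag has finished.
bag_models = Parallel(n_jobs=-1)(
    delayed(train_one_bag)(seed, X, y) for seed in range(8)
)
final_model = np.mean(bag_models, axis=0)  # averaging happens at the very end
```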
We have a couple of options we've been thinking about, and would be curious to hear your thoughts (and those of anyone else reading this):
- Just report the number of `outer_bags` completed, without realtime validation metrics.
- Report `outer_bags` completed, with realtime validation from one randomly selected bag (see the sketch after this list).
- Show a progress bar per `outer_bag` from 1 to `max_rounds`, which may exit early. We can also show validation metrics per bag in this setting, but it may become overwhelming if `outer_bags` is increased significantly.
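To sketch what the second option could look like mechanically (again, not interpret's API -- every name here is hypothetical), one randomly selected bag could stream its per-round validation metric through a queue while the others run silently:

```python
# Generic illustration of "one randomly selected bag reports live metrics".
# The training loop is a toy stand-in, not interpret's boosting code.
import queue, random, threading, time

def train_bag(bag_id, report_queue=None, max_rounds=500):
    for round_idx in range(1, max_rounds + 1):
        time.sleep(0.001)                      # stand-in for one round of boosting
        val_metric = 1.0 / round_idx           # stand-in for a validation score
        if report_queue is not None and round_idx % 100 == 0:
            report_queue.put((bag_id, round_idx, val_metric))

n_bags = 8
reporter = random.randrange(n_bags)            # the one bag that reports
q = queue.Queue()
threads = [
    threading.Thread(target=train_bag, args=(i, q if i == reporter else None))
    for i in range(n_bags)
]
for t in threads:
    t.start()

# Main thread prints whatever the reporting bag sends, as it arrives.
while any(t.is_alive() for t in threads) or not q.empty():
    try:
        bag_id, round_idx, metric = q.get(timeout=0.1)
        print(f"bag {bag_id}: round {round_idx}, validation metric {metric:.4f}")
    except queue.Empty:
        pass

for t in threads:
    t.join()
```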
Thanks again for the great question! -InterpretML Team
Thanks for the detailed response. Completely get the limitations/practicalities now.
I think a verbose setting where `verbose=0`/`None` gives no output and `verbose=100` prints every 100th round would be sufficient. This is the general behaviour of the "Big Three" GBM libraries.
Could this not be done by storing the scores for each mini-model in memory at each 100th round (in the case of `verbose=100`), and then, once the last of the mini-models reaches that round, computing and printing the average?
I understand this could very slightly increase training time, but I think it is really important for user experience, for debugging and finding frozen models, and, probably most importantly, for understanding how the model is training -- which in turn gives the user a better understanding of how the model works and can be interpreted. For me it strikes at the core of this package!
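As a rough sketch of the bookkeeping I have in mind (hypothetical names throughout, not a concrete proposal for interpret's internals):

```python
# Hypothetical bookkeeping for "average the bags' validation scores at each
# verbose checkpoint and print once the last bag reaches that round".
from collections import defaultdict

class VerboseAggregator:
    def __init__(self, n_bags, verbose=100):
        self.n_bags = n_bags
        self.verbose = verbose
        self.scores = defaultdict(list)   # round -> list of per-bag scores

    def report(self, bag_id, round_idx, val_score):
        if round_idx % self.verbose != 0:
            return
        self.scores[round_idx].append(val_score)
        # Print only when every bag has reached this checkpoint.
        # (A real parallel implementation would also need locking here.)
        if len(self.scores[round_idx]) == self.n_bags:
            avg = sum(self.scores[round_idx]) / self.n_bags
            print(f"round {round_idx}: mean validation score {avg:.4f}")

# Toy usage: each bag would call report() from its own training loop.
agg = VerboseAggregator(n_bags=8, verbose=100)
for round_idx in (100, 200):
    for bag_id in range(8):
        agg.report(bag_id, round_idx, val_score=1.0 / round_idx)
```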
Finally, CatBoost's verbose training output, produced by code like the following, can be extremely insightful:
```python
from catboost import CatBoostClassifier, Pool

train_data = [[1, 3], [0, 4], [1, 7], [0, 3]]
train_labels = [1, 0, 1, 1]
eval_data = [[1, 4], [0, 4.2], [1, 7], [0, 3]]
eval_labels = [1, 0, 1, 1]

model = CatBoostClassifier(learning_rate=0.03)
model.fit(train_data,
          train_labels,
          verbose=100,
          eval_set=(eval_data, eval_labels),
          plot=True)
```
Just some food for thought :D
Option 2) would be very helpful. Even just an estimate of the remaining time, based on the rounds completed so far and the tendency to converge / early stop, would help tremendously.
Note that if the randomly chosen bag finishes before the rest, automatically switching to report on the next bag would be a nice addition.
EBMs generally take a fixed and consistent amount of time per round of boosting. You should be able to get a pretty good EBM somewhere in the range of 1000-2000 rounds of boosting. Our default max is set to 5000, but it usually early stops before getting that far.
I think if you did an initial run with 20 rounds, most of the time would be spent boosting rather than on startup. So, to get an overall estimate, multiply that time by 100 and it should be in the ballpark for 2000 rounds.
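As a rough illustration of that extrapolation (substitute your own training data; `max_rounds` is the standard EBM parameter, everything else here is just an example):

```python
# Time a short 20-round fit, then extrapolate to a ~2000-round fit.
import time
from sklearn.datasets import load_breast_cancer
from interpret.glassbox import ExplainableBoostingClassifier

X, y = load_breast_cancer(return_X_y=True)  # replace with your own X, y

start = time.perf_counter()
ExplainableBoostingClassifier(max_rounds=20).fit(X, y)
short_run = time.perf_counter() - start

# 20 rounds * 100 = 2000 rounds, roughly where EBMs are usually good.
print(f"~{short_run * 100 / 60:.1f} minutes estimated for a 2000-round fit")
```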
I agree some feedback on time remaining would be better, but for now this is the best approach I can offer.