Verbose option during training
Training time can be quite long, particularly when inner bagging is used. It would be helpful to have a verbose option to show where the EBM model is in training and how it is performing on the validation set.
Hi @onacrame,
This is a reasonable suggestion, and one we've been thinking about developing on our end too. The main reason we haven't implemented it yet is the EBM's parallelization model -- unfortunately it's a bit trickier for us to provide reasonable verbose output than in the "standard" boosting algorithm setting.
The primary complexities are due to 1) bagging (as you've pointed out), and 2) early stopping. The default EBM builds 8 mini-EBM models in parallel, and assuming your machine has enough cores, they tend to all finish at approximately the same time. This has a few implications:
- A progress bar/verbose output just reporting which bag has finished (like sklearn's Random Forest) may not be that helpful by default, as all 8 bags tend to finish at the same time but might individually take a very long time to run. Of course, this would be helpful in the setting where # bags >> # cores.
- It's difficult to provide true realtime validation metrics, because the final model is an average of all the mini-models once they finish training.
- It's also hard to show a progress bar "per bag" due to early stopping -- it's very difficult to know a priori how early the model will exit. So we may show a progress bar from 1 -> 5000, but have the model early stop at iteration 800, leading to misleading estimates of true runtime.
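To make the averaging point above concrete, here is a minimal, purely illustrative sketch of the bagging structure (this is not interpret's internal code; `train_one_bag` and the toy model inside it are hypothetical stand-ins):

```python
# Illustrative only: mimics the "train several mini-EBMs in parallel,
# then average them" structure that makes live validation metrics hard.
import numpy as np
from joblib import Parallel, delayed

def train_one_bag(seed, X, y):
    # Hypothetical stand-in for boosting one mini-EBM on a bootstrap sample.
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, len(X), size=len(X))
    coef = np.linalg.lstsq(X[idx], y[idx], rcond=None)[0]  # toy "model"
    return coef

X, y = np.random.rand(200, 3), np.random.rand(200)

# All bags run concurrently; no per-round scores are visible from the outside,
# and the "final model" only exists once every bag has finished.
bag_models = Parallel(n_jobs=-1)(
    delayed(train_one_bag)(seed, X, y) for seed in range(8)
)
final_model = np.mean(bag_models, axis=0)  # averaging happens at the very end
```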
We have a couple of options we've been thinking about, and would be curious to hear your thoughts (and those of anyone else reading this):
- Just report the number of `outer_bags` completed, without realtime validation metrics.
- Report `outer_bags` completed, with realtime validation from one randomly selected bag (see the sketch after this list).
- Show a progress bar per `outer_bag` from 1 to `max_rounds`, which may exit early. We can also show validation metrics per bag in this setting, but it may become overwhelming if `outer_bags` is increased significantly.
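To sketch what the second option could look like mechanically (again, not interpret's API -- every name here is hypothetical), one randomly selected bag could stream its per-round validation metric through a queue while the others run silently:

```python
# Generic illustration of "one randomly selected bag reports live metrics".
# The training loop is a toy stand-in, not interpret's boosting code.
import queue, random, threading, time

def train_bag(bag_id, report_queue=None, max_rounds=500):
    for round_idx in range(1, max_rounds + 1):
        time.sleep(0.001)                      # stand-in for one round of boosting
        val_metric = 1.0 / round_idx           # stand-in for a validation score
        if report_queue is not None and round_idx % 100 == 0:
            report_queue.put((bag_id, round_idx, val_metric))

n_bags = 8
reporter = random.randrange(n_bags)            # the one bag that reports
q = queue.Queue()
threads = [
    threading.Thread(target=train_bag, args=(i, q if i == reporter else None))
    for i in range(n_bags)
]
for t in threads:
    t.start()

# Main thread prints whatever the reporting bag sends, as it arrives.
while any(t.is_alive() for t in threads) or not q.empty():
    try:
        bag_id, round_idx, metric = q.get(timeout=0.1)
        print(f"bag {bag_id}: round {round_idx}, validation metric {metric:.4f}")
    except queue.Empty:
        pass

for t in threads:
    t.join()
```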
Thanks again for the great question! -InterpretML Team
Thanks for the detailed response. Completely get the limitations/practicalities now.
I think a verbose setting where `verbose=0`/`None` gives no output and `verbose=100` prints every 100th round would be sufficient. This is the general behaviour of the "Big Three" GBM libraries.
Could this not be done by storing the scores for each mini-model in memory at each 100th round (in the case of `verbose=100`), and then, once the last of the mini-models reaches that round, computing and printing the average?
I understand this could very slightly increase training time, but I think it is really important for user experience, for debugging and finding frozen models, and, probably most importantly, for understanding how the model is training -- which in turn gives the user a better understanding of how the model works and can be interpreted. For me it strikes at the core of this package!
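As a rough sketch of the bookkeeping I have in mind (hypothetical names throughout, not a concrete proposal for interpret's internals):

```python
# Hypothetical bookkeeping for "average the bags' validation scores at each
# verbose checkpoint and print once the last bag reaches that round".
from collections import defaultdict

class VerboseAggregator:
    def __init__(self, n_bags, verbose=100):
        self.n_bags = n_bags
        self.verbose = verbose
        self.scores = defaultdict(list)   # round -> list of per-bag scores

    def report(self, bag_id, round_idx, val_score):
        if round_idx % self.verbose != 0:
            return
        self.scores[round_idx].append(val_score)
        # Print only when every bag has reached this checkpoint.
        # (A real parallel implementation would also need locking here.)
        if len(self.scores[round_idx]) == self.n_bags:
            avg = sum(self.scores[round_idx]) / self.n_bags
            print(f"round {round_idx}: mean validation score {avg:.4f}")

# Toy usage: each bag would call report() from its own training loop.
agg = VerboseAggregator(n_bags=8, verbose=100)
for round_idx in (100, 200):
    for bag_id in range(8):
        agg.report(bag_id, round_idx, val_score=1.0 / round_idx)
```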
Finally, CatBoost's verbose training output, produced by code like the following, can be extremely insightful:
```python
from catboost import CatBoostClassifier, Pool

train_data = [[1, 3], [0, 4], [1, 7], [0, 3]]
train_labels = [1, 0, 1, 1]
eval_data = [[1, 4], [0, 4.2], [1, 7], [0, 3]]
eval_labels = [1, 0, 1, 1]

model = CatBoostClassifier(learning_rate=0.03)
model.fit(train_data,
          train_labels,
          verbose=100,
          eval_set=(eval_data, eval_labels),
          plot=True)
```
Just some food for thought :D
Option 2) would be very helpful. Even just an estimate of the remaining time, based on the rounds completed so far and the tendency to converge / early stop, would help tremendously.
Note that if the randomly chosen bag finishes before the rest, automatically switching to report on the next bag would be a nice addition.
EBMs generally take a fixed and consistent amount of time per round of boosting. You should be able to get a pretty good EBM somewhere in the range of 1000-2000 rounds of boosting. Our default max is set to 5000, but it usually early stops before getting that far.
I think if you did an initial run with 20 rounds, most of the time would be spent boosting rather than on startup. So, to get an overall estimate, multiply that time by 100 and it should be in the ballpark for 2000 rounds.
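As a rough illustration of that extrapolation (substitute your own training data; `max_rounds` is the standard EBM parameter, everything else here is just an example):

```python
# Time a short 20-round fit, then extrapolate to a ~2000-round fit.
import time
from sklearn.datasets import load_breast_cancer
from interpret.glassbox import ExplainableBoostingClassifier

X, y = load_breast_cancer(return_X_y=True)  # replace with your own X, y

start = time.perf_counter()
ExplainableBoostingClassifier(max_rounds=20).fit(X, y)
short_run = time.perf_counter() - start

# 20 rounds * 100 = 2000 rounds, roughly where EBMs are usually good.
print(f"~{short_run * 100 / 60:.1f} minutes estimated for a 2000-round fit")
```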
I agree some feedback on time remaining would be better, but for now this is the best approach I can offer.