
Cross validation early stopping

Open segatrade opened this issue 2 years ago • 1 comment

Currently, cross-validation early stopping happens based on the mean of the metric across folds. But it seems more correct to use the minimum (worst) value across all folds at each iteration, if we want to choose num_iterations based on best_iteration for training a model on the complete dataset after CV.

It also seems @Laurae2 talks about this here: https://sites.google.com/site/lauraeppx/xgboost/cross-validation

For example, with 3-fold CV, suppose the per-fold accuracies are 0.9, 0.9, 0 at iteration 35 (mean = 0.6) and 0.59, 0.58, 0.57 at iteration 29 (mean = 0.58). It seems iteration 29 is the better choice of num_iterations for training a model on the complete set, even though the mean at iteration 35 is higher.
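Spelled out in a few lines of Python (hypothetical numbers, accuracy so higher is better), just to show how the mean and the worst-fold criterion disagree:

```python
import numpy as np

acc_35 = np.array([0.9, 0.9, 0.0])    # per-fold accuracy at iteration 35
acc_29 = np.array([0.59, 0.58, 0.57])  # per-fold accuracy at iteration 29

print(acc_35.mean(), acc_29.mean())  # 0.6  0.58 -> mean prefers iteration 35
print(acc_35.min(), acc_29.min())    # 0.0  0.57 -> worst-fold prefers iteration 29
```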

Is there any way to change lgbm.cv from mean to min mode? Or do I have to write my own CV with the usual lgbm.train calls? Also, if I write my own, does lgbm.cv have performance benefits over calling lgbm.train several times that I would lose? Does it load the data once or several times?
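In case it helps, here is a minimal sketch of the kind of manual CV loop I mean, using plain lgb.train plus sklearn's KFold (not the built-in lgb.cv; the data X, y, the params, and the metric are just placeholder assumptions). It records each fold's metric history and then picks the boosting round with the best worst-fold value:

```python
import numpy as np
import lightgbm as lgb
from sklearn.model_selection import KFold

# Assumed: X, y are numpy arrays; binary_logloss, so lower is better.
params = {"objective": "binary", "metric": "binary_logloss", "verbosity": -1}
num_boost_round = 200

fold_curves = []  # one metric history per fold
for train_idx, valid_idx in KFold(n_splits=3, shuffle=True, random_state=42).split(X):
    train_set = lgb.Dataset(X[train_idx], label=y[train_idx])
    valid_set = lgb.Dataset(X[valid_idx], label=y[valid_idx], reference=train_set)
    evals_result = {}
    lgb.train(
        params,
        train_set,
        num_boost_round=num_boost_round,
        valid_sets=[valid_set],
        valid_names=["valid"],
        callbacks=[lgb.record_evaluation(evals_result)],
    )
    fold_curves.append(np.asarray(evals_result["valid"]["binary_logloss"]))

# Worst (largest) loss across folds at each iteration, then the best such iteration.
worst_per_iter = np.max(np.vstack(fold_curves), axis=0)
best_iter = int(np.argmin(worst_per_iter)) + 1  # 1-based, use as num_iterations
print("num_iterations chosen by the worst-fold criterion:", best_iter)
```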

segatrade avatar Jan 25 '23 07:01 segatrade

I don't think so. It's possible that you have a fold whose error is monotonically decreasing but still higher than the other folds, whereas the other folds reach their minimums in early rounds. Then choosing the worst error will always set the best iteration to the total number of iterations.
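A tiny made-up illustration: if one fold's loss keeps decreasing but stays above the other two folds, the per-iteration worst-fold loss is just that fold's curve, and its minimum lands on the very last round.

```python
import numpy as np

fold_losses = np.array([
    [0.90, 0.80, 0.70, 0.60, 0.50],  # monotonically decreasing, but always the worst
    [0.40, 0.30, 0.35, 0.38, 0.40],  # minimum at round 2
    [0.45, 0.28, 0.33, 0.36, 0.41],  # minimum at round 2
])
worst_per_iter = fold_losses.max(axis=0)
print(worst_per_iter)               # [0.9 0.8 0.7 0.6 0.5]
print(worst_per_iter.argmin() + 1)  # 5 == the total number of rounds
```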

3zhang avatar May 11 '23 08:05 3zhang