
Logging metrics and time information

Open nd7141 opened this issue 4 years ago • 8 comments

Summary

When I use CLI distributed training, I would like to write/log metrics and timing information to a file during training.

What is the current method to see how my training loss evolves?

Motivation

This is crucial for debugging ML models: seeing how training vs. test loss evolves makes it possible to catch overfitting. Properly saving timing information is also important for comparing frameworks and for weighing the trade-off between quality and speed. Besides, time can be broken down into preprocessing time and tree-building time, which matters when profiling a model.

The only way I have found so far is to parse the stdout messages. Did I miss some other way to log/save metrics and timing info?
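For readers stuck with the stdout approach in the meantime, a small parser can recover metric values from the CLI log. This is only a sketch: the line format in the regex is an assumption based on typical LightGBM CLI output, so adjust the pattern to whatever your LightGBM version actually prints.

```python
import re

# Assumed CLI log-line shape: "[LightGBM] [Info] Iteration:N, <dataset> <metric> : <value>"
METRIC_RE = re.compile(
    r"Iteration:(?P<iter>\d+), (?P<data>\S+) (?P<metric>\S+) : (?P<value>[-+0-9.eE]+)"
)

def parse_metrics(lines):
    """Return (iteration, dataset, metric, value) tuples found in log lines."""
    out = []
    for line in lines:
        m = METRIC_RE.search(line)
        if m:
            out.append((int(m.group("iter")), m.group("data"),
                        m.group("metric"), float(m.group("value"))))
    return out

sample = [
    "[LightGBM] [Info] Iteration:1, valid_1 binary_logloss : 0.608827",
    "[LightGBM] [Info] 0.013269 seconds elapsed, finished iteration 1",
]
parsed = parse_metrics(sample)
print(parsed)  # → [(1, 'valid_1', 'binary_logloss', 0.608827)]
```

Lines that do not match the metric pattern (like the elapsed-time line above) are simply skipped, so the parser can be run over a whole log file.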

References

CatBoost example: https://catboost.ai/en/docs/concepts/output-data_training-log

nd7141 avatar Oct 18 '21 13:10 nd7141

@nd7141 Thanks for using LightGBM. As far as I know, in the CLI version there's no way to store the intermediate results of all iterations in a structured data format. Gently pinging @StrikerRUS to confirm. But I agree that storing intermediate results in something like a JSON file would be very useful.

shiyu1994 avatar Oct 20 '21 07:10 shiyu1994

Thanks @shiyu1994. How hard would it be to implement logging to a file?

Also, I can see that it's possible to log metrics into the `evals_result` dict (https://lightgbm.readthedocs.io/en/latest/pythonapi/lightgbm.train.html), but is it possible to log time, and in particular the breakdown between building a tree and preprocessing the data?

nd7141 avatar Oct 20 '21 19:10 nd7141

@nd7141 We are focusing on several large pull requests recently, but perhaps we can schedule the implementation of saving results to a file in the CLI version for next month. Contributions are quite welcome.

Yes, with Python we can use `evals_result`, but it does not support recording time natively. A simple workaround would be to write a customized evaluation function like this:

```python
import time

start_time = time.time()

def feval_time(preds, data):
    # Report elapsed wall-clock time as a (name, value, is_higher_better) tuple.
    return 'time', time.time() - start_time, True
```

Then specify `feval=feval_time` in `lgb.train`, and we can treat time as a metric and record it in the `evals_result` dict.

shiyu1994 avatar Oct 21 '21 03:10 shiyu1994

Thanks, @shiyu1994. Can you please point me to the right files to look at to introduce the logging?

nd7141 avatar Oct 21 '21 09:10 nd7141

As far as I know, in the CLI version there's no way to store the intermediate results of all iterations in a structured data format.

Yeah, that's right.

There is a special compilation option, -DUSE_TIMETAG=ON, to make LightGBM print timings.

Users who want to perform benchmarking can make LightGBM output time costs for different internal routines by adding -DUSE_TIMETAG=ON to CMake flags. https://lightgbm.readthedocs.io/en/latest/Installation-Guide.html
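Concretely, the flag is passed at CMake configure time. A build sketch following the Linux steps in the Installation Guide (toolchain and paths assumed):

```shell
git clone --recursive https://github.com/microsoft/LightGBM
cd LightGBM
mkdir build && cd build
cmake -DUSE_TIMETAG=ON ..
make -j4
```

With this build, the CLI prints per-routine time costs at the end of training, which can then be captured from stdout.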

StrikerRUS avatar Oct 21 '21 22:10 StrikerRUS

@nd7141 In the CLI version, the metrics are logged here: https://github.com/microsoft/LightGBM/blob/d88b44566e5ec1013b1ea4a669366cebadd77879/src/boosting/gbdt.cpp#L517 and the timings are logged here: https://github.com/microsoft/LightGBM/blob/d88b44566e5ec1013b1ea4a669366cebadd77879/src/boosting/gbdt.cpp#L275 We could store this information in an internal data structure of GBDT and add a new parameter that lets users specify a JSON file to write it to.
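To make the proposal concrete, such a log file could hold one record per iteration with metric values and elapsed time. A hypothetical sketch of the output format follows; the field names and metric values are purely illustrative, not part of any LightGBM API (a real implementation would fill them from GBDT's internal evaluation and timer state in C++):

```python
import json
import time

start = time.time()
records = []
for iteration in (1, 2, 3):
    # Placeholder metric values standing in for real per-iteration evaluations.
    records.append({
        "iteration": iteration,
        "metrics": {"train l2": 0.5 / iteration},
        "elapsed_sec": time.time() - start,
    })

# One JSON array with one object per iteration.
with open("training_log.json", "w") as f:
    json.dump(records, f, indent=2)
```

A structured file like this would let users load the full training history with a single `json.load` instead of parsing stdout.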

shiyu1994 avatar Oct 22 '21 03:10 shiyu1994

Hi @nd7141

Prince Canuma here, a Data Scientist at Neptune.ai

I would like to understand why you would want to log your metrics to a file. Is it a preference? What is your exact use case here?

Cheers,

Blaizzy avatar Apr 06 '22 06:04 Blaizzy

Hi @nd7141 Just checking in to see if you still need help with this question or if you need anything else.

Blaizzy avatar May 02 '22 15:05 Blaizzy