Logging metrics and time information
Summary
When I use CLI distributed training, I would like to write/log metrics and time information to a file during training.
What is the current method for seeing how my training loss evolves?
Motivation
This is a crucial part of debugging ML models: being able to see how training vs. test loss behaves allows one to catch overfitting. Properly saving time information is also crucial for comparing different frameworks, as well as for evaluating the trade-off between quality and speed. Moreover, time can be broken down into preprocessing time and tree-building time, which is important for profiling a model.
The only way I have found so far is to parse the stdout messages. Did I miss some other ways to log/save metrics and time info?
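For reference, a rough sketch of such a stdout parser (this assumes per-iteration metric lines of the form `Iteration:10, valid_1 l2 : 0.245`, which is how the CLI prints evaluation results; the exact format may vary between versions, so treat the pattern as an assumption to adjust):

```
import re

# Matches CLI log lines such as: [LightGBM] [Info] Iteration:10, valid_1 l2 : 0.245
# (assumed format; adjust the pattern for your LightGBM version)
METRIC_RE = re.compile(r'Iteration:(\d+),\s+(\S+)\s+(\S+)\s*:\s*([-+\d.eE]+)')

def parse_metrics(log_path):
    """Collect per-iteration metric records from a captured stdout log."""
    records = []
    with open(log_path) as f:
        for line in f:
            m = METRIC_RE.search(line)
            if m:
                records.append({
                    'iteration': int(m.group(1)),
                    'dataset': m.group(2),
                    'metric': m.group(3),
                    'value': float(m.group(4)),
                })
    return records
```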
References
CatBoost example: https://catboost.ai/en/docs/concepts/output-data_training-log
@nd7141 Thanks for using LightGBM. As far as I know, in the CLI version there's no way to store the intermediate results of all iterations in a structured data format. Gently pinging @StrikerRUS to confirm. But I agree that storing intermediate results in something like a JSON file would be very useful.
Thanks @shiyu1994. How hard would it be to implement logging to a file?
Also, I can see that it's possible to log metrics into the evals_result dict (https://lightgbm.readthedocs.io/en/latest/pythonapi/lightgbm.train.html), but is it possible to log time, and in particular the breakdown between building a tree and preprocessing the data?
@nd7141 We are focused on several large pull requests at the moment; perhaps we can schedule the implementation of saving results to a file for the CLI version for next month. Contributions are quite welcome.
Yes, with Python we can use evals_result, but it does not support recording of time natively. A simple workaround would be to write a customized evaluation function like this:

```
import time

start_time = time.time()

def feval_time(preds, data):
    # Report the elapsed wall-clock time since training started as a "metric".
    return 'time', time.time() - start_time, True
```
Then specify `feval=feval_time` in `lgb.train`, and we can treat time as a metric and record it in the `evals_result` dict.
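A minimal usage sketch putting the pieces together (with a synthetic regression dataset; note that in recent versions of the Python package the `evals_result` argument of `lgb.train` has been replaced by the `lgb.record_evaluation` callback):

```
import time
import numpy as np
import lightgbm as lgb

start_time = time.time()

def feval_time(preds, data):
    # Elapsed wall-clock time since training started, reported as a metric.
    return 'time', time.time() - start_time, True

X, y = np.random.rand(500, 10), np.random.rand(500)
train_set = lgb.Dataset(X, y)

evals_result = {}
booster = lgb.train(
    {'objective': 'regression', 'verbosity': -1},
    train_set,
    num_boost_round=50,
    valid_sets=[train_set],   # evaluate on the training data for illustration
    valid_names=['train'],
    feval=feval_time,
    callbacks=[lgb.record_evaluation(evals_result)],
)
print(evals_result['train']['time'])  # cumulative seconds elapsed at each iteration
```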
Thanks @shiyu1994. Can you please point me to the right files to look at to introduce the logging?
> As far as I know, in the CLI version there's no way to store the intermediate results of all iterations in a structured data format.

Yeah, that's right.
There is a special compilation option, -DUSE_TIMETAG=ON, to make LightGBM print timings:

> Users who want to perform benchmarking can make LightGBM output time costs for different internal routines by adding -DUSE_TIMETAG=ON to CMake flags.

https://lightgbm.readthedocs.io/en/latest/Installation-Guide.html
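For example, on Linux the standard build steps from the Installation Guide with that flag added would look roughly like this (a sketch; see the guide for platform-specific details):

```
git clone --recursive https://github.com/microsoft/LightGBM
cd LightGBM
mkdir build
cd build
cmake -DUSE_TIMETAG=ON ..
make -j4
```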
@nd7141 In the CLI version, the metrics are logged here:
https://github.com/microsoft/LightGBM/blob/d88b44566e5ec1013b1ea4a669366cebadd77879/src/boosting/gbdt.cpp#L517
And the timings are logged here:
https://github.com/microsoft/LightGBM/blob/d88b44566e5ec1013b1ea4a669366cebadd77879/src/boosting/gbdt.cpp#L275
We could store this information in an internal data structure of GBDT, and add a new parameter allowing users to specify a JSON file to log it to.
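To illustrate the idea, a purely hypothetical sketch of what such a JSON log could contain (this is a proposal, not an existing feature; all field names are made up):

```
import json

# Hypothetical layout: per-iteration metric values plus a timing breakdown.
training_log = {
    'metrics': [
        {'iteration': 1, 'dataset': 'valid_1', 'metric': 'l2', 'value': 0.245},
        {'iteration': 2, 'dataset': 'valid_1', 'metric': 'l2', 'value': 0.231},
    ],
    'timings_seconds': {'data_preprocessing': 4.5, 'tree_learning': 12.3},
}

with open('training_log.json', 'w') as f:
    json.dump(training_log, f, indent=2)
```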
Hi @nd7141
Prince Canuma here, a Data Scientist at Neptune.ai
I would like to understand why you want to log your metrics to a file. Is it a preference? What is your exact use case here?
Cheers,
Hi @nd7141, just checking in to see if you still need help with this question or if you need anything else.