Add custom logger

Open ahowe42 opened this issue 1 year ago • 2 comments

Hi. Is there any way to add a logging.Logger object to a model, so that all the printed output goes to the logger instead of just the screen?

ahowe42 avatar Aug 01 '23 13:08 ahowe42

Not without some hacking at the moment, see https://github.com/dmlc/xgboost/blob/912e341d575f107be1cc2631271fd0737b75dfba/python-package/xgboost/core.py#L231 .
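
A rough sketch of what that hacking could look like, for illustration only: the Python package registers a log callback with the native library that simply print()s each message, so one can re-register a callback that forwards to a logging.Logger instead. This relies on private internals (xgboost.core._LIB) and the XGBRegisterLogCallback C API entry point, which may change between versions.

import ctypes
import logging

from xgboost.core import _LIB  # private: handle to the loaded native library

logger = logging.getLogger("xgboost")

# the native library calls back with a const char* message
_LOG_CALLBACK_TYPE = ctypes.CFUNCTYPE(None, ctypes.c_char_p)

def _forward_to_logger(msg: bytes) -> None:
    # forward every native xgboost message to the Python logger
    logger.info(msg.decode("utf-8"))

# keep a module-level reference so the ctypes wrapper is not garbage collected
_c_log_callback = _LOG_CALLBACK_TYPE(_forward_to_logger)

# re-register the log callback, replacing the default print-based one
_LIB.XGBRegisterLogCallback(_c_log_callback)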

trivialfis avatar Aug 01 '23 14:08 trivialfis

You can use a callback to log the evaluation progress. Here is my hacky way:

# define your logger object first
import logging
import sys

import xgboost as xgb

# clear any previous logging configuration
for handler in logging.root.handlers[:]:
    logging.root.removeHandler(handler)

logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)
formatter = logging.Formatter('%(asctime)s | %(levelname)s | %(message)s')

# handler that echoes log records to the console
stdout_handler = logging.StreamHandler(sys.stdout)
stdout_handler.setLevel(logging.INFO)
stdout_handler.setFormatter(formatter)

# handler that writes log records to a file
file_handler = logging.FileHandler('xgb_optimize.log')  # the log file name
file_handler.setLevel(logging.INFO)
file_handler.setFormatter(formatter)

logger.addHandler(file_handler)
logger.addHandler(stdout_handler)


# define a logging callback built on xgboost's TrainingCallback API
# (uses the logger object defined above)
class XGBLogging(xgb.callback.TrainingCallback):
    """Log evaluation results to the logger every `epoch_log_interval` rounds."""

    def __init__(self, epoch_log_interval=100):
        self.epoch_log_interval = epoch_log_interval

    def after_iteration(self, model, epoch: int, evals_log: xgb.callback.TrainingCallback.EvalsLog):
        if epoch % self.epoch_log_interval == 0:
            for data, metric in evals_log.items():
                for metric_name, log in metric.items():
                    # each entry is either a score or a (score, std) tuple
                    score = log[-1][0] if isinstance(log[-1], tuple) else log[-1]
                    logger.info(f"XGBLogging epoch {epoch} dataset {data} {metric_name} {score}")

        # return False to indicate training should not stop
        return False

# pass the callback to xgb.train
output = xgb.train(
    params=param,
    dtrain=dtrain,
    num_boost_round=500,
    evals=[(dtrain, "train")],  # evals_log (and early stopping) need at least one eval set
    custom_metric=your_own_eval_metric_func,
    early_stopping_rounds=50,
    callbacks=[XGBLogging(epoch_log_interval=5)],
    verbose_eval=True,
)
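
For what it's worth, the same callback should also work through the scikit-learn wrapper, assuming a reasonably recent xgboost (>= 1.6) where the estimators accept a callbacks argument in the constructor; X_train, y_train, X_valid, y_valid below are placeholders for your own data:

clf = xgb.XGBClassifier(
    n_estimators=500,
    callbacks=[XGBLogging(epoch_log_interval=5)],  # the callback class defined above
)
clf.fit(
    X_train, y_train,
    eval_set=[(X_valid, y_valid)],  # needed so evals_log has something to report
    verbose=False,
)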

The issue with this hack: if I use distributed training with dask (xgboost.dask.DaskDMatrix) instead of the native DMatrix data structure, the log does NOT get saved @trivialfis ... not sure why.

pangjac avatar Dec 01 '23 22:12 pangjac