LightGBM
How does the Tweedie objective follow from the Tweedie metric?
I am trying to understand the Tweedie metric and objective. In my simple mind, for a given loss metric, the gradient used in the corresponding objective should be the derivative of the metric with respect to the score (up to a constant factor). We can easily check that this holds for e.g. the L2 loss (see metric and objective); technically the metric should be multiplied by 0.5 to yield the correct gradient and hessian of the objective function, but otherwise the gradient and hessian follow straightforwardly from differentiating the metric with respect to the score. However, this does not seem to hold for the Tweedie metric and objective.
The Tweedie loss metric is computed as follows (stated below in Python code, using numpy):
import numpy as np

# label, score and rho (the Tweedie variance power) are assumed to be given
eps = 1.0e-10
score = np.maximum(eps, score)
a = label * np.exp((1 - rho) * np.log(score)) / (1 - rho)
b = np.exp((2 - rho) * np.log(score)) / (2 - rho)
loss = np.sum(-a + b)
We can equivalently write `a` and `b` as:
a = label * score**(1 - rho) / (1 - rho)
b = score**(2 - rho) / (2 - rho)
such that the metric has the following gradient and hessian with respect to the score:
gradient = score**(1 - rho) - label / score**rho
hessian = (rho * label) / score**(rho + 1) - (rho - 1) / score**rho
which is different from the gradient and hessian that we find in the Tweedie objective used by LightGBM:
exp_1_score = np.exp((1 - rho) * score)
exp_2_score = np.exp((2 - rho) * score)
gradient = -label * exp_1_score + exp_2_score
hessian = -label * (1 - rho) * exp_1_score + (2 - rho) * exp_2_score
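For concreteness, a small finite-difference check (a sketch with made-up values for `rho`, `label` and `score`; this is not LightGBM code) confirms that the derivative of the metric with respect to the score matches my hand-derived gradient above, but not the objective's gradient:

import numpy as np

rho, label, score = 1.5, np.array([2.0]), np.array([0.7])
h = 1e-6  # finite-difference step

def metric(s):
    # per-sample Tweedie metric as written above
    a = label * s**(1 - rho) / (1 - rho)
    b = s**(2 - rho) / (2 - rho)
    return np.sum(-a + b)

fd_grad = (metric(score + h) - metric(score - h)) / (2 * h)
analytic_grad = score**(1 - rho) - label / score**rho                            # agrees with fd_grad
objective_grad = -label * np.exp((1 - rho) * score) + np.exp((2 - rho) * score)  # differs
print(fd_grad, analytic_grad, objective_grad)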
My questions are:
- How do the Tweedie metric and objective relate?
- What is the source of the formulas used by LightGBM for both the Tweedie metric and the Tweedie objective?
To partially answer my own question: by integrating the gradient, it seems that taking the log of the score in the metric accounts for the difference between the differentiated metric and the gradient in the objective function. In other words, when I reformulate the `a` and `b` components as follows (i.e. dropping the log):
a = label * np.exp((1 - rho) * score) / (1 - rho)
b = np.exp((2 - rho) * score) / (2 - rho)
the gradient and hessian that are in the objective function follow from differentiating the metric. So my question perhaps becomes: why is the log of the score taken in the metric? And is there a source for these formulas?
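A quick numerical check of this reformulation (again a sketch with made-up values; `raw_score` plays the role of the untransformed score):

import numpy as np

rho, label, raw_score = 1.5, np.array([2.0]), np.array([0.3])
h = 1e-6  # finite-difference step

def metric_reformulated(s):
    # the metric from above with the log dropped, i.e. evaluated on the raw score
    a = label * np.exp((1 - rho) * s) / (1 - rho)
    b = np.exp((2 - rho) * s) / (2 - rho)
    return np.sum(-a + b)

fd_grad = (metric_reformulated(raw_score + h) - metric_reformulated(raw_score - h)) / (2 * h)
objective_grad = -label * np.exp((1 - rho) * raw_score) + np.exp((2 - rho) * raw_score)
print(fd_grad, objective_grad)  # these agree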
Same question here. I believe there is also a question about this on Stack Overflow: https://stackoverflow.com/questions/71623674/lightgbm-with-tweedie-loss-im-confused-on-the-gradient-and-hessians-used. Could you provide some reading on the Tweedie loss topic?
@Zhylkaaa I think the answer is in these lines. The Tweedie loss inherits from the RegressionPoissonLoss, so the LightGBM leaf values contain the raw score. To get the output, the raw score needs to be exponentiated (see here). This explains the difference between the objective and metric. Also, with the formulas starting in line 431 it was relatively straightforward for me to rebuild the Tweedie loss as a custom objective.
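For anyone who wants to do the same, here is a minimal sketch of such a custom objective (assuming the Python package's fobj-style callback signature `(preds, train_data)`; the function name and the fixed `rho` are just illustrative):

import numpy as np

def tweedie_obj(preds, train_data, rho=1.5):
    # preds are LightGBM's raw scores; labels live on the original (exponentiated) scale
    labels = train_data.get_label()
    exp_1 = np.exp((1 - rho) * preds)
    exp_2 = np.exp((2 - rho) * preds)
    grad = -labels * exp_1 + exp_2
    hess = -labels * (1 - rho) * exp_1 + (2 - rho) * exp_2
    return grad, hess

How to pass the callable differs between LightGBM versions (e.g. via `fobj` in `lgb.train` for older releases, or as the `objective` parameter in newer ones), and predictions then come back as raw scores that still need `np.exp` to land on the label scale.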
It would still be nice to have a source for these formulas, though, to better understand them. What source are they derived from?
@elephaint yeah, I saw this in this issue: https://github.com/microsoft/LightGBM/issues/3155, but I still don't know how the gradient was derived, because I arrive at a different result every time :(
@elephaint already provided the answer. To make it even clearer:
- As training loss: the Tweedie deviance is a function of the labels (y) and the raw scores. Take the Tweedie deviance, e.g. from https://en.wikipedia.org/wiki/Tweedie_distribution, and mind that a log link is used, i.e. one predicts exp(raw_score). Then take the gradient w.r.t. the raw scores.
- As a metric: the Tweedie deviance is a function of the labels (y) and the scores (predictions). Here no link is used, or rather it is an internal matter of the model's prediction.
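In symbols, with $\mu = \exp(score)$, the chain rule gives

$$\frac{\mathrm{d}}{\mathrm{d}\,score} D\bigl(y, e^{score}\bigr) = \frac{\partial D(y, \mu)}{\partial \mu}\bigg|_{\mu = e^{score}} \cdot e^{score},$$

so the objective's gradient is just the metric's gradient (taken w.r.t. $\mu$) multiplied by $e^{score}$; that extra factor is what turns the $\mu^{-\rho}$ and $\mu^{1-\rho}$ terms of the metric's gradient into the $\exp((1-\rho)\,score)$ and $\exp((2-\rho)\,score)$ terms of the objective.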
@guolinke or @shiyu1994 can you please read through the questions here and help resolve this discussion?
@lorentzenchr I understand the exp vs. non-exp thing, but I don't understand how the derivative of `-label * pred^(1 - rho) / (1 - rho) + pred^(2 - rho) / (2 - rho)` w.r.t. `pred` can be the same expression without the denominators; to my understanding it should be `-label * pred^(-rho) + pred^(1 - rho)`. So my (and probably @elephaint's) question is where this derivation comes from. Maybe I have (very probably) missed something?
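Spelling out my calculation, differentiating w.r.t. $pred$ itself:

$$\frac{\mathrm{d}}{\mathrm{d}\,pred}\left(-label \cdot \frac{pred^{\,1-\rho}}{1-\rho} + \frac{pred^{\,2-\rho}}{2-\rho}\right) = -label \cdot pred^{\,-\rho} + pred^{\,1-\rho},$$

which keeps the negative powers of $pred$ rather than simply dropping the denominators.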
I guess the confusion comes down to two points:
- $y_{pred} = \exp(score)$
- the derivative of the deviance is taken with respect to $score$, not $y_{pred}$!
$dev = -label * \frac{\exp(score * (1 - \rho))}{1-\rho} + \ldots$ from which follows $\frac{\mathrm{d}}{\mathrm{d}score} dev = -label * \exp(score * (1 - \rho)) + \ldots$.
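For completeness, writing out both terms and differentiating once more reproduces exactly the gradient and hessian in the objective code quoted above:

$$dev(score) = -label \cdot \frac{\exp((1-\rho)\,score)}{1-\rho} + \frac{\exp((2-\rho)\,score)}{2-\rho}$$

$$\frac{\mathrm{d}\,dev}{\mathrm{d}\,score} = -label \cdot \exp((1-\rho)\,score) + \exp((2-\rho)\,score)$$

$$\frac{\mathrm{d}^2 dev}{\mathrm{d}\,score^2} = -label \cdot (1-\rho)\,\exp((1-\rho)\,score) + (2-\rho)\,\exp((2-\rho)\,score)$$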
I think this issue can be closed.