Leon Chlon
@vroulet I'm a bit late to this, but I was experimenting with second-order optimisation methods using focal loss and hit NaN values whilst computing Hessians with gamma < 2. The...
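One plausible mechanism for NaNs at gamma < 2, offered as an assumption rather than a diagnosis of the report above: the second derivative of the modulating factor (1 - p)^gamma with respect to p is gamma (gamma - 1) (1 - p)^(gamma - 2), which grows without bound as p approaches 1 whenever gamma < 2. If autodiff then multiplies that factor by a chain-rule term that has underflowed to zero, the product is NaN. The helper name `modulating_curvature` is hypothetical and only illustrates the scaling.

```python
import math


def modulating_curvature(gamma: float, q: float) -> float:
    """Second derivative of (1 - p)**gamma w.r.t. p, written with q = 1 - p."""
    return gamma * (gamma - 1.0) * q ** (gamma - 2.0)


# As q = 1 - p shrinks, the factor explodes for gamma < 2,
# stays constant for gamma == 2, and vanishes for gamma > 2.
for q in (1e-2, 1e-6, 1e-12):
    print(
        q,
        modulating_curvature(1.5, q),
        modulating_curvature(2.0, q),
        modulating_curvature(3.0, q),
    )

# IEEE arithmetic: an infinite factor times an exact zero is NaN.
print(math.isnan(math.inf * 0.0))
```

This only covers the derivative taken through the power in probability space; a formulation that stays in log-space never forms (1 - p)^(gamma - 2) explicitly.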
Thanks for your feedback, @vroulet. This approach keeps all computations in log-space to avoid numerical issues, as suggested: 1. Compute log probabilities: log p = log_sigmoid(logits), log(1-p) = log_sigmoid(-logits)...
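A minimal sketch of step 1, in plain Python rather than the library code under discussion. It relies on the identity 1 - sigmoid(x) = sigmoid(-x), so both log probabilities come from one stable `log_sigmoid`. The implementation of `log_sigmoid` here and the helper `log_probs` are assumptions for illustration, not the thread's actual code.

```python
import math


def log_sigmoid(x: float) -> float:
    """Stable log(sigmoid(x)) = min(x, 0) - log1p(exp(-|x|))."""
    return min(x, 0.0) - math.log1p(math.exp(-abs(x)))


def log_probs(logit: float) -> tuple[float, float]:
    """Return (log p, log(1 - p)) without ever forming p itself."""
    return log_sigmoid(logit), log_sigmoid(-logit)


# Even for extreme logits neither value is -inf or NaN.
for logit in (-1000.0, 0.0, 3.0, 1000.0):
    print(logit, log_probs(logit))
```

Taking `log(sigmoid(x))` directly would instead round p to 0 or 1 for large |x| and return -inf for one of the two terms.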
@vroulet You're absolutely right that in general, log(a + b) ≠ log(a) + log(b). However, the expression here involves binary labels, which creates a special case. For binary labels (y...
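A quick numerical check of the binary-label special case, assuming the expression in question is p_t = y p + (1 - y)(1 - p): for y in {0, 1} only one term survives, so log p_t equals y log p + (1 - y) log(1 - p), while for a soft label the two sides differ. The function names are hypothetical.

```python
import math


def log_of_mixture(p: float, y: float) -> float:
    """log(y * p + (1 - y) * (1 - p)): the log of the sum."""
    return math.log(y * p + (1.0 - y) * (1.0 - p))


def mixture_of_logs(p: float, y: float) -> float:
    """y * log(p) + (1 - y) * log(1 - p): the sum of the logs."""
    return y * math.log(p) + (1.0 - y) * math.log(1.0 - p)


p = 0.8
for y in (0.0, 1.0, 0.5):
    print(y, log_of_mixture(p, y), mixture_of_logs(p, y))
```

The two columns agree for y = 0 and y = 1 and disagree for y = 0.5, so the rewrite is only valid for hard binary labels.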
@vroulet I think I've got it. The focal loss for binary classification is defined as: $$\text{FL}(p_t) = -\alpha_t (1 - p_t)^{\gamma} \log(p_t)$$ where $p_t$ is the probability of the true...
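A sketch of that definition evaluated in log-space, using (1 - p_t)^gamma = exp(gamma * log(1 - p_t)). It assumes the common convention alpha_t = alpha for y = 1 and 1 - alpha for y = 0, and the default values for `alpha` and `gamma` are illustrative. `binary_focal_loss` is a hypothetical name, not the library function.

```python
import math


def log_sigmoid(x: float) -> float:
    """Stable log(sigmoid(x))."""
    return min(x, 0.0) - math.log1p(math.exp(-abs(x)))


def binary_focal_loss(logit: float, y: int, alpha: float = 0.25,
                      gamma: float = 2.0) -> float:
    """FL(p_t) = -alpha_t * (1 - p_t)**gamma * log(p_t) for y in {0, 1}."""
    log_p, log_1mp = log_sigmoid(logit), log_sigmoid(-logit)
    # Select the true-class terms; valid only for hard binary labels.
    log_pt, log_1m_pt = (log_p, log_1mp) if y == 1 else (log_1mp, log_p)
    alpha_t = alpha if y == 1 else 1.0 - alpha  # assumed convention
    return -alpha_t * math.exp(gamma * log_1m_pt) * log_pt


print(binary_focal_loss(0.0, 1))               # p_t = 0.5
print(binary_focal_loss(-1000.0, 1))           # badly wrong, still finite
print(binary_focal_loss(1000.0, 1, gamma=1.5)) # confident and correct
```

With gamma = 0 the modulating factor is 1 and the expression reduces to alpha-weighted cross-entropy, which is a handy sanity check.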