utils.pytorch
Question about label smoothing implementation
Hi, I've been going over your implementation of label smoothing for cross-entropy, and I don't understand why, in this snippet from cross_entropy.py:
eps_sum = smooth_eps / num_classes  # epsilon / K
eps_nll = 1. - eps_sum - smooth_eps
likelihood = lsm.gather(dim=-1, index=target.unsqueeze(-1)).squeeze(-1)  # log p(target); lsm is the log-softmax
loss = -(eps_nll * likelihood + eps_sum * lsm.sum(-1))
you have

eps_nll = 1. - eps_sum - smooth_eps

instead of just

eps_nll = 1. - smooth_eps

Doesn't the extra eps_sum term introduce a term in the loss that shouldn't be there? Going by the paper, with a one-hot target distribution q, sum_k q(k) log p(k) = log p(target), which is likelihood in the above snippet, and sum_k log p(k) is lsm.sum(-1). The label-smoothed loss, for uniform u(k) = 1/K, should be

- (1 - epsilon) * sum_k q(k) log p(k) - epsilon/K * sum_k log p(k)

so shouldn't it be

loss = -((1. - smooth_eps) * likelihood + eps_sum * lsm.sum(-1))

?
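
To make the question concrete, here is a minimal sketch of my own (not from the repo) that evaluates both choices of eps_nll on random logits; it reuses the names lsm, likelihood, eps_sum, smooth_eps, and num_classes from the snippet above, and the logits/target tensors are just toy values:

import torch
import torch.nn.functional as F

torch.manual_seed(0)
num_classes = 5
smooth_eps = 0.1
logits = torch.randn(3, num_classes)   # batch of 3 toy examples
target = torch.tensor([0, 2, 4])       # class indices

lsm = F.log_softmax(logits, dim=-1)
likelihood = lsm.gather(dim=-1, index=target.unsqueeze(-1)).squeeze(-1)  # log p(target)
eps_sum = smooth_eps / num_classes

# current implementation: eps_nll = 1. - eps_sum - smooth_eps
loss_repo = -((1. - eps_sum - smooth_eps) * likelihood + eps_sum * lsm.sum(-1))

# what I would have expected from the paper: eps_nll = 1. - smooth_eps
loss_paper = -((1. - smooth_eps) * likelihood + eps_sum * lsm.sum(-1))

print(loss_paper - loss_repo)   # equals -eps_sum * likelihood, i.e. eps_sum * (-log p(target))
print(-eps_sum * likelihood)

So the two versions differ by exactly eps_sum * (-log p(target)) per sample, i.e. the current code puts weight 1 - epsilon rather than 1 - epsilon + epsilon/K on the target class. Is that intentional, or am I missing something?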