utils.pytorch
Question about label smoothing implementation
Hi, I've been going over your implementation of label smoothing for cross-entropy, and I don't understand why, in this snippet from cross_entropy.py:
eps_sum = smooth_eps / num_classes  # epsilon / K
eps_nll = 1. - eps_sum - smooth_eps
likelihood = lsm.gather(dim=-1, index=target.unsqueeze(-1)).squeeze(-1)  # log p(target); lsm is the log-softmax
loss = -(eps_nll * likelihood + eps_sum * lsm.sum(-1))
you have

eps_nll = 1. - eps_sum - smooth_eps

instead of just

eps_nll = 1. - smooth_eps

Doesn't the extra eps_sum term introduce a term in the loss that shouldn't be there? Going by the paper, with a one-hot target distribution q, sum_k q(k) log p(k) = log p(target), which is likelihood in the above snippet, and sum_k log p(k) is lsm.sum(-1). The label-smoothed loss, for uniform u(k) = 1/K, should be

- (1 - epsilon) * sum_k q(k) log p(k) - epsilon/K * sum_k log p(k)

so shouldn't it be

loss = -((1. - smooth_eps) * likelihood + eps_sum * lsm.sum(-1))

?
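
To make the question concrete, here is a minimal sketch of my own (not from the repo) that evaluates both choices of eps_nll on random logits; it reuses the names lsm, likelihood, eps_sum, smooth_eps, and num_classes from the snippet above, and the logits/target tensors are just toy values:

import torch
import torch.nn.functional as F

torch.manual_seed(0)
num_classes = 5
smooth_eps = 0.1
logits = torch.randn(3, num_classes)   # batch of 3 toy examples
target = torch.tensor([0, 2, 4])       # class indices

lsm = F.log_softmax(logits, dim=-1)
likelihood = lsm.gather(dim=-1, index=target.unsqueeze(-1)).squeeze(-1)  # log p(target)
eps_sum = smooth_eps / num_classes

# current implementation: eps_nll = 1. - eps_sum - smooth_eps
loss_repo = -((1. - eps_sum - smooth_eps) * likelihood + eps_sum * lsm.sum(-1))

# what I would have expected from the paper: eps_nll = 1. - smooth_eps
loss_paper = -((1. - smooth_eps) * likelihood + eps_sum * lsm.sum(-1))

print(loss_paper - loss_repo)   # equals -eps_sum * likelihood, i.e. eps_sum * (-log p(target))
print(-eps_sum * likelihood)

So the two versions differ by exactly eps_sum * (-log p(target)) per sample, i.e. the current code puts weight 1 - epsilon rather than 1 - epsilon + epsilon/K on the target class. Is that intentional, or am I missing something?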