
Why is ignore_index set to -1 in cross_entropy?

Open randbear opened this issue 1 year ago • 1 comments

Why is ignore_index set to -1 in cross_entropy? According to the [doc](https://pytorch.org/docs/stable/generated/torch.nn.functional.cross_entropy.html), ignore_index=-1 means the last token in the predicted sequence "is ignored and does not contribute to the input gradient".

# if we are given some desired targets also calculate the loss
loss = None
if targets is not None:
    logits = self.lm_head(x)
    loss = F.cross_entropy(logits.view(-1, logits.size(-1)), targets.view(-1), ignore_index=-1)
else:
    # inference-time mini-optimization: only forward the lm_head on the very last position
    logits = self.lm_head(x[:, [-1], :]) # note: using list [-1] to preserve the time dim

return logits, loss

randbear avatar Jul 03 '23 02:07 randbear

Same question, but you're incorrect that ignore_index=-1 means the last token in the predicted sequence "is ignored and does not contribute to the input gradient":

  1. ignore_index=N doesn't ignore the Nth index in the target list, it ignores any values in the target list equal to N (see: https://github.com/pytorch/pytorch/issues/18206)

  2. thus, all negative values for ignore_index do the same thing: nothing, because all GPT token ids are non-negative integers, so a target of -1 never actually occurs.
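A quick sketch of point 1, assuming PyTorch is available (the tensors here are made-up toy values, not nanoGPT's):

```python
import torch
import torch.nn.functional as F

# Toy demonstration (not nanoGPT code): ignore_index filters by target
# *value*, not by position in the sequence.
torch.manual_seed(0)
logits = torch.randn(3, 5)            # 3 positions, vocab size 5
targets = torch.tensor([2, 4, -1])    # the -1 acts as a "padding" label

# Positions whose target equals -1 are dropped from the mean loss...
loss_ignored = F.cross_entropy(logits, targets, ignore_index=-1)
# ...so it matches the loss computed over the first two positions alone.
loss_first_two = F.cross_entropy(logits[:2], targets[:2])

# And since GPT token ids are all >= 0, ignore_index=-1 never matches
# anything in practice: with all-valid targets these two losses agree.
full_targets = torch.tensor([2, 4, 1])
loss_a = F.cross_entropy(logits, full_targets, ignore_index=-1)
loss_b = F.cross_entropy(logits, full_targets)
```

So in nanoGPT the -1 only matters if you deliberately pad targets with -1 (e.g. masked-out positions in a fine-tuning dataset); for plain language-model pretraining it is a no-op.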

transmissions11 avatar Jul 11 '23 23:07 transmissions11