nanoGPT
Why is ignore_index set to -1 in cross_entropy?
According to the [doc](https://pytorch.org/docs/stable/generated/torch.nn.functional.cross_entropy.html), `ignore_index=-1` means the last token in the predicted sequence "is ignored and does not contribute to the input gradient".
```python
# if we are given some desired targets also calculate the loss
loss = None
if targets is not None:
    logits = self.lm_head(x)
    loss = F.cross_entropy(logits.view(-1, logits.size(-1)), targets.view(-1), ignore_index=-1)
else:
    # inference-time mini-optimization: only forward the lm_head on the very last position
    logits = self.lm_head(x[:, [-1], :])  # note: list [-1] preserves the time dim
return logits, loss
```
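A minimal standalone sketch of what that loss call does, using toy logits and a target tensor rather than nanoGPT's model (the shapes and values here are made up for illustration): any target equal to the `ignore_index` value is simply dropped from the mean.

```python
import torch
import torch.nn.functional as F

# Toy logits for 4 positions over a 5-token vocabulary.
torch.manual_seed(0)
logits = torch.randn(4, 5)

# Targets where the last position is marked with -1 (e.g. padding).
targets = torch.tensor([2, 0, 3, -1])

# With ignore_index=-1, the position whose target is -1 is excluded
# from both the loss and the gradient.
loss_masked = F.cross_entropy(logits, targets, ignore_index=-1)

# Equivalent to averaging the loss over only the valid positions.
loss_valid = F.cross_entropy(logits[:3], targets[:3])

print(torch.allclose(loss_masked, loss_valid))  # True
```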
Same question, but you're incorrect that "ignore_index is -1 means the last token in the predicted sequence 'is ignored and does not contribute to the input gradient'":
- `ignore_index=N` doesn't ignore the Nth index in the target list; it ignores any *values* in the target list equal to N (see: https://github.com/pytorch/pytorch/issues/18206)
- thus, all negative values for `ignore_index` do the same thing: nothing, because GPT token IDs are non-negative integers, so no target ever equals -1
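Both points above can be checked directly with toy tensors (values here are arbitrary, chosen only for illustration): `ignore_index` matches target values, not positions, and a negative `ignore_index` is a no-op when every target is non-negative.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 5)
targets = torch.tensor([2, 0, 3, 1])  # all non-negative, like real token IDs

# ignore_index matches *values* in targets, not positions:
# ignore_index=3 drops the position whose target value is 3 (index 2 here).
loss_ignore_value = F.cross_entropy(logits, targets, ignore_index=3)
loss_drop_value = F.cross_entropy(logits[[0, 1, 3]], targets[[0, 1, 3]])
print(torch.allclose(loss_ignore_value, loss_drop_value))  # True

# Any negative ignore_index changes nothing when no target is negative.
loss_plain = F.cross_entropy(logits, targets)
loss_neg = F.cross_entropy(logits, targets, ignore_index=-1)
print(torch.allclose(loss_plain, loss_neg))  # True
```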