nanoGPT
Loss calculation
What is the purpose of ignore_index=-1 in loss calculation? I understand it's usually applied to exclude special tokens like padding, sequence end, etc. But nanoGPT does not seem to use any of this.
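For context, here is a minimal sketch of the usual pattern the question refers to. It assumes a padded sequence. The targets are the inputs shifted left by one, and any target that is padding is overwritten with -1, so a loss called with ignore_index=-1 skips it. The helper `make_targets` and the pad id `PAD_ID` are hypothetical and are not part of nanoGPT.

```python
from typing import List, Tuple

PAD_ID = 99         # hypothetical padding token id (assumption, not from nanoGPT)
IGNORE_INDEX = -1   # the value that would be passed as ignore_index to the loss

def make_targets(tokens: List[int]) -> Tuple[List[int], List[int]]:
    """Split one padded sequence into (inputs, targets) for next-token prediction.

    targets[i] is the token that should follow inputs[i]. Any target that is
    padding is replaced by IGNORE_INDEX so that it contributes nothing to the loss.
    """
    inputs = tokens[:-1]
    targets = [IGNORE_INDEX if t == PAD_ID else t for t in tokens[1:]]
    return inputs, targets

x, y = make_targets([5, 8, 3, 99, 99])
print(x)  # [5, 8, 3, 99]
print(y)  # [8, 3, -1, -1]
```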
I looked into this too because of #285. The C++ code where I think the value is finally used is in
https://github.com/pytorch/pytorch/blob/main/aten/src/ATen/native/LossNLL.cpp
for example,
if (cur_target == ignore_index) {
continue;
}
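That check skips any position whose target equals ignore_index. The following pure-Python sketch shows what this means for a mean-reduced loss, assuming unweighted classes. Ignored positions are excluded from both the sum and the divisor. This is an illustrative re-implementation, not PyTorch's actual code, and `nll_loss_mean` is a hypothetical name.

```python
import math
from typing import List

def nll_loss_mean(log_probs: List[List[float]], targets: List[int],
                  ignore_index: int = -100) -> float:
    """Mean negative log-likelihood over positions whose target != ignore_index.

    log_probs[i][c] is the log-probability of class c at position i.
    """
    total = 0.0
    count = 0
    for row, t in zip(log_probs, targets):
        if t == ignore_index:
            continue  # same idea as the C++ loop: no loss, not counted in the mean
        if not 0 <= t < len(row):
            raise ValueError(f"Target {t} is out of bounds.")  # reject invalid class ids
        total -= row[t]
        count += 1
    # If every position is ignored, the mean is 0/0, so this sketch returns nan.
    return total / count if count else math.nan

log_probs = [[math.log(0.5), math.log(0.5)],
             [math.log(0.25), math.log(0.75)],
             [math.log(0.9), math.log(0.1)]]
# The third target is masked with -1, so only the first two positions are averaged:
# (-ln 0.5 - ln 0.75) / 2
print(nll_loss_mean(log_probs, [0, 1, -1], ignore_index=-1))
```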
but I don't yet see what a value of -1 or -100 (the default) does.
I do see this here
bool isIgnoreIndexValid = (ignore_index != -100);
in
https://github.com/pytorch/pytorch/blob/0d653730ce4314c4c48da0e336c1e1c6259ada28/aten/src/ATen/native/mps/operations/LossOps.mm#L485
so that might indicate that -100 means the ignore_index attribute itself should be ignored, which might explain the default value. However, I don't know where this code is called from (is it only for device=mps?).
Using ignore_index=-1 and ignore_index=-100 should do the same thing, because there is no such thing as a token with a negative token id?
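Here is a quick check of that reasoning using a small, hypothetical pure-Python loss (`nll_loss_mean`, not PyTorch's code). When every target is a valid non-negative class id, no target can equal -1 or -100. Neither setting skips anything, so both return the same loss. The two settings only differ once some targets are deliberately set to one of those values. In that case, -1 works as a mask only if ignore_index is also -1; otherwise it is just an out-of-range target.

```python
import math
from typing import List

def nll_loss_mean(log_probs: List[List[float]], targets: List[int],
                  ignore_index: int) -> float:
    """Mean NLL over non-ignored positions; rejects out-of-range class ids."""
    kept = []
    for row, t in zip(log_probs, targets):
        if t == ignore_index:
            continue  # masked position
        if not 0 <= t < len(row):
            raise ValueError(f"Target {t} is out of bounds.")
        kept.append(-row[t])
    return sum(kept) / len(kept) if kept else math.nan

log_probs = [[math.log(0.5), math.log(0.5)],
             [math.log(0.25), math.log(0.75)]]

valid = [0, 1]  # ordinary token ids are never negative
a = nll_loss_mean(log_probs, valid, ignore_index=-1)
b = nll_loss_mean(log_probs, valid, ignore_index=-100)
print(a == b)  # True: nothing is skipped in either case

masked = [0, -1]  # second target deliberately marked with -1
print(nll_loss_mean(log_probs, masked, ignore_index=-1))  # only position 0 counts: -ln 0.5
try:
    nll_loss_mean(log_probs, masked, ignore_index=-100)
except ValueError as e:
    print(e)  # Target -1 is out of bounds.
```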
Not sure why -1 is explicitly used here...
Ok, but there is also code that explicitly checks for -100, and I don't know what that code does…