
Loss calculation

Open mailgpa opened this issue 1 year ago • 3 comments

What is the purpose of ignore_index=-1 in the loss calculation? I understand it's usually used to exclude special tokens such as padding or end-of-sequence tokens, but nanoGPT does not seem to use any of these.

mailgpa avatar Jun 15 '23 07:06 mailgpa

I looked into this too because of #285. I think the C++ code where the value is finally used is in

https://github.com/pytorch/pytorch/blob/main/aten/src/ATen/native/LossNLL.cpp

for example,

if (cur_target == ignore_index) {
    continue;
}

but I don't yet see what a value of -1 or -100 (the default) does.
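For what it's worth, the skip above can be mimicked in a few lines of plain Python. This is only a sketch of the idea, not the PyTorch implementation: the function name nll_loss_mean and the toy numbers are made up for illustration, and returning nan when nothing is counted is an assumption of this sketch. It suggests the value itself has no special meaning; a target is simply left out of the sum and the average when it equals ignore_index.

```python
import math

def nll_loss_mean(log_probs, targets, ignore_index=-100):
    """Mean negative log-likelihood over targets that differ from ignore_index.

    Hypothetical helper for illustration only.
    """
    total = 0.0
    counted = 0
    for row, cur_target in zip(log_probs, targets):
        if cur_target == ignore_index:
            continue  # same skip as in the C++ snippet above
        total -= row[cur_target]
        counted += 1
    # Assumption of this sketch: nan when every target was skipped.
    return total / counted if counted else float("nan")

# Three positions, two classes, as log-probabilities.
log_probs = [
    [math.log(0.5), math.log(0.5)],
    [math.log(0.25), math.log(0.75)],
    [math.log(0.9), math.log(0.1)],
]

# No target equals the ignore value, so -1 and -100 give the same loss.
loss_all = nll_loss_mean(log_probs, [0, 1, 0], ignore_index=-1)
loss_default = nll_loss_mean(log_probs, [0, 1, 0], ignore_index=-100)

# The last target is -1, so only the first two positions are averaged.
loss_masked = nll_loss_mean(log_probs, [0, 1, -1], ignore_index=-1)
```

So with only valid (non-negative) targets, the choice between -1 and -100 makes no difference in this sketch; it only matters once some target is deliberately set to that value.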

I do see this here

bool isIgnoreIndexValid = (ignore_index != -100);

in

https://github.com/pytorch/pytorch/blob/0d653730ce4314c4c48da0e336c1e1c6259ada28/aten/src/ATen/native/mps/operations/LossOps.mm#L485

so that might indicate that -100 means the attribute is ignored, which would explain the default value. But I don't know where this code is called (is this for device=mps only?).

0dB avatar Jun 22 '23 14:06 0dB

Using ignore_index=-1 and ignore_index=-100 should do the same thing, because there is no such thing as a token with a negative token id?

"The default value [-100] is arbitrary, it could have been any negative number, i.e. anything that is not a "valid" class label."

Not sure why -1 is explicitly used here...
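A quick way to check this is to compare the two settings directly. The snippet below is a minimal sketch, assuming PyTorch is installed; the shapes, the vocab size of 10, and the target ids are arbitrary values picked for illustration and are not taken from nanoGPT.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 10)           # 4 positions, hypothetical vocab of 10
targets = torch.tensor([3, 7, 0, 9])  # valid token ids, none negative

# With no negative targets, both settings should give the same loss.
loss_m1 = F.cross_entropy(logits, targets, ignore_index=-1)
loss_m100 = F.cross_entropy(logits, targets, ignore_index=-100)

# Mask the last two positions with -1: they drop out of the mean,
# so the result should match the loss over the first two positions only.
masked = targets.clone()
masked[2:] = -1
loss_masked = F.cross_entropy(logits, masked, ignore_index=-1)
loss_first_two = F.cross_entropy(logits[:2], targets[:2])

print(loss_m1.item(), loss_m100.item())
print(loss_masked.item(), loss_first_two.item())
```

If the two printed pairs match, then ignore_index only has an effect when some target is actually set to that value.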

transmissions11 avatar Jul 11 '23 23:07 transmissions11

OK, but there is also code that explicitly checks for -100, and I don't know what that code does …

0dB avatar Jul 12 '23 10:07 0dB