nanoGPT nit: remove explicit ignore

nit: remove explicit ignore_index=-1

Open transmissions11 opened this issue 11 months ago • 0 comments

I believe setting ignore_index=-1 when calling cross_entropy on line 187 is spurious, as:

- ignore_index=N doesn't ignore the Nth index in the target list; it ignores any values in the target list equal to N (see: https://github.com/pytorch/pytorch/issues/18206)
- thus, all negative values for ignore_index do the same thing: nothing, because all GPT token ids are non-negative integers
- the default value for ignore_index is -100 (https://pytorch.org/docs/stable/generated/torch.nn.functional.cross_entropy.html), so omitting the argument has the same effect as passing -1
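
To make the points above concrete, here is a minimal pure-Python sketch of the documented ignore_index semantics (mean reduction). It is not PyTorch's implementation: the cross_entropy helper, the logits, and the targets below are all made up for illustration. It only shows that targets are skipped by value, so any negative ignore_index is a no-op when every target is a valid token id.

```python
import math


def cross_entropy(logits, targets, ignore_index=-100):
    """Mean cross-entropy over rows whose target value != ignore_index.

    Hypothetical helper that mimics the documented behaviour of
    torch.nn.functional.cross_entropy with reduction='mean'.
    """
    total, count = 0.0, 0
    for row, target in zip(logits, targets):
        if target == ignore_index:
            continue  # skipped because of its value, not its position
        m = max(row)
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[target]  # negative log-softmax at target
        count += 1
    # With no targets left there is nothing to average.
    return total / count if count else float("nan")


# Illustrative data: 3 positions, vocabulary of 3 tokens.
logits = [[2.0, 0.5, -1.0], [0.1, 0.2, 0.3], [1.0, 1.0, 1.0]]
targets = [0, 2, 1]  # valid token ids are never negative

loss_default = cross_entropy(logits, targets)                      # ignore_index=-100
loss_minus_one = cross_entropy(logits, targets, ignore_index=-1)   # same result
loss_ignore_one = cross_entropy(logits, targets, ignore_index=1)   # drops the row whose target is 1

print(loss_default == loss_minus_one)
print(loss_ignore_one != loss_default)
```

Only ignore_index=1 changes the loss here, because 1 actually occurs as a target value; -1 and -100 never match anything.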

Given this has generated a non-zero amount of confusion (https://github.com/karpathy/nanoGPT/issues/323, https://github.com/karpathy/nanoGPT/issues/297), and this repo is intended to be a learning resource, I think removing this would be a good choice to avoid further confusion.

I may be missing something here, feel free to close and correct me, but I've tested training with ignore_index=-100 and it seems to work equivalently.

Jul 11 '23 23:07 transmissions11
