
nit: remove explicit ignore_index=-1

Open transmissions11 opened this issue 11 months ago • 0 comments

I believe setting ignore_index=-1 when calling cross_entropy on line 187 is spurious, as:

  1. ignore_index=N doesn't ignore the Nth index in the target list, it ignores any values in the target list equal to N (see: https://github.com/pytorch/pytorch/issues/18206)

  2. thus, every negative value for ignore_index does the same thing: nothing, because all GPT token ids are non-negative integers.

  3. the default value for ignore_index is -100 (https://pytorch.org/docs/stable/generated/torch.nn.functional.cross_entropy.html)
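
For illustration, here is a minimal sketch of the three points above, assuming a toy batch of 4 positions and a vocabulary of 5 token ids. The tensor sizes and variable names are made up for this example and are not taken from nanoGPT.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical toy sizes: 4 positions, vocabulary of 5 token ids (0..4).
logits = torch.randn(4, 5)
targets = torch.tensor([0, 3, 1, 3])  # token ids are never negative

# With no negative targets, any negative ignore_index masks nothing,
# so the default (-100) and an explicit -1 give the same loss.
loss_default = F.cross_entropy(logits, targets)
loss_minus_one = F.cross_entropy(logits, targets, ignore_index=-1)
same_loss = torch.allclose(loss_default, loss_minus_one)

# ignore_index=3 drops every position whose target VALUE is 3
# (positions 1 and 3 here), not the position at index 3.
loss_ignore_3 = F.cross_entropy(logits, targets, ignore_index=3)
loss_kept = F.cross_entropy(logits[[0, 2]], targets[[0, 2]])
ignores_by_value = torch.allclose(loss_ignore_3, loss_kept)

print(same_loss, ignores_by_value)
```

If both flags come out true, the explicit ignore_index=-1 changes nothing as long as the targets contain no -1 values.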

Given this has generated a non-zero amount of confusion (https://github.com/karpathy/nanoGPT/issues/323, https://github.com/karpathy/nanoGPT/issues/297), and this repo is intended to be a learning resource, I think removing this would be a good choice to avoid further confusion.

I may be missing something here (feel free to close and correct me), but I've tested training with ignore_index=-100 and it seems to work equivalently.

transmissions11, Jul 11 '23 23:07