gpt-2
Train loss
Hi all,
I can't understand how the loss is computed, in particular what is being compared.
If I print the two tensors that appear in the loss term during execution, I get:
CONTEXT [[290 1526 1636 1526 75 1357 12 11...]...]
and
OUTPUT_LOGITS [[[-36.8163338 -36.7796745 -40.5458221 -39.6132202 -40.1266747 -40.50746...]]...]
Could you please explain how they are related, and how training works? Thanks a lot.
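For reference, here is a minimal sketch of the standard next-token language-modelling loss. It assumes, without confirming it against this repository's training script, that CONTEXT holds BPE token ids with shape (batch, seq_len) and OUTPUT_LOGITS holds unnormalized scores with shape (batch, seq_len, vocab_size), one score per vocabulary entry (50257 for GPT-2) at each position. Under that assumption, the logits at position t are scored against the token at position t+1, so the two tensors are compared after shifting by one. The function name `next_token_loss` is hypothetical, and NumPy stands in for TensorFlow here; in TensorFlow the same computation is commonly written with `tf.nn.sparse_softmax_cross_entropy_with_logits(labels=..., logits=...)`. Training then means an optimizer adjusts the model weights to minimize this value.

```python
import numpy as np

def next_token_loss(context, logits):
    """Mean cross-entropy of predicting token t+1 from the logits at position t.

    Hypothetical helper (an assumption, not this repo's code).
    context: int array, shape (batch, seq_len), token ids
    logits:  float array, shape (batch, seq_len, vocab_size), unnormalized scores
    """
    # Shift by one: logits at position t predict the token at position t+1,
    # so drop the first label and the last set of logits.
    targets = context[:, 1:]          # (batch, seq_len - 1)
    preds = logits[:, :-1, :]         # (batch, seq_len - 1, vocab_size)
    # Numerically stable log-softmax over the vocabulary axis.
    shifted = preds - preds.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    # Log-probability the model assigned to each true next token.
    picked = np.take_along_axis(log_probs, targets[..., None], axis=-1)[..., 0]
    # Average negative log-likelihood over batch and positions.
    return -picked.mean()

# Toy usage: vocabulary of 4 tokens, one sequence of length 4.
ctx = np.array([[0, 1, 2, 3]])
uniform = np.zeros((1, 4, 4))  # every token equally likely
print(next_token_loss(ctx, uniform))  # equals log(4) for uniform logits
```

With uniform logits the loss equals log(vocab_size); as the model puts more probability on the actual next token, the loss falls toward zero.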