gpt-2 Train loss

Open alecalma opened this issue 5 years ago • 0 comments

Hi all,

I can't understand how the loss is computed, in particular what is being compared.

If I print the two tensors that appear in the loss term during execution, I get:

CONTEXT [[290 1526 1636 1526 75 1357 12 11...]...]

and

OUTPUT_LOGITS[[[-36.8163338 -36.7796745 -40.5458221 -39.6132202 -40.1266747 -40.50746...]]...]

Could you please explain how they are related, and how the training happens? Thanks a lot.
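For anyone reading along, here is a minimal sketch of the usual next-token setup for language models. It assumes, which is not confirmed in this thread, that the logits at position t are scored against the token that follows at position t + 1, using softmax cross-entropy averaged over all positions. The function name `next_token_loss` and the toy vocabulary of 4 are hypothetical and not taken from the repository.

```python
import math

def next_token_loss(context, logits):
    """Mean cross-entropy between logits[t] and the token at context[t + 1].

    context: list of token-id sequences, shape [batch][seq_len]
    logits:  unnormalised scores, shape [batch][seq_len][vocab_size]
    """
    total = 0.0
    count = 0
    for tokens, seq_logits in zip(context, logits):
        # Position t predicts token t + 1, so the last logit row and the
        # first token have no partner and are dropped.
        for row, target in zip(seq_logits[:-1], tokens[1:]):
            # Numerically stable log-sum-exp over the vocabulary.
            m = max(row)
            log_z = m + math.log(sum(math.exp(x - m) for x in row))
            # Negative log-probability assigned to the true next token.
            total += log_z - row[target]
            count += 1
    return total / count

# Toy data: batch of 1, sequence of 3 tokens, hypothetical vocabulary of 4.
context = [[2, 0, 3]]
uniform_logits = [[[0.0, 0.0, 0.0, 0.0],
                   [0.0, 0.0, 0.0, 0.0],
                   [0.0, 0.0, 0.0, 0.0]]]
loss = next_token_loss(context, uniform_logits)
print(loss)  # equals log(4) because every token is equally likely
```

Under this assumption, CONTEXT supplies the integer targets and OUTPUT_LOGITS supplies one score per vocabulary entry at each position. Training would then adjust the weights by gradient descent so that the score of the true next token rises relative to the others.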

Aug 28 '19 13:08 alecalma