Is the loss of the first word covered during the language model evaluation?
In the language model example, evaluation appears to start by computing the loss of the second word, so the loss of the first word is skipped. https://github.com/pytorch/examples/blob/537f6971872b839b36983ff40dafe688276fe6c3/word_language_model/main.py#L136 https://github.com/pytorch/examples/blob/537f6971872b839b36983ff40dafe688276fe6c3/word_language_model/main.py#L121-L125
Furthermore, the evaluation data is divided into 10 batches, so the losses of 10 words are skipped. Am I right, or did I miss something? https://github.com/pytorch/examples/blob/537f6971872b839b36983ff40dafe688276fe6c3/word_language_model/main.py#L85-L88
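
To make the question concrete, here is a minimal pure-Python sketch of the indexing as I understand it. It is not the example's actual code: `batchify` only imitates the splitting into columns using plain lists, `target_positions` is a hypothetical helper I wrote for counting, and the 100 tokens and `bptt=3` are toy values. It assumes the inputs for a chunk are rows `i .. i+seq_len-1` and the targets are rows `i+1 .. i+seq_len`.

```python
def batchify(tokens, bsz):
    """Split a flat token list into bsz contiguous columns, dropping any remainder."""
    ncol = len(tokens) // bsz
    return [tokens[c * ncol:(c + 1) * ncol] for c in range(bsz)]


def target_positions(columns, bptt):
    """Collect every (column, row) position that is ever used as a loss target."""
    scored = set()
    nrows = len(columns[0])
    # Walk over the rows in chunks of bptt, as the evaluation loop is assumed to do.
    for i in range(0, nrows - 1, bptt):
        seq_len = min(bptt, nrows - 1 - i)
        for c in range(len(columns)):
            # Targets are the inputs shifted by one row.
            for r in range(i + 1, i + 1 + seq_len):
                scored.add((c, r))
    return scored


tokens = list(range(100))            # toy corpus of 100 token ids
columns = batchify(tokens, bsz=10)   # 10 columns, as in the evaluation setup
scored = target_positions(columns, bptt=3)

# Positions that never appear as a target, i.e. whose loss is never computed.
unscored = [(c, r) for c in range(len(columns))
            for r in range(len(columns[0])) if (c, r) not in scored]
print(unscored)
```

Under these assumptions, the only unscored positions are row 0 of each of the 10 columns, which matches the 10 skipped words described above.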