DGAI
Chapter 08: Passing predictions to loss function
Hi @markhliu,
When passing the model's predictions to the CrossEntropyLoss function, the second and third dimensions are swapped:
https://github.com/markhliu/DGAI/blob/main/ch08TextGenerationRNNs.ipynb
output, (sh,sc) = model(inputs, (sh,sc))
loss = loss_func(output.transpose(1,2),targets)
Here is what I did to understand why this is necessary:
These are the shapes at runtime:
- output original shape: torch.Size([32, 100, 12778]) # batch of 32 sequences, 100 positions each, with logits over 12778 tokens/classes per position
- output shape transposed: torch.Size([32, 12778, 100]) # batch of 32, second and third dimensions swapped
- targets shape: torch.Size([32, 100]) # batch of 32, each containing the index of the expected token at each of the 100 positions
I removed transpose(1,2) and then got:
Expected target size [32, 12778], got [32, 100]
So I understand that the output's second and third dimensions have to be swapped so that the class dimension comes second; otherwise CrossEntropyLoss treats the second dimension (100) as the class dimension and expects targets of shape (32, 12778) instead of (32, 100), which raises the runtime error above.
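To double-check this, here is a minimal self-contained sketch (not from the notebook; I shrank the sizes to batch 2, sequence length 4, vocabulary 5) that reproduces both the working call and the error:

```python
import torch
import torch.nn as nn

loss_func = nn.CrossEntropyLoss()

N, T, C = 2, 4, 5  # batch size, sequence length, number of classes
output = torch.randn(N, T, C)          # model output: (N, T, C)
targets = torch.randint(0, C, (N, T))  # token indices: (N, T)

# Works: transpose moves the class dimension to position 1 -> (N, C, T)
loss = loss_func(output.transpose(1, 2), targets)
print(loss.dim())  # 0 -> a scalar loss

# Fails: without the transpose, CrossEntropyLoss reads T=4 as the class
# dimension and therefore expects targets of shape (N, C) = (2, 5)
try:
    loss_func(output, targets)
except RuntimeError as e:
    print(e)  # complains about the expected target size (2, 5)
```

The same pattern scales up to the notebook's (32, 100, 12778) tensors.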
So I checked the docs:
- Input: shape (C), (N, C) or (N, C, d1, d2, ..., dK)
- Target: shape (), (N) or (N, d1, d2, ..., dK)

where:
- C is the number of classes: 12778
- N is the batch size: 32
- d1 to dK index the words in the predicted sequence (length 100, so d1 to d100)
So that means:
- Input transposed, (N, C, d1), is torch.Size([32, 12778, 100])
- Target, (N, d1), is torch.Size([32, 100])
Is this understanding correct? And why are the predictions not transposed in chapter 02's multi-category classifier, which also uses CrossEntropyLoss? Is it because each sample has just one prediction (no d1 ... dK dimensions), so the input shape is already (N, C) (batch size, then 10 categories)?
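For comparison, here is a small sketch of that simpler case (hypothetical sizes matching chapter 02's setup of 10 categories): with 2-D logits the class dimension is already in position 1, so no transpose is needed:

```python
import torch
import torch.nn as nn

loss_func = nn.CrossEntropyLoss()

N, C = 32, 10                       # batch size, number of categories
logits = torch.randn(N, C)          # already (N, C): no transpose needed
labels = torch.randint(0, C, (N,))  # one class index per sample: (N,)

loss = loss_func(logits, labels)
print(loss.dim())  # 0 -> a scalar loss
```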
Thanks for your feedback!