DGAI
Chapter 08: Passing predictions to loss function
Hi @markhliu,
When passing the model's predictions to the CrossEntropyLoss function, the second and third dimensions are swapped:
https://github.com/markhliu/DGAI/blob/main/ch08TextGenerationRNNs.ipynb
output, (sh,sc) = model(inputs, (sh,sc))
loss = loss_func(output.transpose(1,2),targets)
Here is what I did to understand why this is necessary:
These are the shapes at runtime:
- output original shape: torch.Size([32, 100, 12778]) # batch of 32 sequences, 100 positions each, with logits over 12778 tokens/classes per position
- output shape transposed: torch.Size([32, 12778, 100]) # batch of 32, second and third dimensions swapped
- targets shape: torch.Size([32, 100]) # batch of 32, each containing the index of the expected token at each of the 100 positions
I removed transpose(1,2) and then got:
Expected target size [32, 12778], got [32, 100]
So I understand that the output's second and third dimensions have to be swapped so that the class dimension comes second; otherwise CrossEntropyLoss treats the second dimension (100) as the class dimension and expects targets of shape (32, 12778) instead of (32, 100), which raises the runtime error above.
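To double-check this, here is a minimal self-contained sketch (not from the notebook; I shrank the sizes to batch 2, sequence length 4, vocabulary 5) that reproduces both the working call and the error:

```python
import torch
import torch.nn as nn

loss_func = nn.CrossEntropyLoss()

N, T, C = 2, 4, 5  # batch size, sequence length, number of classes
output = torch.randn(N, T, C)          # model output: (N, T, C)
targets = torch.randint(0, C, (N, T))  # token indices: (N, T)

# Works: transpose moves the class dimension to position 1 -> (N, C, T)
loss = loss_func(output.transpose(1, 2), targets)
print(loss.dim())  # 0 -> a scalar loss

# Fails: without the transpose, CrossEntropyLoss reads T=4 as the class
# dimension and therefore expects targets of shape (N, C) = (2, 5)
try:
    loss_func(output, targets)
except RuntimeError as e:
    print(e)  # complains about the expected target size (2, 5)
```

The same pattern scales up to the notebook's (32, 100, 12778) tensors.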
So I checked the docs:
- Input: shape (C), (N, C) or (N, C, d1, d2, ..., dK)
- Target: shape (), (N) or (N, d1, d2, ..., dK)

where:
- C is the number of classes: 12778
- N is the batch size: 32
- d1 to dK index the words in the predicted sequence (length 100, so d1 to d100)
So that means:
- Input transposed, (N, C, d1), is torch.Size([32, 12778, 100])
- Target, (N, d1), is torch.Size([32, 100])
Is this understanding correct? And why are the predictions not transposed in chapter 02's multi-category classifier, which also uses CrossEntropyLoss? Is it because each sample has just one prediction (no d1 ... dK dimensions), so the input shape is already (N, C) (batch size, then 10 categories)?
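For comparison, here is a small sketch of that simpler case (hypothetical sizes matching chapter 02's setup of 10 categories): with 2-D logits the class dimension is already in position 1, so no transpose is needed:

```python
import torch
import torch.nn as nn

loss_func = nn.CrossEntropyLoss()

N, C = 32, 10                       # batch size, number of categories
logits = torch.randn(N, C)          # already (N, C): no transpose needed
labels = torch.randint(0, C, (N,))  # one class index per sample: (N,)

loss = loss_func(logits, labels)
print(loss.dim())  # 0 -> a scalar loss
```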
Thanks for your feedback!