Ivan Fursov


This will be so helpful, thanks! Do you have any approximate timeline for when it could happen?

@PonteIneptique Hi! Totally makes sense. I assume you don't need to take padding tokens into account when calculating the loss value.

Maybe something like this?

```python
def forward(self, logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    # Keep only the positions whose target is not the padding index
    ignore_condition = torch.ne(targets, self.ignore_index)
    logits = logits[ignore_condition]
    targets = targets[ignore_condition]
    ...
```

Yes, outside the module. Actually, the example above should have used `tokens` instead of `targets` to obtain the mask (`ignore_condition`).

```python
import torch

PADDING_IDX = 0
VOCAB_SIZE = 1000
BATCH_SIZE...
```
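To make the masking idea from the snippets above concrete, here is a minimal self-contained sketch. The shapes, the padding index of 0, and the random data are all assumptions for illustration, not values from the thread; it also checks the mask against PyTorch's built-in `ignore_index` argument of `cross_entropy`, which performs the same exclusion internally.

```python
import torch
import torch.nn.functional as F

# Hypothetical sizes for illustration only
PADDING_IDX = 0
VOCAB_SIZE = 1000
BATCH_SIZE, SEQ_LEN = 4, 10

logits = torch.randn(BATCH_SIZE, SEQ_LEN, VOCAB_SIZE)
targets = torch.randint(1, VOCAB_SIZE, (BATCH_SIZE, SEQ_LEN))
targets[:, -3:] = PADDING_IDX  # pretend the trailing positions are padding

# Manual masking, as in the forward() snippet above:
# boolean-index both tensors so padded positions never reach the loss
mask = torch.ne(targets, PADDING_IDX)
masked_loss = F.cross_entropy(logits[mask], targets[mask])

# Built-in equivalent: ignore_index excludes those positions
# from both the sum and the mean reduction
builtin_loss = F.cross_entropy(
    logits.view(-1, VOCAB_SIZE), targets.view(-1), ignore_index=PADDING_IDX
)

assert torch.allclose(masked_loss, builtin_loss)
```

If the padding index is fixed, passing `ignore_index` (or constructing `torch.nn.CrossEntropyLoss(ignore_index=PADDING_IDX)`) avoids the manual indexing entirely.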