Ben Trevett
For evaluation (measuring the validation/test loss) we always have to generate exactly the same number of tokens as in the actual target sequence, because that is how we measure our...
This is because we have a target sequence, `trg`, of something like `[<sos>, A, B, C, <eos>]`. We want our decoder to predict what the next item in the predicted...
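For example, something along these lines (a toy sketch, with random tensors and made-up shapes standing in for real model outputs) shows how the loss lines each prediction up against the target token it should have produced:

```python
import torch
import torch.nn as nn

trg_len = 5        # e.g. [<sos>, A, B, C, <eos>]
batch_size = 2
output_dim = 10    # size of the target vocabulary

# the model produces one prediction per target token: [trg len, batch size, output dim]
output = torch.randn(trg_len, batch_size, output_dim)
# the target token indices: [trg len, batch size]
trg = torch.randint(0, output_dim, (trg_len, batch_size))

criterion = nn.CrossEntropyLoss()

# drop the first time-step (we never predict <sos>) and flatten,
# so prediction i is scored against target token i
loss = criterion(
    output[1:].view(-1, output_dim),  # [(trg len - 1) * batch size, output dim]
    trg[1:].view(-1),                 # [(trg len - 1) * batch size]
)
print(loss.item())
```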
Beam search is something I am planning to implement when I get the time.
Not sure I understand the question, sorry. Are you asking why we use `pack_padded_sequence` in notebook 4?
@yugaljain1999 We can try running some code to help us understand how packed sequences are batched.

```python
import torch
import torch.nn as nn

max_length = 10
batch_size = 3
emb_dim = ...
```
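To flesh that out a bit (the shapes and lengths below are just illustrative, not taken from the notebook), packing a padded batch records how many sequences are still "active" at each time-step in `batch_sizes`, which is how the RNN skips the pad tokens:

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

batch_size = 3
max_length = 5
emb_dim = 4
hid_dim = 8

# a padded batch of embedded sequences: [batch size, seq len, emb dim]
embedded = torch.randn(batch_size, max_length, emb_dim)
# the actual (unpadded) lengths of each sequence, sorted descending
lengths = torch.tensor([5, 3, 2])

packed = pack_padded_sequence(embedded, lengths, batch_first=True)
# how many sequences are still "alive" at each time-step
print(packed.batch_sizes)  # tensor([3, 3, 2, 1, 1])

rnn = nn.GRU(emb_dim, hid_dim, batch_first=True)
packed_output, hidden = rnn(packed)

# unpack back to a padded tensor; positions past each sequence's length are zeros
output, output_lengths = pad_packed_sequence(packed_output, batch_first=True)
print(output.shape)   # torch.Size([3, 5, 8])
print(hidden.shape)   # torch.Size([1, 3, 8])
```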
I feel like this is more of an implementation issue, or personal preference. The way I've structured the tutorials (and the way I think about these things) is that if...
When we have a sequence length of one, which we do when decoding, `output == hidden`, as `output` contains the hidden states from all time-steps and `hidden` is...
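A quick way to convince yourself (this is just a standalone toy GRU, not the tutorial's decoder):

```python
import torch
import torch.nn as nn

emb_dim = 4
hid_dim = 8
batch_size = 2

rnn = nn.GRU(emb_dim, hid_dim)

# a single decoding time-step: [seq len = 1, batch size, emb dim]
input = torch.randn(1, batch_size, emb_dim)
hidden = torch.zeros(1, batch_size, hid_dim)

output, hidden = rnn(input, hidden)

# output: [seq len, batch size, hid dim] - top-layer hidden state at every time-step
# hidden: [n layers, batch size, hid dim] - final hidden state of every layer
# with seq len = 1 and a single layer, these hold the same values
print(torch.equal(output, hidden))  # True
```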
Not sure how I messed this up. Will look into it further. Thanks for pointing it out.
You can do what is done in the `MultiHeadAttentionLayer` and split the `hid_dim` into multiple "heads", but as it stands you have to do elementwise operations. What are you trying...
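The splitting itself is just a reshape, something like this (toy shapes, mirroring what the `MultiHeadAttentionLayer` does with `view` and `permute`):

```python
import torch

batch_size = 2
seq_len = 5
hid_dim = 512
n_heads = 8
head_dim = hid_dim // n_heads

x = torch.randn(batch_size, seq_len, hid_dim)

# split hid_dim into n_heads "heads" of size head_dim:
# [batch size, seq len, hid dim] -> [batch size, n heads, seq len, head dim]
x_heads = x.view(batch_size, seq_len, n_heads, head_dim).permute(0, 2, 1, 3)
print(x_heads.shape)  # torch.Size([2, 8, 5, 64])

# merge the heads back together
x_merged = x_heads.permute(0, 2, 1, 3).contiguous().view(batch_size, seq_len, hid_dim)
print(torch.equal(x, x_merged))  # True
```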