
Using trg[:,:-1] during training

Open wajihullahbaig opened this issue 4 years ago • 7 comments

Thank you for this awesome repo you have made public. I had one question: during the training loop, you perform the following step: output, _ = model(src, trg[:,:-1])

I was wondering why we are doing the trg[:,:-1] step?

Kind regards Wajih

wajihullahbaig avatar Sep 04 '20 07:09 wajihullahbaig

This is because we have a target sequence, trg, of something like [<sos>, A, B, C, <eos>]. We want our decoder to predict what the next item in the predicted target sequence should be, given the previously predicted target tokens. So, we input a sequence of [<sos>, A, B, C] (which is trg[:,:-1]) and want our decoder to predict [A, B, C, <eos>] (which is trg[:,1:]).

Thus, we input trg[:,:-1] and use the predicted target with trg[:,1:] to calculate our losses.
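As a minimal, self-contained sketch of that slicing (dummy tensors stand in for the real model output, so the shapes here are illustrative assumptions rather than the repo's exact code):

```python
import torch
import torch.nn as nn

# trg holds token indices for something like [<sos>, A, B, C, <eos>],
# shape [batch, trg_len].
batch_size, trg_len, vocab_size = 2, 5, 10
trg = torch.randint(0, vocab_size, (batch_size, trg_len))

decoder_input = trg[:, :-1]   # [<sos>, A, B, C]   -> what we feed to the decoder
target = trg[:, 1:]           # [A, B, C, <eos>]   -> what we want it to predict

# Pretend the model produced logits for each input position.
output = torch.randn(batch_size, trg_len - 1, vocab_size, requires_grad=True)

criterion = nn.CrossEntropyLoss()
loss = criterion(output.reshape(-1, vocab_size), target.reshape(-1))
loss.backward()
```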

Let me know if this needs clarifying.

bentrevett avatar Sep 08 '20 22:09 bentrevett

This is because we have a target sequence, trg, of something like [<sos>, A, B, C, <eos>]. We want our decoder to predict what the next item in the predicted target sequence should be, given the previously predicted target tokens. So, we input a sequence of [<sos>, A, B, C] (which is trg[:,:-1]) and want our decoder to predict [A, B, C, <eos>] (which is trg[:,1:]).

Thus, we input trg[:,:-1] and use the predicted target with trg[:,1:] to calculate our losses.

Let me know if this needs clarifying.

Oh, I understand now. Thanks indeed for the elaborate reply.

Wajih

wajihullahbaig avatar Sep 10 '20 05:09 wajihullahbaig

This is because we have a target sequence, trg, of something like [<sos>, A, B, C, <eos>]. We want our decoder to predict what the next item in the predicted target sequence should be, given the previously predicted target tokens. So, we input a sequence of [<sos>, A, B, C] (which is trg[:,:-1]) and want our decoder to predict [A, B, C, <eos>] (which is trg[:,1:]).

Thus, we input trg[:,:-1] and use the predicted target with trg[:,1:] to calculate our losses.

Let me know if this needs clarifying.

Hi, how does this work when the trg sentence is padded? In that case, I imagine the <eos> token would no longer be in the last position, right? Or am I missing something?

EDIT: never mind, I figured it out. In case anyone else is wondering: it works with padded inputs anyway because of ignore_index in the loss function.

fabio-deep avatar Nov 20 '20 00:11 fabio-deep

This is because we have a target sequence, trg, of something like [<sos>, A, B, C, <eos>]. We want our decoder to predict what the next item in the predicted target sequence should be, given the previously predicted target tokens. So, we input a sequence of [<sos>, A, B, C] (which is trg[:,:-1]) and want our decoder to predict [A, B, C, <eos>] (which is trg[:,1:]). Thus, we input trg[:,:-1] and use the predicted target with trg[:,1:] to calculate our losses. Let me know if this needs clarifying.

Hi, how does this work when the trg sentence is padded? In that case, I imagine the <eos> token would no longer be in the last position, right? Or am I missing something?

EDIT: never mind, I figured it out. In case anyone else is wondering: it works with padded inputs anyway because of ignore_index in the loss function.

Sorry for the late reply - seems like you've figured it out now but just in case someone else is reading this then I'll explain.

When we have padding, our trg sequence will be something like [<sos>, A, B, C, <eos>, <pad>, <pad>]. So the sequence input into the decoder is [<sos>, A, B, C, <eos>, <pad>] (trg[:,:-1]) and our decoder will be trying to predict the sequence [A, B, C, <eos>, <pad>] (trg[:,1:]).

This means that yes, the <eos> token is input into the model even though it shouldn't be - because why should you predict something after the end of the sequence? - but there is no way to avoid this when padding sequences. However, because we set the ignore_index of our CrossEntropyLoss to be the index of the padding token, whenever the decoder's target token is a <pad> token we don't calculate losses over that token.

So, in the above example, we only calculate the losses when the decoder's input is [<sos>, A, B, C], because the <eos> and <pad> tokens both have a target of <pad>. This means we calculate our losses (and thus update our parameters) as if the padding tokens didn't exist (sort of; we still waste some computation, but this is offset by the fact that we can use batches instead of feeding in examples one at a time or only making batches where every sequence is the exact same length).
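A toy sketch of that masking (the token indices below are made up for illustration; only ignore_index matters here):

```python
import torch
import torch.nn as nn

PAD_IDX = 0
vocab_size = 8

# Decoder targets for one sequence, i.e. trg[:,1:]: [A, B, C, <eos>, <pad>, <pad>]
target = torch.tensor([3, 4, 5, 2, PAD_IDX, PAD_IDX])
logits = torch.randn(6, vocab_size)

criterion = nn.CrossEntropyLoss(ignore_index=PAD_IDX)
loss = criterion(logits, target)
# The two positions whose target is <pad> contribute nothing to the loss or its
# gradient, so the model is never penalised for whatever it predicts after <eos>.
```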

bentrevett avatar Nov 30 '20 17:11 bentrevett

This is because we have a target sequence, trg, of something like [<sos>, A, B, C, <eos>]. We want our decoder to predict what the next item in the predicted target sequence should be, given the previously predicted target tokens. So, we input a sequence of [<sos>, A, B, C] (which is trg[:,:-1]) and want our decoder to predict [A, B, C, <eos>] (which is trg[:,1:]).

Thus, we input trg[:,:-1] and use the predicted target with trg[:,1:] to calculate our losses.

Let me know if this needs clarifying.

I have a question. The sentences are padded after <eos>, so the batch looks like:

sos y1 y2 eos pad pad pad
sos y1 y2 y3 y4 y5 eos
sos y1 y2 y3 y4 eos pad

The size of trg is [3, 7]. If we take trg[:,:-1], the sentences are cut like:

sos y1 y2 eos pad pad
sos y1 y2 y3 y4 y5
sos y1 y2 y3 y4 eos

So it does not cut all of the <eos> tokens.

I checked torchtext: the sentences are concatenated as sos sentence eos pad, so trg[:,:-1] will not cut all <eos> tokens.

If the sentences were concatenated as sos sentence pad eos, then in that case it would cut all <eos> tokens.
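A quick illustration of that point with made-up token indices (<pad>=0, <sos>=1, <eos>=2 are assumptions for the sketch), showing that trg[:,:-1] only drops the last column of the batch:

```python
import torch

trg = torch.tensor([
    [1, 3, 4, 2, 0, 0, 0],   # sos y1 y2 eos pad pad pad
    [1, 3, 4, 5, 6, 7, 2],   # sos y1 y2 y3 y4 y5 eos
    [1, 3, 4, 5, 6, 2, 0],   # sos y1 y2 y3 y4 eos pad
])                            # shape [3, 7]

print(trg[:, :-1])
# Only the longest sentence loses its <eos>; the padded sentences keep theirs,
# and the leftover <eos>/<pad> targets are ignored via ignore_index in the loss.
```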

liuxiaoqun avatar Jun 16 '21 06:06 liuxiaoqun

For anyone who finds this in the future: output, _ = model(src, trg[:,:-1]) seems to no longer be there, but the decoder loop in the Seq2Seq class runs over positions 0 to trg_len - 1. It's currently written as for t in range(1, trg_len):, where the input at the start of each iteration is always the token at position t-1 (it increments at the end). Took me a minute to figure out where the [:,:-1] went.

https://github.com/bentrevett/pytorch-seq2seq/issues/182 # a more in-depth explanation of trg[:,:-1] and how it interacts with padding. https://github.com/bentrevett/pytorch-seq2seq/issues/43#issuecomment-554986488 # impact of <sos> and <eos> tokens on src -> the model learns to ignore them.
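For reference, a hedged sketch of what such a per-step loop looks like (names like decoder, hidden, and teacher_forcing_ratio are assumptions for illustration, not the repo's exact code):

```python
import random
import torch

def decode_with_teacher_forcing(decoder, hidden, trg, trg_vocab_size,
                                teacher_forcing_ratio=0.5):
    # trg: [batch, trg_len]; position 0 is <sos>.
    batch_size, trg_len = trg.shape
    outputs = torch.zeros(batch_size, trg_len, trg_vocab_size)

    input = trg[:, 0]                    # first decoder input is <sos> (t-1 with t = 1)
    for t in range(1, trg_len):          # predict positions 1 .. trg_len - 1
        output, hidden = decoder(input, hidden)
        outputs[:, t] = output
        # With teacher forcing, the next input is the ground-truth token at t;
        # otherwise it is the decoder's own highest-scoring prediction.
        teacher_force = random.random() < teacher_forcing_ratio
        input = trg[:, t] if teacher_force else output.argmax(1)
    return outputs
```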

ProxJ avatar Feb 27 '22 23:02 ProxJ

For anyone who finds this in the future: output, _ = model(src, trg[:,:-1]) seems to no longer be there, but the decoder loop in the Seq2Seq class runs over positions 0 to trg_len - 1. It's currently written as for t in range(1, trg_len):, where the input at the start of each iteration is always the token at position t-1 (it increments at the end). Took me a minute to figure out where the [:,:-1] went.

#182 # a more in-depth explanation of trg[:,:-1] and how it interacts with padding. #43 (comment) # impact of <sos> and <eos> tokens on src -> the model learns to ignore them.

You are correct. Seems to have been updated now.

wajihullahbaig avatar Feb 28 '22 07:02 wajihullahbaig