attention-is-all-you-need-pytorch
In patch_trg, I can't understand why you change the data shape like that
My dataset is batch-first (horizontal), so I didn't use transpose(0, 1) and changed your code like below:
    def patch_trg(trg, pad_idx):
        # pad_idx is kept for signature compatibility; it is not used here
        trg, gold = trg[:, :-1], trg[:, 1:].contiguous().view(-1)
        return trg, gold
And my dataset looks like the samples below:
sample_1 = bos, 346, 32, 124, 214, eos
sample_2 = bos, 346, 124, 214, eos
...
sample_N = bos, 346, 32, 32, 32, 124, 214, eos
Every sample has a different length.
So here is my question: if I run your code, when making the trg parameter, the eos token of the longest sample is deleted. That means that in every batch, the longest sample is trained without its eos token.
So I want to know: what is the correct role of that code (trg[:, :-1] and trg[:, 1:])?
I think gold is made to get rid of the bos token, but I don't understand what the trg parameter is for.
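To make my confusion concrete, here is a toy run of the modified function on a padded batch (the token ids and pad index are made up for illustration, pad = 0):

    import torch

    bos, eos, pad = 1, 2, 0  # made-up special token ids

    # sample_1 and sample_2 from above, padded to the same length (batch-first)
    trg_batch = torch.tensor([
        [bos, 346,  32, 124, 214, eos],   # longest sample
        [bos, 346, 124, 214, eos, pad],   # shorter sample, padded
    ])

    trg, gold = trg_batch[:, :-1], trg_batch[:, 1:].contiguous().view(-1)

    # trg (what the decoder sees) -- the longest sample's eos is gone:
    # tensor([[  1, 346,  32, 124, 214],
    #         [  1, 346, 124, 214,   2]])
    # gold (flattened labels) -- eos is still present as a target:
    # tensor([346,  32, 124, 214,   2, 346, 124, 214,   2,   0])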
Hi bro, how did you get the program to work? The dataset doesn't download, and the preprocess.py file doesn't work.
@Gi-gigi Actually, I didn't use the dataset that @jadore801120 prepared. I just used my own dataset, and I had to change a few things in preprocess.py. It was easy to customize the transformer code.
Perhaps I can answer your question. The 'trg' will be used as the input of the decoder, and the decoder predicts the next word from the information known so far. The 'gold' will be used as the label for the predicted words. Let me give you an example:
sample_1:
trg: bos, 346, 32, 124, 214
gold: 346, 32, 124, 214, eos
So, the 'eos' in trg is meaningless as an input (there is nothing left to predict after it), and the loss function does not include it.
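To make that concrete, here is a minimal sketch of the loss step, assuming cross-entropy with ignore_index (which is what the repo's cal_loss does when label smoothing is off) and reusing the toy ids from the example above:

    import torch
    import torch.nn.functional as F

    pad_idx = 0
    vocab_size = 400                               # hypothetical vocab size
    gold = torch.tensor([346, 32, 124, 214, 2,     # sample_1 targets: eos (2) IS scored
                         346, 124, 214, 2, 0])     # sample_2 targets: trailing pad
    pred = torch.randn(gold.size(0), vocab_size)   # stand-in for decoder logits

    # ignore_index=pad_idx drops the pad positions from the loss, so the
    # step whose input is the leftover eos of the shorter sample adds
    # nothing; eos itself is still learned, but only as a target in gold.
    loss = F.cross_entropy(pred, gold, ignore_index=pad_idx)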