pytorch-transformer
Attention is all you need implementation
I updated the causal_mask function to create a lower triangular matrix using torch.tril, which is more concise and clearer than inverting a mask generated with torch.triu. The functionality of the...
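For reference, a minimal sketch of the two variants being compared (function names here are illustrative, assuming the helper returns a `(1, size, size)` boolean mask that is True where attention is allowed):

```python
import torch

def causal_mask_triu(size: int) -> torch.Tensor:
    # Original approach: build the strictly upper triangle, then invert it.
    mask = torch.triu(torch.ones((1, size, size)), diagonal=1).type(torch.int)
    return mask == 0

def causal_mask_tril(size: int) -> torch.Tensor:
    # Proposed approach: build the lower triangle directly with torch.tril.
    return torch.tril(torch.ones((1, size, size))).bool()

# Both yield True on and below the diagonal:
assert torch.equal(causal_mask_triu(5), causal_mask_tril(5))
```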
So I am definitely using a different dataset, and after training it for 20 epochs the predicted output is only spaces. This is the dataset that I used...
Hi @hkproj, why do you add dropout to the attention scores (line 110 in model.py)? Shouldn't you discard the dropout in the multi-head attention block, because you already add a...
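For context, here is a minimal sketch of the pattern the question points at (not the repo's exact code): dropout applied to the softmaxed attention weights inside the attention computation itself, which is a standard option distinct from the dropout in the residual connection (PyTorch's own `torch.nn.functional.scaled_dot_product_attention` exposes it as `dropout_p`).

```python
import math
import torch
import torch.nn as nn

def attention(q, k, v, mask=None, dropout: nn.Dropout = None):
    # Scaled dot-product attention with optional dropout on the weights.
    d_k = q.size(-1)
    scores = (q @ k.transpose(-2, -1)) / math.sqrt(d_k)
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = scores.softmax(dim=-1)
    if dropout is not None:
        weights = dropout(weights)  # the dropout the question asks about
    return weights @ v, weights
```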
I followed your code and made the repo a bit neater: https://github.com/chettiargautam/Attention-for-Translation
Line 60 in the Translate.py file: `decoder_mask = torch.triu(torch.ones((1, decoder_input.size(1), decoder_input.size(1))), diagonal=1).type(torch.int).type_as(source_mask).to(device)`. Here decoder_mask creates a mask matrix like [[0,1,1],[0,0,1],[0,0,0]] (for a length-3 sequence); only the lower triangle is masked, which...
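For anyone following along, a quick check of what that line produces for a length-3 sequence (a sketch; the `.type_as(...)` and `.to(device)` calls are dropped since they don't change the values):

```python
import torch

seq_len = 3
mask = torch.triu(torch.ones((1, seq_len, seq_len)), diagonal=1).type(torch.int)
print(mask)
# tensor([[[0, 1, 1],
#          [0, 0, 1],
#          [0, 0, 0]]], dtype=torch.int32)

# If the attention code masks positions where the mask is 0, e.g.
# scores.masked_fill(mask == 0, float("-inf")), then the diagonal and the
# lower triangle get blanked out -- the opposite of a causal mask. The
# training-side helper avoids this by inverting the matrix with `mask == 0`.
```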
Hello, I tried your script and the resulting model took about 10 hours to train on a single 3060, but the quality is still not very good. How could I improve...
I did a small experiment after watching your tutorial and the tutorial by Brainxyz. The idea is to convert each token (a word, in my case) into a sine signal....
The provided code in the **residual connection function in model.py** is **x+sublayer(self.norm(x))**, but the paper calls this step "Add & Norm", which would mean **self.norm(x+sublayer(x))**. Please clarify.
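For reference, a minimal sketch of the two orderings being compared (class names here are illustrative, not the repo's actual classes):

```python
import torch.nn as nn

class PreNormResidual(nn.Module):
    """Normalize first, then apply the sublayer: x + sublayer(norm(x))."""
    def __init__(self, d_model: int, dropout: float):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x, sublayer):
        return x + self.dropout(sublayer(self.norm(x)))

class PostNormResidual(nn.Module):
    """The paper's "Add & Norm": apply the sublayer, add, then normalize."""
    def __init__(self, d_model: int, dropout: float):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x, sublayer):
        return self.norm(x + self.dropout(sublayer(x)))
```

The first (pre-norm) ordering is a common, deliberate deviation from the paper's figure that is widely reported to train more stably; the second (post-norm) is what "Add & Norm" literally describes.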
I am encountering issues while trying to train it on an Apple M3 Mac with a 12-core CPU, an 18-core GPU, and 18GB RAM. Below are the details and...
In train.py, we do:
```
decoder_output = model.decode(encoder_output, encoder_mask, decoder_input, decoder_mask)  # (B, seq_len, d_model)
```
where the inputs have shapes: encoder_output: (B, seq_len, d_model), encoder_mask: (B, 1, 1, seq_len), decoder_input: (B, seq_len), decoder_mask: (B, 1,...
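For concreteness, a small standalone sketch of those shapes (the decoder_mask shape past `(B, 1,` is cut off above, so the `(B, 1, seq_len, seq_len)` used here is an assumption based on how a causal mask typically broadcasts over attention heads):

```python
import torch

B, seq_len, d_model, h = 2, 10, 512, 8
encoder_output = torch.randn(B, seq_len, d_model)
encoder_mask = torch.ones(B, 1, 1, seq_len, dtype=torch.bool)  # padding mask
decoder_input = torch.randint(0, 100, (B, seq_len))            # token ids
# Assumed full shape of the causal decoder mask:
decoder_mask = torch.tril(torch.ones(seq_len, seq_len)).bool().expand(B, 1, seq_len, seq_len)

# Attention scores inside the decoder are (B, h, seq_len, seq_len);
# both masks broadcast against them:
scores = torch.randn(B, h, seq_len, seq_len)
print(scores.masked_fill(decoder_mask == 0, float("-inf")).shape)  # torch.Size([2, 8, 10, 10])
print(scores.masked_fill(encoder_mask == 0, float("-inf")).shape)  # torch.Size([2, 8, 10, 10])
```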