pytorch-transformer
Attention is all you need implementation
I updated the causal_mask function to create a lower triangular matrix using torch.tril, which is more concise and clearer than inverting a mask generated with torch.triu. The functionality of the...
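For reference, a minimal sketch of the two variants being compared (function names here are illustrative, assuming the helper returns a `(1, size, size)` boolean mask that is True where attention is allowed):

```python
import torch

def causal_mask_triu(size: int) -> torch.Tensor:
    # Original approach: build the strictly upper triangle, then invert it.
    mask = torch.triu(torch.ones((1, size, size)), diagonal=1).type(torch.int)
    return mask == 0

def causal_mask_tril(size: int) -> torch.Tensor:
    # Proposed approach: build the lower triangle directly with torch.tril.
    return torch.tril(torch.ones((1, size, size))).bool()

# Both yield True on and below the diagonal:
assert torch.equal(causal_mask_triu(5), causal_mask_tril(5))
```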
So I am definitely using a different dataset, and after training it for 20 epochs the predicted output is only spaces. This is the dataset that I used...
Hi @hkproj, why do you add dropout to the attention scores (line 110 in model.py)? Shouldn't you discard the dropout in the multi-head attention block, because you already add a...
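For context, here is a minimal sketch of the pattern the question points at (not the repo's exact code): dropout applied to the softmaxed attention weights inside the attention computation itself, which is a standard option distinct from the dropout in the residual connection (PyTorch's own `torch.nn.functional.scaled_dot_product_attention` exposes it as `dropout_p`).

```python
import math
import torch
import torch.nn as nn

def attention(q, k, v, mask=None, dropout: nn.Dropout = None):
    # Scaled dot-product attention with optional dropout on the weights.
    d_k = q.size(-1)
    scores = (q @ k.transpose(-2, -1)) / math.sqrt(d_k)
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = scores.softmax(dim=-1)
    if dropout is not None:
        weights = dropout(weights)  # the dropout the question asks about
    return weights @ v, weights
```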
I followed your code and made the repo a bit neater: https://github.com/chettiargautam/Attention-for-Translation
Line 60 in the Translate.py file: `decoder_mask = torch.triu(torch.ones((1, decoder_input.size(1), decoder_input.size(1))), diagonal=1).type(torch.int).type_as(source_mask).to(device)`. Here decoder_mask creates a mask matrix like [[0,1,1],[0,0,1],[0,0,0]] (for a length-3 sequence); only the lower triangle is masked, which...
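For anyone following along, a quick check of what that line produces for a length-3 sequence (a sketch; the `.type_as(...)` and `.to(device)` calls are dropped since they don't change the values):

```python
import torch

seq_len = 3
mask = torch.triu(torch.ones((1, seq_len, seq_len)), diagonal=1).type(torch.int)
print(mask)
# tensor([[[0, 1, 1],
#          [0, 0, 1],
#          [0, 0, 0]]], dtype=torch.int32)

# If the attention code masks positions where the mask is 0, e.g.
# scores.masked_fill(mask == 0, float("-inf")), then the diagonal and the
# lower triangle get blanked out -- the opposite of a causal mask. The
# training-side helper avoids this by inverting the matrix with `mask == 0`.
```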
Hello, I tried your script and the resulting model took about 10 hours to train on a single 3060, but the quality is still not very good. How could I improve...
I did a small experiment after watching your tutorial and the tutorial by Brainxyz. The idea is to convert each token (a word, in my case) into a sine signal....
The provided code in the **residual connection function in model.py** is **x+sublayer(self.norm(x))**, but the paper calls this step "Add & Norm", which would mean **self.norm(x+sublayer(x))**. Please clarify.
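For reference, a minimal sketch of the two orderings being compared (class names here are illustrative, not the repo's actual classes):

```python
import torch.nn as nn

class PreNormResidual(nn.Module):
    """Normalize first, then apply the sublayer: x + sublayer(norm(x))."""
    def __init__(self, d_model: int, dropout: float):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x, sublayer):
        return x + self.dropout(sublayer(self.norm(x)))

class PostNormResidual(nn.Module):
    """The paper's "Add & Norm": apply the sublayer, add, then normalize."""
    def __init__(self, d_model: int, dropout: float):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x, sublayer):
        return self.norm(x + self.dropout(sublayer(x)))
```

The first (pre-norm) ordering is a common, deliberate deviation from the paper's figure that is widely reported to train more stably; the second (post-norm) is what "Add & Norm" literally describes.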
I am encountering issues while trying to train it on an Apple M3 Mac with a 12-core CPU, an 18-core GPU, and 18GB RAM. Below are the details and...
In train.py, we do:
```
decoder_output = model.decode(encoder_output, encoder_mask, decoder_input, decoder_mask)  # (B, seq_len, d_model)
```
where the inputs have shapes: encoder_output: (B, seq_len, d_model), encoder_mask: (B, 1, 1, seq_len), decoder_input: (B, seq_len), decoder_mask: (B, 1,...
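For concreteness, a small standalone sketch of those shapes (the decoder_mask shape past `(B, 1,` is cut off above, so the `(B, 1, seq_len, seq_len)` used here is an assumption based on how a causal mask typically broadcasts over attention heads):

```python
import torch

B, seq_len, d_model, h = 2, 10, 512, 8
encoder_output = torch.randn(B, seq_len, d_model)
encoder_mask = torch.ones(B, 1, 1, seq_len, dtype=torch.bool)  # padding mask
decoder_input = torch.randint(0, 100, (B, seq_len))            # token ids
# Assumed full shape of the causal decoder mask:
decoder_mask = torch.tril(torch.ones(seq_len, seq_len)).bool().expand(B, 1, seq_len, seq_len)

# Attention scores inside the decoder are (B, h, seq_len, seq_len);
# both masks broadcast against them:
scores = torch.randn(B, h, seq_len, seq_len)
print(scores.masked_fill(decoder_mask == 0, float("-inf")).shape)  # torch.Size([2, 8, 10, 10])
print(scores.masked_fill(encoder_mask == 0, float("-inf")).shape)  # torch.Size([2, 8, 10, 10])
```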