Ben Trevett
I'd be interested in seeing your batch translation and distributed beam search implementations if they're available anywhere. I'm planning to completely rewrite the tutorials to coincide with the release...
@Hannibal046 Sorry for the late reply, I have been away on Christmas break. I'm not sure of your exact question, but I'll just ramble through how I think about...
The tutorial 1 model is supposed to be the "worst" of all the sequence-to-sequence models implemented in these tutorials, which is why it has a low BLEU score. I'd...
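For anyone curious how a BLEU score like that is computed, torchtext ships a corpus-level implementation; here's a minimal sketch (the tokenized sentences below are made up for illustration):

```python
from torchtext.data.metrics import bleu_score

# Each candidate is a tokenized model translation; each reference is a
# list of acceptable tokenized translations (here, one per example)
candidates = [['a', 'cat', 'sits', 'on', 'the', 'mat']]
references = [[['a', 'cat', 'is', 'sitting', 'on', 'the', 'mat']]]

print(bleu_score(candidates, references))  # corpus-level BLEU-4 by default
```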
@asigalov61 Sorry for the late reply. Glad the notebooks helped you! I've been desperately needing to update these notebooks for the better part of a year now, but other things...
I don't think I have a great explanation for this, but the second linear layer is mainly there to project the vectors from `pf_dim` back down to `hid_dim`, and not really...
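For reference, here's a minimal sketch of the position-wise feed-forward sublayer being discussed (the `hid_dim` and `pf_dim` names follow the comment above; the exact dropout placement is an assumption):

```python
import torch
import torch.nn as nn

class PositionwiseFeedforward(nn.Module):
    def __init__(self, hid_dim, pf_dim, dropout):
        super().__init__()
        self.fc_1 = nn.Linear(hid_dim, pf_dim)  # expand: hid_dim -> pf_dim
        self.fc_2 = nn.Linear(pf_dim, hid_dim)  # project back: pf_dim -> hid_dim
        self.dropout = nn.Dropout(dropout)

    def forward(self, x):
        # x = [batch size, seq len, hid dim]
        x = self.dropout(torch.relu(self.fc_1(x)))
        # x = [batch size, seq len, pf dim]
        x = self.fc_2(x)
        # x = [batch size, seq len, hid dim]
        return x
```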
These bugs are due to some big API changes in torchtext between versions 0.8 and 0.9. I need to update the tutorials appropriately, but haven't yet had the chance to...
This should now be fixed. If you're using PyTorch 1.8 and torchtext 0.9, then the `master` branch should work. If you're using PyTorch 1.7 and torchtext 0.8, then the `torchtext08`...
We could just use `output`, but the notebook is replicating [this](https://arxiv.org/pdf/1409.0473.pdf) paper, which calculates the prediction using: the decoder hidden state (`output`), the attention-weighted context (`weighted`) and the current...
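To make that concrete, here's a minimal sketch of the final prediction step. The third term is cut off above; I'm assuming it is the embedding of the current input token (`embedded`), and all dimensions and tensors here are made-up placeholders:

```python
import torch
import torch.nn as nn

# Toy dimensions, assumed for illustration only
batch_size, emb_dim, dec_hid_dim, enc_hid_dim, output_dim = 4, 256, 512, 512, 10_000

output = torch.randn(batch_size, dec_hid_dim)        # decoder hidden state
weighted = torch.randn(batch_size, enc_hid_dim * 2)  # attention-weighted context (bidirectional encoder)
embedded = torch.randn(batch_size, emb_dim)          # current input token embedding (assumed)

# The prediction uses all three, not just `output`, following Bahdanau et al. (2014)
fc_out = nn.Linear(dec_hid_dim + enc_hid_dim * 2 + emb_dim, output_dim)
prediction = fc_out(torch.cat((output, weighted, embedded), dim=1))
print(prediction.shape)  # torch.Size([4, 10000])
```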
This is not a problem. Let's say we have the target sequence: `['<sos>', 'a', 'b', 'c', 'd', '<eos>', '<pad>', '<pad>']`. Ideally, we should have no padding and have the target...
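The way the padded positions are prevented from affecting training is by telling the loss to skip them. Here's a minimal sketch of that idea, assuming `<pad>` maps to index 1 in the target vocabulary (the indices and sizes are placeholders):

```python
import torch
import torch.nn as nn

PAD_IDX = 1  # assumed index of the <pad> token in the target vocabulary

# Positions containing <pad> contribute nothing to the loss or the gradients
criterion = nn.CrossEntropyLoss(ignore_index=PAD_IDX)

logits = torch.randn(8, 10_000)                    # [trg len, output dim] for one sequence
target = torch.tensor([5, 42, 7, 19, 3, 2, 1, 1])  # last two positions are <pad>
loss = criterion(logits, target)                   # padded positions are ignored
```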
Training large Transformer models still requires quite a few tricks to make them work properly. The two main techniques I'd recommend looking into are initialization and learning rate schedulers. A...
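Here's a minimal sketch of both ideas together: Xavier initialization plus a Noam-style warmup schedule, as used in the original Transformer paper. The model and hyperparameters are placeholders, not values from the tutorials:

```python
import torch
import torch.nn as nn

model = nn.Transformer(d_model=512, nhead=8)  # placeholder model

# Xavier-initialize all weight matrices (a common choice for Transformers)
for p in model.parameters():
    if p.dim() > 1:
        nn.init.xavier_uniform_(p)

# Noam-style schedule: linear warmup, then inverse-sqrt decay
d_model, warmup = 512, 4000
optimizer = torch.optim.Adam(model.parameters(), lr=1.0, betas=(0.9, 0.98), eps=1e-9)

def noam_lr(step):
    step = max(step, 1)  # avoid division by zero at step 0
    return d_model ** -0.5 * min(step ** -0.5, step * warmup ** -1.5)

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=noam_lr)
# Call scheduler.step() after each optimizer.step() during training
```

Note that the base `lr=1.0` is deliberate: `LambdaLR` multiplies it by the schedule, so the schedule alone determines the effective learning rate.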