Ben Trevett

90 comments of Ben Trevett

I'd be interested to see your batch translation and distributed beam search implementations if they are available anywhere. I'm planning to completely re-write the tutorials to coincide with the release...

@Hannibal046 Sorry for the late reply, I have been away on Christmas break. I'm not sure of your exact question but I'll just ramble on through how I think about...

The tutorial 1 model is supposed to be the "worst" out of all of the sequence-to-sequence models implemented in these tutorials, which is why it has a low BLEU score. I'd...
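For reference, here is a minimal, self-contained sketch of how a sentence-level BLEU score is computed (modified n-gram precision plus a brevity penalty). This is an illustrative reimplementation, not the metric code the tutorials actually call:

```python
from collections import Counter
import math

def bleu(candidate, references, max_n=4):
    """Sentence-level BLEU: geometric mean of clipped n-gram precisions x brevity penalty."""
    def ngrams(tokens, n):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    precisions = []
    for n in range(1, max_n + 1):
        cand = ngrams(candidate, n)
        # clip each candidate n-gram count by its maximum count in any reference
        max_ref = Counter()
        for ref in references:
            for ng, c in ngrams(ref, n).items():
                max_ref[ng] = max(max_ref[ng], c)
        clipped = sum(min(c, max_ref[ng]) for ng, c in cand.items())
        precisions.append(clipped / max(sum(cand.values()), 1))

    if min(precisions) == 0:
        return 0.0  # no smoothing here: one empty precision zeroes the score

    # brevity penalty against the closest-length reference
    ref_len = min((len(r) for r in references), key=lambda l: abs(l - len(candidate)))
    bp = 1.0 if len(candidate) >= ref_len else math.exp(1 - ref_len / len(candidate))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)
```

A perfect match scores 1.0; short or low-overlap translations (like those from the tutorial 1 model) score much lower.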

@asigalov61 Sorry for the late reply. Glad the notebooks helped you! I've been desperately needing to update these notebooks for the better part of a year now, but other things...

I don't think I have a great explanation for this, but the second linear layer is there mainly to project the vectors from `pf_dim` back down to `hid_dim`, and not really...
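Concretely, this is the Transformer's position-wise feedforward sublayer. A minimal sketch (dimension names match the comment above; the dropout placement is an assumption, not necessarily the tutorials' exact code):

```python
import torch
import torch.nn as nn

class PositionwiseFeedforward(nn.Module):
    """hid_dim -> pf_dim -> hid_dim, applied independently at every position."""
    def __init__(self, hid_dim, pf_dim, dropout=0.1):
        super().__init__()
        self.fc_1 = nn.Linear(hid_dim, pf_dim)   # expand
        self.fc_2 = nn.Linear(pf_dim, hid_dim)   # project back to hid_dim
        self.dropout = nn.Dropout(dropout)

    def forward(self, x):
        # x: [batch, seq_len, hid_dim]
        x = self.dropout(torch.relu(self.fc_1(x)))
        return self.fc_2(x)  # [batch, seq_len, hid_dim]
```

The output shape matches the input shape, which is what lets the sublayer slot into the residual connection around it.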

These bugs are because of some big API changes in torchtext between versions 0.8 and 0.9. I need to update the tutorials appropriately, but haven't yet had the chance to...

This should now be fixed. If you're using PyTorch 1.8 and torchtext 0.9 then the `master` branch should work. If you're using PyTorch 1.7 and torchtext 0.8 then the `torchtext08`...
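A small, hypothetical helper to make the version-to-branch pairing above explicit (the function name and the minor-version matching are my own illustration; exact patch versions may differ):

```python
import torch  # only needed to read the installed version at runtime

def branch_for(torch_version):
    """Map an installed PyTorch version to the matching tutorials branch."""
    if torch_version.startswith("1.8"):
        return "master"       # expects torchtext 0.9
    if torch_version.startswith("1.7"):
        return "torchtext08"  # expects torchtext 0.8
    return None               # untested combination

# e.g. print(branch_for(torch.__version__))
```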

We could just use `output`, but the notebook is replicating [this](https://arxiv.org/pdf/1409.0473.pdf) paper, which calculates the prediction using the decoder hidden state (`output`), the attention-weighted context (`weighted`) and the current...
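The combination amounts to concatenating all three vectors and passing them through the output linear layer. A sketch, with hypothetical dimensions (the variable names follow the comment above, but this is an illustration of the idea, not the notebook's exact code):

```python
import torch
import torch.nn as nn

hid_dim, emb_dim, output_dim = 512, 256, 10_000  # assumed sizes for illustration

# the prediction layer sees all three signals, per Bahdanau et al. (2015)
fc_out = nn.Linear(hid_dim + hid_dim + emb_dim, output_dim)

output = torch.randn(1, hid_dim)    # decoder hidden state
weighted = torch.randn(1, hid_dim)  # attention-weighted context vector
embedded = torch.randn(1, emb_dim)  # embedding of the current input token

prediction = fc_out(torch.cat((output, weighted, embedded), dim=1))
# prediction: [1, output_dim] -> unnormalized scores over the target vocabulary
```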

This is not a problem. Let's say we have the target sequence: `['<sos>', 'a', 'b', 'c', 'd', '<eos>', '<pad>', '<pad>']`. Ideally, we should have no padding and have the target...
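This is why the loss is constructed with `ignore_index` set to the padding token's index: padded positions contribute nothing to the loss or its gradients. A minimal sketch (the toy tensors are made up for illustration):

```python
import torch
import torch.nn as nn

pad_idx = 0
criterion = nn.CrossEntropyLoss(ignore_index=pad_idx)

logits = torch.randn(6, 5)  # [sequence positions, vocab size]
target = torch.tensor([3, 1, 2, pad_idx, pad_idx, pad_idx])  # last 3 are padding

loss = criterion(logits, target)
# identical to averaging the loss over only the 3 non-padding positions:
loss_no_pad = nn.CrossEntropyLoss()(logits[:3], target[:3])
```

With the default mean reduction, the average is taken over non-ignored targets only, so the amount of padding in a batch does not skew the loss.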

Training large Transformer models still requires quite a few tricks to make them work properly. The two main techniques I'd recommend looking into are initialization and learning rate schedulers. A...
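A sketch of both techniques, assuming the common choices from the literature (Xavier initialization, and the Noam-style warmup schedule from "Attention Is All You Need"); the hyperparameters here are illustrative defaults, not a prescription:

```python
import torch

def noam_lr(step, d_model=512, warmup=4000):
    """Noam schedule: lr rises linearly for `warmup` steps, then decays as step^-0.5."""
    step = max(step, 1)  # LambdaLR calls with step=0; avoid division by zero
    return d_model ** -0.5 * min(step ** -0.5, step * warmup ** -1.5)

def init_weights(m):
    """Xavier/Glorot initialization, a common choice for Transformer linear layers."""
    if isinstance(m, torch.nn.Linear):
        torch.nn.init.xavier_uniform_(m.weight)
        if m.bias is not None:
            torch.nn.init.zeros_(m.bias)

model = torch.nn.Linear(10, 10)  # stand-in for a full Transformer
model.apply(init_weights)

# base lr of 1.0 so the schedule alone determines the effective learning rate
optimizer = torch.optim.Adam(model.parameters(), lr=1.0, betas=(0.9, 0.98), eps=1e-9)
scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=noam_lr)
# call scheduler.step() once per training step (not per epoch)
```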