Ben Trevett
I'd be interested in seeing your batch translation and distributed beam search implementations if they're available anywhere. I'm planning to completely rewrite the tutorials to coincide with the release...
@Hannibal046 Sorry for the late reply, I have been away on Christmas break. I'm not sure of your exact question, but I'll just ramble through how I think about...
The tutorial 1 model is supposed to be the "worst" of all the sequence-to-sequence models implemented in these tutorials, which is why it has a low BLEU score. I'd...
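For anyone curious how a BLEU score like that is computed, torchtext ships a corpus-level implementation; here's a minimal sketch (the tokenized sentences below are made up for illustration):

```python
from torchtext.data.metrics import bleu_score

# Each candidate is a tokenized model translation; each reference is a
# list of acceptable tokenized translations (here, one per example)
candidates = [['a', 'cat', 'sits', 'on', 'the', 'mat']]
references = [[['a', 'cat', 'is', 'sitting', 'on', 'the', 'mat']]]

print(bleu_score(candidates, references))  # corpus-level BLEU-4 by default
```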
@asigalov61 Sorry for the late reply. Glad the notebooks helped you! I've been desperately needing to update these notebooks for the better part of a year now, but other things...
I don't think I have a great explanation for this, but the second linear layer is mainly there to project the vectors from `pf_dim` back down to `hid_dim`, and not really...
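For reference, here's a minimal sketch of the position-wise feed-forward sublayer being discussed (the `hid_dim` and `pf_dim` names follow the comment above; the exact dropout placement is an assumption):

```python
import torch
import torch.nn as nn

class PositionwiseFeedforward(nn.Module):
    def __init__(self, hid_dim, pf_dim, dropout):
        super().__init__()
        self.fc_1 = nn.Linear(hid_dim, pf_dim)  # expand: hid_dim -> pf_dim
        self.fc_2 = nn.Linear(pf_dim, hid_dim)  # project back: pf_dim -> hid_dim
        self.dropout = nn.Dropout(dropout)

    def forward(self, x):
        # x = [batch size, seq len, hid dim]
        x = self.dropout(torch.relu(self.fc_1(x)))
        # x = [batch size, seq len, pf dim]
        x = self.fc_2(x)
        # x = [batch size, seq len, hid dim]
        return x
```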
These bugs are due to some big API changes in torchtext between versions 0.8 and 0.9. I need to update the tutorials appropriately, but haven't yet had the chance to...
This should now be fixed. If you're using PyTorch 1.8 and torchtext 0.9, then the `master` branch should work. If you're using PyTorch 1.7 and torchtext 0.8, then the `torchtext08`...
We could just use `output`, but the notebook is replicating [this](https://arxiv.org/pdf/1409.0473.pdf) paper, which calculates the prediction using: the decoder hidden state (`output`), the attention-weighted context (`weighted`) and the current...
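To make that concrete, here's a minimal sketch of the final prediction step. The third term is cut off above; I'm assuming it is the embedding of the current input token (`embedded`), and all dimensions and tensors here are made-up placeholders:

```python
import torch
import torch.nn as nn

# Toy dimensions, assumed for illustration only
batch_size, emb_dim, dec_hid_dim, enc_hid_dim, output_dim = 4, 256, 512, 512, 10_000

output = torch.randn(batch_size, dec_hid_dim)        # decoder hidden state
weighted = torch.randn(batch_size, enc_hid_dim * 2)  # attention-weighted context (bidirectional encoder)
embedded = torch.randn(batch_size, emb_dim)          # current input token embedding (assumed)

# The prediction uses all three, not just `output`, following Bahdanau et al. (2014)
fc_out = nn.Linear(dec_hid_dim + enc_hid_dim * 2 + emb_dim, output_dim)
prediction = fc_out(torch.cat((output, weighted, embedded), dim=1))
print(prediction.shape)  # torch.Size([4, 10000])
```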
This is not a problem. Let's say we have the target sequence: `['<sos>', 'a', 'b', 'c', 'd', '<eos>', '<pad>', '<pad>']`. Ideally, we should have no padding and have the target...
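The way the padded positions are prevented from affecting training is by telling the loss to skip them. Here's a minimal sketch of that idea, assuming `<pad>` maps to index 1 in the target vocabulary (the indices and sizes are placeholders):

```python
import torch
import torch.nn as nn

PAD_IDX = 1  # assumed index of the <pad> token in the target vocabulary

# Positions containing <pad> contribute nothing to the loss or the gradients
criterion = nn.CrossEntropyLoss(ignore_index=PAD_IDX)

logits = torch.randn(8, 10_000)                    # [trg len, output dim] for one sequence
target = torch.tensor([5, 42, 7, 19, 3, 2, 1, 1])  # last two positions are <pad>
loss = criterion(logits, target)                   # padded positions are ignored
```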
Training large Transformer models still requires quite a few tricks to make them work properly. The two main techniques I'd recommend looking into are initialization and learning rate schedulers. A...
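Here's a minimal sketch of both ideas together: Xavier initialization plus a Noam-style warmup schedule, as used in the original Transformer paper. The model and hyperparameters are placeholders, not values from the tutorials:

```python
import torch
import torch.nn as nn

model = nn.Transformer(d_model=512, nhead=8)  # placeholder model

# Xavier-initialize all weight matrices (a common choice for Transformers)
for p in model.parameters():
    if p.dim() > 1:
        nn.init.xavier_uniform_(p)

# Noam-style schedule: linear warmup, then inverse-sqrt decay
d_model, warmup = 512, 4000
optimizer = torch.optim.Adam(model.parameters(), lr=1.0, betas=(0.9, 0.98), eps=1e-9)

def noam_lr(step):
    step = max(step, 1)  # avoid division by zero at step 0
    return d_model ** -0.5 * min(step ** -0.5, step * warmup ** -1.5)

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=noam_lr)
# Call scheduler.step() after each optimizer.step() during training
```

Note that the base `lr=1.0` is deliberate: `LambdaLR` multiplies it by the schedule, so the schedule alone determines the effective learning rate.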