
Questions regarding the implementation

Open yuvaraj91 opened this issue 3 years ago • 10 comments

  1. In this file, https://github.com/hyunwoongko/transformer/blob/master/models/model/transformer.py, you define the functions make_pad_mask and make_no_peak_mask, but are they actually used during training?
  2. In this file, https://github.com/hyunwoongko/transformer/blob/master/models/layers/position_wise_feed_forward.py, why does your PositionwiseFeedForward have extra layers?

yuvaraj91 avatar Oct 11 '21 19:10 yuvaraj91

  1. Yes, see https://github.com/hyunwoongko/transformer/blob/master/models/model/transformer.py#L40.
  2. What extra layers do you mean?

hyunwoongko avatar Oct 11 '21 20:10 hyunwoongko
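For readers following along, the two masks in question usually look like the sketch below: make_pad_mask blanks out padding positions so attention ignores them, and make_no_peak_mask is the causal (lower-triangular) mask that keeps the decoder from attending to future tokens. The pad_idx argument and the exact tensor shapes here are assumptions for illustration, not necessarily the repository's exact code.

```python
import torch

def make_pad_mask(q, k, pad_idx=1):
    # True where the token is real, False where it is padding (pad_idx assumed).
    k_mask = k.ne(pad_idx).unsqueeze(1).unsqueeze(2)  # (batch, 1, 1, k_len)
    q_mask = q.ne(pad_idx).unsqueeze(1).unsqueeze(3)  # (batch, 1, q_len, 1)
    # Broadcasts to (batch, 1, q_len, k_len); applied across all heads.
    return k_mask & q_mask

def make_no_peak_mask(q, k):
    # Lower-triangular mask: position i may only attend to positions <= i.
    q_len, k_len = q.size(1), k.size(1)
    return torch.tril(torch.ones(q_len, k_len)).bool()  # (q_len, k_len)
```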

Thank you for your reply :)

For (2), you are right actually. I made a mistake when comparing the class PositionwiseFeedForward with these two implementations: https://github.com/bentrevett/pytorch-seq2seq/blob/master/6%20-%20Attention%20is%20All%20You%20Need.ipynb and http://nlp.seas.harvard.edu/2018/04/03/attention.html. Now I see that yours is the same; you just coded it in a different format.

I have another question: how could we visualise the attention heatmaps at the decoder heads, similar to https://github.com/bentrevett/pytorch-seq2seq/blob/master/6%20-%20Attention%20is%20All%20You%20Need.ipynb?

yuvaraj91 avatar Oct 12 '21 05:10 yuvaraj91

And here, https://github.com/hyunwoongko/transformer/blob/1d2e33f675232956ef4bc3fbb1c3de2300a1f0a7/models/model/transformer.py#L45, you use the "*" symbol to mean multiplication? In other implementations, a bitwise "&" operator is used. I'm just wondering what the difference is here. Thanks!

yuvaraj91 avatar Oct 12 '21 06:10 yuvaraj91

@GJ98 could you explain this? (Note: this mask implementation was not written by me.)

hyunwoongko avatar Oct 12 '21 07:10 hyunwoongko

> I have another question, how could we visualise the attention heatmap at the decoder heads, similar to

I was planning to implement it, but I didn't because I didn't have enough time. I welcome PRs!

hyunwoongko avatar Oct 12 '21 07:10 hyunwoongko
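For anyone picking up that PR, a minimal sketch of the usual approach: have the attention modules store their softmax weights during the forward pass, then plot one heatmap per head with matplotlib. How the weights are captured is left open here; the (n_heads, trg_len, src_len) shape is an assumption.

```python
import matplotlib.pyplot as plt

def plot_attention(attn, src_tokens, trg_tokens):
    # attn: tensor of shape (n_heads, trg_len, src_len) holding the softmax
    # weights of one decoder layer's encoder-decoder attention.
    n_heads = attn.size(0)
    fig, axes = plt.subplots(1, n_heads, figsize=(4 * n_heads, 4))
    for h in range(n_heads):
        ax = axes[h] if n_heads > 1 else axes
        ax.matshow(attn[h].detach().cpu().numpy(), cmap='viridis')
        ax.set_xticks(range(len(src_tokens)))
        ax.set_xticklabels(src_tokens, rotation=90)
        ax.set_yticks(range(len(trg_tokens)))
        ax.set_yticklabels(trg_tokens)
        ax.set_title(f'head {h}')
    plt.show()
```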

@yuvaraj91 There is no difference between "*" and "&" here. I think "&" can be clearer than "*".

GJ98 avatar Oct 12 '21 07:10 GJ98
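The equivalence holds because both masks are 0/1 tensors: the elementwise product is 1 exactly where both masks are 1, which is what bitwise AND computes. On torch.bool tensors, "&" is the logical AND, which makes the intent more explicit. An illustrative check:

```python
import torch

pad_mask = torch.tensor([[1, 1, 0]])             # 1 = keep, 0 = padding
peak_mask = torch.tril(torch.ones(3, 3)).long()  # causal mask

# Elementwise product and bitwise AND agree on 0/1 integer tensors.
assert torch.equal(pad_mask * peak_mask, pad_mask & peak_mask)
```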

Ok thank you both @GJ98 @hyunwoongko !

Another question: where does the value of 256 come from? https://github.com/hyunwoongko/transformer/blob/1d2e33f675232956ef4bc3fbb1c3de2300a1f0a7/conf.py#L13

yuvaraj91 avatar Oct 12 '21 10:10 yuvaraj91

What do you mean?

hyunwoongko avatar Oct 22 '21 20:10 hyunwoongko

Hi, could you share all the requirements of this repo, like the PyTorch version, etc.? Thanks.

Qing-zhan avatar Oct 15 '22 14:10 Qing-zhan