helloworld729 issues

helloworld729

Why is none_pad_mask needed?

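Hello, and thank you for the code. I have two questions I'd like you to take a look at:

1. Why is none_pad_mask needed? Take the encoder side as an example. Because of the attention mask, every attention step already masks out the padding positions (their weights are 0), so why is none_pad_mask still applied after attention? The linear layers and LayerNorm shouldn't require it either, because both operate on each word independently, so padding cannot affect the real words. I think it would be enough to apply none_pad_mask once at the final prediction step.

2. Why is a scaling factor needed when the embeddings are shared? That is: seq_logit = self.tgt_word_prj(dec_output) * self.x_logit_scale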
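The reasoning in question 1 can be checked with a toy sketch. It assumes single-head self-attention with no learned projections (Q = K = V = the input), a residual connection, and a parameter-free LayerNorm; `encoder_step`, `apply_none_pad_mask` and the other helpers are hypothetical stand-ins, not the repository's modules. It swaps the contents of the padded slots and checks whether the outputs at the real positions change.

```python
import math
import random


def softmax(scores):
    # Numerically stable softmax; exp(-inf) == 0.0, so masked keys get exactly zero weight.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def masked_self_attention(x, is_pad):
    """Single-head self-attention with Q = K = V = x (no learned projections).
    Keys at padded positions are scored -inf, as an attention mask would do."""
    d = len(x[0])
    out = []
    for q in x:
        scores = [
            -math.inf if is_pad[j] else sum(a * b for a, b in zip(q, k)) / math.sqrt(d)
            for j, k in enumerate(x)
        ]
        w = softmax(scores)
        out.append([sum(w[j] * x[j][i] for j in range(len(x))) for i in range(d)])
    return out


def layer_norm(v, eps=1e-6):
    # Parameter-free LayerNorm over a single position's feature vector.
    mu = sum(v) / len(v)
    var = sum((a - mu) ** 2 for a in v) / len(v)
    return [(a - mu) / math.sqrt(var + eps) for a in v]


def encoder_step(x, is_pad):
    """Masked attention, then residual + LayerNorm applied to each position independently."""
    attn = masked_self_attention(x, is_pad)
    return [layer_norm([a + b for a, b in zip(xi, hi)]) for xi, hi in zip(x, attn)]


def apply_none_pad_mask(x, is_pad):
    # Hypothetical helper: zero the rows at padded positions, like multiplying by a pad mask.
    return [[0.0] * len(row) if p else row for row, p in zip(x, is_pad)]


rng = random.Random(0)
d, n_real, n_pad = 4, 3, 2
is_pad = [False] * n_real + [True] * n_pad
real = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n_real)]
pad_zeros = [[0.0] * d for _ in range(n_pad)]
pad_noise = [[rng.gauss(0, 5) for _ in range(d)] for _ in range(n_pad)]

out_a = encoder_step(real + pad_zeros, is_pad)
out_b = encoder_step(real + pad_noise, is_pad)

# Real positions do not depend on what sits in the padded slots...
real_unchanged = all(
    abs(a - b) < 1e-12
    for ra, rb in zip(out_a[:n_real], out_b[:n_real])
    for a, b in zip(ra, rb)
)
# ...but the padded positions themselves still hold non-zero values after attention + LayerNorm.
pad_rows_nonzero = any(abs(v) > 1e-9 for row in out_a[n_real:] for v in row)
masked = apply_none_pad_mask(out_a, is_pad)
print(real_unchanged, pad_rows_nonzero)
```

In this toy setting the real positions are unaffected by padding, as the question argues; what a mask multiplication changes is only the padded rows themselves, which are not zero after attention and LayerNorm.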
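Question 2 does not say what value x_logit_scale takes. The sketch below assumes it is `d_model ** -0.5` and that the shared embedding rows and the decoder output both have roughly unit-variance entries; `tied_logits` and `logit_std` are hypothetical helpers. Under those assumptions it illustrates one possible motivation for a scale factor: the dot product of two `d_model`-dimensional unit-variance vectors has a standard deviation of about sqrt(d_model), so unscaled tied-weight logits grow as the model widens, while multiplying by `d_model ** -0.5` keeps them near 1.

```python
import random
import statistics


def tied_logits(dec_output, emb, scale):
    """Vocabulary logits when the output projection reuses the embedding matrix:
    logit[v] = scale * <dec_output, emb[v]>."""
    return [scale * sum(a * b for a, b in zip(dec_output, row)) for row in emb]


def logit_std(d_model, scale, vocab=500, seed=0):
    rng = random.Random(seed)
    # Assumption: embedding rows and the decoder output have roughly unit-variance
    # entries (e.g. an N(0, 1) embedding init and a LayerNorm on the decoder output).
    emb = [[rng.gauss(0, 1) for _ in range(d_model)] for _ in range(vocab)]
    h = [rng.gauss(0, 1) for _ in range(d_model)]
    return statistics.pstdev(tied_logits(h, emb, scale))


results = {}
for d_model in (64, 256, 1024):
    unscaled = logit_std(d_model, 1.0)
    scaled = logit_std(d_model, d_model ** -0.5)  # assumed value of x_logit_scale
    results[d_model] = (unscaled, scaled)
    print(d_model, round(unscaled, 1), round(scaled, 2))
```

Larger logits make the softmax sharper at initialization, so a scale like this is one way to keep the output distribution comparable across model widths; it is an illustration, not necessarily the reason the author chose it.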