Javen_陈俊文
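The decoder's output is a probability for each word; which word actually gets generated depends on the strategy you choose. Picking the word with the highest probability directly is greedy decoding. Beam search is another option and is worth looking into. In the decoder, the output produced at each time step is re-embedded and used as one of the inputs for the next time step. As for the keys() issue you mentioned, thanks for pointing it out; I will fix it when I have time.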
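To make the difference between greedy decoding and beam search concrete, here is a minimal sketch in plain Python. It is not the code from this project: `toy_step`, the four-token vocabulary, and all probabilities are made up for illustration. `toy_step` stands in for one decoder step, which in a real model would embed the previously generated token and return a softmax over the vocabulary.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Hypothetical stand-in for one decoder step: given the tokens generated so
# far, return one probability per vocabulary entry.
StepFn = Callable[[Sequence[int]], List[float]]

BOS, EOS = 0, 1  # assumed ids for the start and end tokens


def toy_step(tokens: Sequence[int]) -> List[float]:
    """Made-up distribution that depends only on the last token."""
    table = {
        0: [0.0, 0.0, 0.6, 0.4],    # after BOS: token 2 looks best locally
        2: [0.0, 0.4, 0.3, 0.3],    # after 2: EOS is only mildly preferred
        3: [0.0, 0.9, 0.05, 0.05],  # after 3: EOS is strongly preferred
    }
    return table[tokens[-1]]


def greedy_decode(step: StepFn, max_len: int) -> List[int]:
    """At every step, keep only the single most probable next token."""
    tokens = [BOS]
    for _ in range(max_len):
        probs = step(tokens)
        next_token = max(range(len(probs)), key=probs.__getitem__)
        tokens.append(next_token)
        if next_token == EOS:
            break
    return tokens


def beam_search(step: StepFn, max_len: int, beam_width: int) -> List[int]:
    """Keep the beam_width best partial sequences by total log-probability."""
    beams: List[Tuple[float, List[int]]] = [(0.0, [BOS])]
    for _ in range(max_len):
        candidates: List[Tuple[float, List[int]]] = []
        for score, tokens in beams:
            if tokens[-1] == EOS:
                # Finished hypotheses are carried over unchanged.
                candidates.append((score, tokens))
                continue
            for tok, p in enumerate(step(tokens)):
                if p > 0.0:
                    candidates.append((score + math.log(p), tokens + [tok]))
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = candidates[:beam_width]
        if all(tokens[-1] == EOS for _, tokens in beams):
            break
    return beams[0][1]


print(greedy_decode(toy_step, max_len=5))               # [0, 2, 1]
print(beam_search(toy_step, max_len=5, beam_width=2))   # [0, 3, 1]
```

In this toy setup greedy decoding commits to token 2 because it has the highest probability at the first step, and ends with a sequence of probability 0.6 × 0.4 = 0.24. Beam search with width 2 also keeps the runner-up and finds the sequence through token 3, which has probability 0.4 × 0.9 = 0.36. With `beam_width=1` the two procedures give the same result.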
Hi Steven, thank you for your interest in this project. I will provide an English version of the README and update the code a little. It will take several days...
Yes, you are right. I didn't use attention in the classification model. I will add this part, or you can commit your code if you like. Thank you for pointing...