Neng Huang

Results: 1 comment by Neng Huang

It seems that applying dropout inside each ScaledDotProductAttention gives better results. That said, I have only observed this in my own experiments.
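For anyone who wants to try this, here is a minimal sketch of what "dropout inside the attention" could look like. It is written in plain NumPy so it does not depend on any particular framework. The function name, the `dropout_p=0.1` default, and the choice to apply dropout to the attention weights right after the softmax are my assumptions for illustration, not necessarily what the original code does.

```python
import numpy as np


def scaled_dot_product_attention(q, k, v, dropout_p=0.1, training=True, rng=None):
    """Scaled dot-product attention with dropout on the attention weights.

    Shapes (assumed): q is (..., len_q, d_k), k is (..., len_k, d_k),
    v is (..., len_k, d_v). Returns (output, weights).
    """
    if not 0.0 <= dropout_p < 1.0:
        raise ValueError("dropout_p must be in [0, 1)")

    d_k = q.shape[-1]
    # Similarity scores between queries and keys, scaled by sqrt(d_k).
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(d_k)

    # Numerically stable softmax over the key axis.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)

    if training and dropout_p > 0.0:
        rng = np.random.default_rng() if rng is None else rng
        # Inverted dropout: zero each weight with probability dropout_p
        # and rescale the survivors so the expected value is unchanged.
        keep = rng.random(weights.shape) >= dropout_p
        weights = weights * keep / (1.0 - dropout_p)

    return weights @ v, weights
```

With `training=False` the dropout step is skipped, so evaluation uses the plain softmax weights. Whether this helps will likely depend on the model size and dataset, so it is worth comparing against a run with `dropout_p=0.0`.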