transformer
Published
20 hours ago
•
jayparks
Differences between the paper and your code
Open
yuanyihan
opened this issue 2 years ago
• 0 comments
A dropout between the two FC layers in the FFN.
In the embedding layers, you should multiply the weights by sqrt(d_model).
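To make the two points concrete, here is a minimal, framework-free sketch: a position-wise FFN with dropout placed between its two FC layers, and an embedding lookup scaled by sqrt(d_model). It is only an illustration under assumptions. The function names (`embed`, `linear`, `dropout`, `ffn`), the default rate `p=0.1`, and the use of inverted dropout are hypothetical choices, not taken from the repository's code.

```python
import math
import random


def embed(token_ids, table, d_model):
    """Look up each token's embedding row and scale it by sqrt(d_model)."""
    scale = math.sqrt(d_model)
    return [[w * scale for w in table[t]] for t in token_ids]


def linear(x, weight, bias):
    """Fully connected layer: weight has one row per output unit."""
    return [sum(xi * wi for xi, wi in zip(x, row)) + b
            for row, b in zip(weight, bias)]


def dropout(x, p, training, rng=random):
    """Inverted dropout (an assumption): zero with prob p, rescale the rest."""
    if not training or p == 0.0:
        return list(x)
    keep = 1.0 - p
    return [xi / keep if rng.random() < keep else 0.0 for xi in x]


def ffn(x, w1, b1, w2, b2, p=0.1, training=False, rng=random):
    """FC -> ReLU -> dropout -> FC for a single position vector x."""
    hidden = [max(0.0, h) for h in linear(x, w1, b1)]
    hidden = dropout(hidden, p, training, rng)  # dropout between the two FCs
    return linear(hidden, w2, b2)
```

With `training=False` the dropout step is a no-op, so the FFN reduces to the plain two-layer form. For example, `ffn([1.0, -2.0], [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0], [[1.0, 1.0]], [0.5])` gives `[1.5]`, and `embed([1], [[0.0] * 4, [1.0, 2.0, 3.0, 4.0]], 4)` gives `[[2.0, 4.0, 6.0, 8.0]]`.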
Sep 02 '21 07:09