transformer
**Differences** between the paper and your code:
- Dropout between the two fully connected (FC) layers in the FFN.
- In the embedding layers, you should multiply the weights by `sqrt(d_model)`.
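
To make both points concrete, here is a minimal NumPy sketch. It is not taken from your repo: the names `embed` and `ffn`, the `drop_prob` parameter, and the choice of inverted dropout are my own assumptions for illustration. Dropout between the two FC layers is optional here, so it can be switched on or off to match whichever behaviour you want.

```python
import numpy as np

def embed(token_ids, embedding_table):
    """Look up embeddings and scale them by sqrt(d_model)."""
    d_model = embedding_table.shape[1]
    return embedding_table[token_ids] * np.sqrt(d_model)

def ffn(x, w1, b1, w2, b2, drop_prob=0.0, rng=None):
    """Position-wise FFN: FC -> ReLU -> (optional dropout) -> FC.

    drop_prob is a hypothetical parameter; it must be in [0, 1).
    """
    hidden = np.maximum(0.0, x @ w1 + b1)  # first FC layer + ReLU
    if drop_prob > 0.0:
        if rng is None:
            rng = np.random.default_rng()
        # Inverted dropout: zero some units, rescale the survivors.
        keep = rng.random(hidden.shape) >= drop_prob
        hidden = hidden * keep / (1.0 - drop_prob)
    return hidden @ w2 + b2  # second FC layer

# Example usage with small, arbitrary sizes.
rng = np.random.default_rng(0)
d_model, d_ff, vocab_size = 8, 32, 10
table = rng.normal(size=(vocab_size, d_model))
w1, b1 = rng.normal(size=(d_model, d_ff)), np.zeros(d_ff)
w2, b2 = rng.normal(size=(d_ff, d_model)), np.zeros(d_model)

x = embed(np.array([1, 2, 3]), table)             # shape (3, d_model)
out = ffn(x, w1, b1, w2, b2, drop_prob=0.1, rng=rng)
```

In a real model the dropout would only be active during training, and the scaling would sit in the embedding module's forward pass.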