macaron-net
Code for "Understanding and Improving Transformer From a Multi-Particle Dynamic System Point of View"
Results
3
macaron-net issues
Hi, wonderful job. I found that the idea behind your stacked BERT is similar to weight sharing in NAS. ^_^
I tried running the iwslt2014 model, and after a full day it still had not stopped. Roughly how many epochs/iterations does this model need to run?
The comparison with BERT looks impressive. Have you tried training macaron-net on a Chinese corpus?