macaron-net

Code for "Understanding and Improving Transformer From a Multi-Particle Dynamic System Point of View"

Results: 3 macaron-net issues

Hi, wonderful job. I found that the idea behind your stacked BERT is similar to weight sharing in NAS. ^_^

I tried running the iwslt2014 model; after a full day it still had not stopped. Roughly how many epochs/iterations does this model need to train for?

The comparison with BERT looks impressive. Have you tried training macaron-net on a Chinese corpus?