CheerM

7 comments of CheerM

> Can you share the code of Swin V2?
>
> > Hi,
> > It is known that post-norm makes the training process very stable. We re-implemented Swin-T with post-norm, as in the Swin V2 paper, which reports 81.6 top-1. We placed a LayerNorm after MHSA/FF in the residual branches of the original Swin-T. We found that grad_norm spikes after 10 epochs (it becomes very large each time), and training is no longer normal. Whether or not an out_norm is added on top of the backbone, training eventually diverges. Are there any additional settings for post-norm? If you could release more details, it would be greatly appreciated.
> >
> > Thanks...
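Since the question turns on where the LayerNorm sits, here is a minimal sketch of the res-post-norm placement being described. It is an illustration, not the repo's actual code: `PostNormBlock` is a hypothetical name, and plain `nn.MultiheadAttention` stands in for Swin's windowed attention.

```python
import torch
import torch.nn as nn

class PostNormBlock(nn.Module):
    """Res-post-norm placement: LayerNorm is applied to the output of
    MHSA/FFN inside the residual branch, instead of to its input (pre-norm)."""

    def __init__(self, dim: int, num_heads: int):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.ffn = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
        )
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # post-norm: x + LN(MHSA(x)); pre-norm would be x + MHSA(LN(x))
        x = x + self.norm1(self.attn(x, x, x, need_weights=False)[0])
        x = x + self.norm2(self.ffn(x))
        return x
```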

Here is the log at epoch 14; the gradient fluctuates:

```
[2022-03-14 09:02:33 swin_v1_tiny_patch4_window7_224_WindowAttention] (main.py 146): INFO Accuracy of the network on the 50000 test images: 16.4%
[2022-03-14 09:02:33 swin_v1_tiny_patch4_window7_224_WindowAttention] (main.py 148): ...
```
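For reference, a divergence like this is usually caught by logging the total gradient norm every step. A minimal sketch of such a check (assumed, not taken from this repo's `main.py`; the threshold and `step` variable are illustrative):

```python
import torch

def total_grad_norm(model: torch.nn.Module) -> float:
    # L2 norm over all parameter gradients: the same quantity that
    # torch.nn.utils.clip_grad_norm_ reports before clipping.
    norms = [p.grad.detach().norm(2) for p in model.parameters() if p.grad is not None]
    return torch.norm(torch.stack(norms), 2).item()

# In the training loop, after loss.backward():
#   grad_norm = total_grad_norm(model)
#   if grad_norm > 10.0:  # threshold is illustrative
#       print(f"step {step}: grad_norm spiked to {grad_norm:.2f}")
```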

> > > Can you share the code of Swin V2?
> > >
> > > > Hi,
> > > > (Same question as quoted above: the post-norm re-implementation of Swin-T sees grad_norm spike after about 10 epochs and training eventually diverges, with or without an out_norm on top of the backbone; are there any additional settings for post-norm?)
> > > >
> > > > Thanks
> >
...

Thank you for the reply. For the standard transformer block, the model (w/ 42M params) was trained on the iwslt14 de-en dataset and got 34.6 BLEU. For the DeLighT models,...

Yes, I set the architecture as `--arch delight_transformer_iwslt_de_en` for all exps; it was a typo in the last comment. Sure, I'll give it a try on the WMT16 En-Ro dataset.

Sorry, the log file couldn't be copied from the server I used. I modified the script `nmt_wmt16_en2ro.py` to train the model on iwslt14 de-en; TESTED_DIMS was set to [128, 256,...
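For context, the change described amounts to a sweep that launches one training run per entry of TESTED_DIMS. Here is a hypothetical sketch: the data path, LR, batch size, and save dirs are illustrative, and the DeLighT-specific flag that sets the model dimension is deliberately omitted since its exact name depends on that codebase.

```python
import subprocess

TESTED_DIMS = [128, 256]  # truncated in the comment above

for dim in TESTED_DIMS:
    # The DeLighT fork's own flag(s) for the model dimension would be
    # appended here; they are left out because their exact names are
    # specific to that codebase.
    subprocess.run(
        [
            "fairseq-train", "data-bin/iwslt14.tokenized.de-en",
            "--arch", "delight_transformer_iwslt_de_en",
            "--optimizer", "adam",
            "--lr", "0.0005",
            "--max-tokens", "4096",
            "--save-dir", f"checkpoints/delight_dm_{dim}",
        ],
        check=True,
    )
```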

I reran the exps with the same arguments as yours. The LR was set to 0.001 and 0.005, and the models were retrained on iwslt14 de-en. The following results...