NAFNet
About the training speed
Thank you for your wonderful work. I use four RTX 3090 GPUs with a batch size of 4 per GPU. It takes about 15 hours to train NAFNet-width32 on SIDD, and about 3 days to train NAFNet-width64. I would like to know whether this speed is normal?
Hi meitounao110, thanks for your attention to NAFNet. This speed is normal, but it is recommended to keep the overall batch size consistent with our config (to reproduce our results), e.g. 32 (4×8) for NAFNet-width32 and 64 (8×8) for NAFNet-width64.
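In other words, the effective batch size is the per-GPU batch size times the number of GPUs, so with fewer GPUs you would raise the per-GPU batch size to match. A minimal sketch of the arithmetic, assuming the 4-GPU setup from the question above (variable names are illustrative, not from the repo's config):

```python
# Effective batch size = num_gpus * batch_size_per_gpu.
# The NAFNet-width32 config uses 8 GPUs x 4 samples per GPU = 32 total.
paper_total_batch = 32   # overall batch size from the config (4x8)
num_gpus = 4             # e.g. four RTX 3090s, as in the question above
batch_per_gpu = paper_total_batch // num_gpus  # -> 8 samples per GPU
assert num_gpus * batch_per_gpu == paper_total_batch
print(f"use batch_size_per_gpu = {batch_per_gpu} to keep the total at {paper_total_batch}")
```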
Thank you very much! Another question: what is the role of $\gamma$ and $\beta$ in the NAFBlock? They do not seem to be mentioned in the paper.
Hi meitounao110, that is the skip-init we mention in the experimental section to stabilize training.
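Concretely, skip-init scales each residual branch by a learnable per-channel parameter initialized to zero, so every block starts out as an identity mapping. A minimal sketch of the idea (the class name and the conv body are illustrative stand-ins, not the actual NAFBlock):

```python
import torch
import torch.nn as nn

class SkipInitBlock(nn.Module):
    """Sketch of skip-init for a residual block y = x + gamma * f(x):
    gamma is a learnable per-channel scale initialized to zero, so the
    block is an identity mapping at the start of training."""

    def __init__(self, channels: int):
        super().__init__()
        # stand-in for the block's real operations
        self.body = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        # zero-initialized per-channel scale (the gamma/beta in NAFBlock)
        self.gamma = nn.Parameter(torch.zeros(1, channels, 1, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.gamma * self.body(x)
```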
Hi, thanks for your excellent work! Can I ask a question about training NAFNet-width64 on REDS? I use three RTX 3090 GPUs with all other configs the same as yours, and train for the same 400,000 iterations, but I get PSNR = 28.8342 and SSIM = 0.8623, which is lower than the paper. Are there other reasons that could cause this? Or which iteration's model did you choose to get the best results reported in the paper (29.09, 0.867)? Thank you very much!
Refer to https://github.com/megvii-research/NAFNet/issues/24#issuecomment-1196259074
Hi, may I ask what batch size you set when experimenting on a 3090 with 24 GB of VRAM, and did you manage to reproduce the metrics reported in the paper?