SwinIR Can't replicate training with the L1 loss

Hi J., thanks for the awesome work and for sharing the code.

I retrained your model for the real-world SR task from scratch with only the L1 loss, and I noticed a substantial difference in the results compared with the pretrained L1 model: the images produced by the pretrained L1 model look sharper and show a slight shift toward brighter colors (see the images below). The only change I made to your code is the batch size (16 instead of 32, with the same learning rate of 2e-4), due to memory limitations on my machine. Do you think that could be the issue? Does anything else come to mind?

Out of curiosity, what was your training time on 8 RTX 2080 Ti GPUs? On 4 T4 GPUs, mine was 1.1 s per iteration for the L1 training (~13 days for 1000k iterations) and 1.9 s per iteration for the GAN loss training (~13 days for 600k iterations).
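As a rough sanity check on the numbers above, here is a minimal sketch that recomputes the wall-clock estimates and compares how many training samples each run has seen. The helper names are illustrative only, and the reference run is assumed to use batch size 32 for the full 1000k iterations, as the post implies.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def training_days(seconds_per_iter: float, iterations: int) -> float:
    """Wall-clock training time in days for a fixed per-iteration cost."""
    return seconds_per_iter * iterations / SECONDS_PER_DAY


def samples_seen(batch_size: int, iterations: int) -> int:
    """Total number of training patches consumed by the optimizer."""
    return batch_size * iterations


# Timings reported in the post (4x T4).
l1_days = training_days(1.1, 1_000_000)
gan_days = training_days(1.9, 600_000)

# Retrained run (batch 16, 970k iterations) vs. the assumed reference run
# (batch 32, 1000k iterations).
mine = samples_seen(16, 970_000)
reference = samples_seen(32, 1_000_000)

print(f"L1 stage:  {l1_days:.1f} days")
print(f"GAN stage: {gan_days:.1f} days")
print(f"Samples seen: {mine:,} vs {reference:,} ({mine / reference:.0%})")
```

Under these assumptions the retrained model has seen a little under half the samples of the reference run, so the two checkpoints are not directly comparable even at a similar iteration count. Whether that accounts for the visual difference is not established here.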

Thank you!

Output of the pretrained L1 model, trained for 1000k iterations: OST_009_225_225_120_003_realSR_BSRGAN_DFO_s64w8_SwinIR-M_x4_PSNR

Output of my L1 model, trained for 970k iterations: OST_009_225_225_120_swinir_sr_realworld_x4_psnr_bs_16_975000_G

Output of the pretrained L1 model, trained for 1000k iterations: Pattern2_003_realSR_BSRGAN_DFO_s64w8_SwinIR-M_x4_PSNR

Output of my L1 model, trained for 970k iterations: Pattern2_swinir_sr_realworld_x4_psnr_bs_16_975000_G

Mar 08 '22 14:03 andreapesare

I was wondering the same...

In the paper:

"For real-world image SR, we use a combination of pixel loss, GAN loss and perceptual loss to improve visual quality."

The current training code (as found in KAIR) trains with L1 loss alone.

I suspect that could explain the difference you're observing. It is not clear what "combination" means here exactly. Maybe the authors will clarify this ;-)

Apr 19 '22 09:04 0asa

1. It's strange that your model trained with L1 loss performs badly. I currently have no idea why this happens. Is the training loss behaving normally?

2. If you run into an I/O bottleneck, preparing the dataset as LMDB (https://github.com/cszn/KAIR/blob/master/scripts/data_preparation/create_lmdb.py) or extracting the images into small patches (https://github.com/cszn/KAIR/blob/master/scripts/data_preparation/extract_subimages.py) can help a lot. The PSNR performance will be the same.

3. There are two stages. You should first train with L1 loss, and then train that model with a combination of L1 loss, GAN loss and perceptual loss.
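The two-stage schedule described above can be sketched as a weighted sum whose GAN and perceptual terms are switched off in the first stage. This is only an illustration, not the KAIR configuration: the `LossWeights` class, the `generator_loss` helper, and the stage-2 weight values are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LossWeights:
    pixel: float       # weight of the L1 (pixel) term
    gan: float         # weight of the adversarial term
    perceptual: float  # weight of the perceptual (feature) term


# Stage 1: PSNR-oriented training with the L1 loss only.
STAGE1 = LossWeights(pixel=1.0, gan=0.0, perceptual=0.0)
# Stage 2: start from the stage-1 weights and train with all three terms.
# These values are placeholders, not the authors' settings.
STAGE2 = LossWeights(pixel=1.0, gan=0.1, perceptual=1.0)


def generator_loss(w: LossWeights, pixel_l1: float, gan: float,
                   perceptual: float) -> float:
    """Combine already-computed loss terms into one generator objective."""
    return w.pixel * pixel_l1 + w.gan * gan + w.perceptual * perceptual


# Example with made-up loss values for a single batch.
stage1_loss = generator_loss(STAGE1, pixel_l1=0.05, gan=0.7, perceptual=0.3)
stage2_loss = generator_loss(STAGE2, pixel_l1=0.05, gan=0.7, perceptual=0.3)
print(stage1_loss, stage2_loss)
```

Read this way, a model trained only through stage 1 corresponds to the PSNR-oriented checkpoint, while the "combination" quoted from the paper corresponds to stage 2.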

Apr 27 '22 06:04 JingyunLiang
