SwinIR
Can't replicate training with the L1 loss - Real-world SR task
Hi J., thanks for the awesome work and for sharing the code.
I retrained your model for the real-world SR task from scratch with only the L1 loss, and I noticed a substantial difference in the results compared with the pretrained L1 model. The images produced by the pretrained L1 model look sharper and show a slight shift toward brighter colors (see images below). The only change I made to your code is the batch size (16 instead of 32, with the same lr of 2e-4), due to memory limitations on my machine. Do you think that could be the issue? Does anything else come to mind? Out of curiosity, what was your training time on 8 RTX 2080 Ti? Mine was 1.1s per iter (~13 days for 1000k iter) on 4 T4 for the L1 training, and 1.9s per iter for the GAN loss training (~13 days for 600k iter).
Thank you!
Output of pretrained model L1, trained for 1000k iterations
Output of my model L1, trained for 970k iterations
Output of pretrained model L1, trained for 1000k iterations
Output of my model L1, trained for 970k iterations
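The training times quoted above can be sanity-checked with a few lines of arithmetic. This is only a rough sketch: it assumes the per-iteration time stays constant for the whole run, and the helper name `training_days` is made up for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def training_days(seconds_per_iter: float, iterations: int) -> float:
    """Wall-clock days for a run, assuming a constant time per iteration."""
    return seconds_per_iter * iterations / SECONDS_PER_DAY


# Figures reported in the post (4x T4).
l1_days = training_days(1.1, 1_000_000)  # L1 training, 1000k iterations
gan_days = training_days(1.9, 600_000)   # GAN loss training, 600k iterations

print(f"L1 training:  {l1_days:.1f} days")   # 12.7 days
print(f"GAN training: {gan_days:.1f} days")  # 13.2 days
```

Both runs come out at roughly 13 days, which matches the estimates given in the post.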
I was wondering the same...
In the paper:
For real-world image SR, we use a combination of pixel loss, GAN loss and perceptual loss to improve visual quality.
Current training code (as found in KAIR) is training with L1 loss alone.
I suspect that could explain the difference you're observing. It is not clear what "combination" means here exactly. Maybe the authors will clarify this ;-)
1, It's strange that your L1-trained model performs badly. I currently have no idea why that happens. Is the loss normal?
2, If you hit an I/O bottleneck, preparing the dataset as lmdb (https://github.com/cszn/KAIR/blob/master/scripts/data_preparation/create_lmdb.py) or extracting small patches from the images (https://github.com/cszn/KAIR/blob/master/scripts/data_preparation/extract_subimages.py) can help a lot. The PSNR performance will be the same.
3, There are two stages. You should first train with L1 loss and then train it with a combination of L1 loss, GAN loss and perceptual loss.
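To make the two-stage schedule concrete, here is a minimal sketch of how the total loss could be assembled in each stage. It works on plain scalar loss values rather than tensors. The function name `total_loss` and the weights `w_pixel`, `w_perceptual` and `w_gan` are hypothetical placeholders, not the values used for the released models.

```python
def total_loss(stage: int,
               pixel: float,
               perceptual: float = 0.0,
               gan: float = 0.0,
               w_pixel: float = 1.0,        # placeholder weight (assumption)
               w_perceptual: float = 1.0,   # placeholder weight (assumption)
               w_gan: float = 0.1) -> float:  # placeholder weight (assumption)
    """Combine already-computed loss terms according to the training stage."""
    if stage == 1:
        # Stage 1: pixel (L1) loss only.
        return w_pixel * pixel
    if stage == 2:
        # Stage 2: weighted sum of pixel, perceptual and GAN losses.
        return w_pixel * pixel + w_perceptual * perceptual + w_gan * gan
    raise ValueError(f"unknown stage: {stage}")


# Stage 1 ignores the perceptual and GAN terms even if they are passed in.
print(total_loss(1, pixel=0.5, perceptual=0.2, gan=0.3))
# Stage 2 adds them with their weights.
print(total_loss(2, pixel=0.5, perceptual=0.2, gan=0.3))
```

In an actual training run the stage 2 model would presumably be initialized from the stage 1 checkpoint, and the weights would come from the training config.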