Where do hyperparameters of RealESRGAN degradation come from?
Hi, and thank you for the awesome work!
Upon inspecting the provided configs, I've noticed there are differences in the degradation hyperparams in https://github.com/IceClear/StableSR/blob/main/configs/stableSRNew/v2-finetune_text_T_512.yaml compared with other implementations.
For example, the first degradation is written with the following config:
degradation:
# the first degradation process
resize_prob: [0.2, 0.7, 0.1] # up, down, keep
resize_range: [0.3, 1.5]
gaussian_noise_prob: 0.5
noise_range: [1, 15]
poisson_scale_range: [0.05, 2.0]
gray_noise_prob: 0.4
jpeg_range: [60, 95]
while in https://github.com/xinntao/Real-ESRGAN/blob/master/options/finetune_realesrgan_x4plus.yml the config defines:
# the first degradation process
resize_prob: [0.2, 0.7, 0.1] # up, down, keep
resize_range: [0.15, 1.5]
gaussian_noise_prob: 0.5
noise_range: [1, 30]
poisson_scale_range: [0.05, 3]
gray_noise_prob: 0.4
jpeg_range: [30, 95]
Is there a reason for these discrepancies? I couldn't find an explanation anywhere in the paper. Also, when you measure the metrics in Table 1, do you create the LR images with the first or the second set of degradation settings?
Thanks in advance!
There are even more differences in the second degradation process - in the provided config it is
# the second degradation process
second_blur_prob: 0.5
resize_prob2: [0.3, 0.4, 0.3] # up, down, keep
resize_range2: [0.6, 1.2]
gaussian_noise_prob2: 0.5
noise_range2: [1, 12]
poisson_scale_range2: [0.05, 1.0]
gray_noise_prob2: 0.4
jpeg_range2: [60, 100]
while in the RealESRGAN codebase it is
# the second degradation process
second_blur_prob: 0.8
resize_prob2: [0.3, 0.4, 0.3] # up, down, keep
resize_range2: [0.3, 1.2]
gaussian_noise_prob2: 0.5
noise_range2: [1, 25]
poisson_scale_range2: [0.05, 2.5]
gray_noise_prob2: 0.4
jpeg_range2: [30, 95]
The basicsr repo also uses the latter set of hparams. I checked the CodeFormer repo, but there seem to be no RealESRGAN degradations there, so the settings are probably not from there. I also thought this might be reusing the BSRGAN params, but the jpeg_range there seems to be either [30, 95] (again matching the second set of hparams) or [80, 95], as introduced in the BSRGAN-light degradation in Latent Diffusion Models. I haven't found the range [60, 100] anywhere.
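For reference, the differences between the two quoted configs can be listed mechanically. The snippet below is a minimal sketch: the dictionaries only copy the values quoted in this thread, and the names `STABLESR`, `REALESRGAN`, and `diff_configs` are made up for illustration and do not come from either repository.

```python
# Values copied by hand from the two configs quoted in this thread.
# The dictionary and function names are hypothetical helpers.
STABLESR = {
    "resize_prob": [0.2, 0.7, 0.1],
    "resize_range": [0.3, 1.5],
    "gaussian_noise_prob": 0.5,
    "noise_range": [1, 15],
    "poisson_scale_range": [0.05, 2.0],
    "gray_noise_prob": 0.4,
    "jpeg_range": [60, 95],
    "second_blur_prob": 0.5,
    "resize_prob2": [0.3, 0.4, 0.3],
    "resize_range2": [0.6, 1.2],
    "gaussian_noise_prob2": 0.5,
    "noise_range2": [1, 12],
    "poisson_scale_range2": [0.05, 1.0],
    "gray_noise_prob2": 0.4,
    "jpeg_range2": [60, 100],
}

REALESRGAN = {
    "resize_prob": [0.2, 0.7, 0.1],
    "resize_range": [0.15, 1.5],
    "gaussian_noise_prob": 0.5,
    "noise_range": [1, 30],
    "poisson_scale_range": [0.05, 3],
    "gray_noise_prob": 0.4,
    "jpeg_range": [30, 95],
    "second_blur_prob": 0.8,
    "resize_prob2": [0.3, 0.4, 0.3],
    "resize_range2": [0.3, 1.2],
    "gaussian_noise_prob2": 0.5,
    "noise_range2": [1, 25],
    "poisson_scale_range2": [0.05, 2.5],
    "gray_noise_prob2": 0.4,
    "jpeg_range2": [30, 95],
}


def diff_configs(a, b):
    """Return {key: (value_in_a, value_in_b)} for every key whose values differ."""
    return {key: (a[key], b[key]) for key in a if a[key] != b[key]}


if __name__ == "__main__":
    for key, (ours, theirs) in diff_configs(STABLESR, REALESRGAN).items():
        print(f"{key}: StableSR={ours} RealESRGAN={theirs}")
```

All probabilities except second_blur_prob match; the remaining differences are in the resize, noise, Poisson scale, and JPEG ranges.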
We adjusted the degradation settings slightly to avoid a huge gap between the synthetic data and real-world data. We use our own settings for the synthetic tests. BSRGAN-light was adopted in LDM for a similar purpose.
Is there any specific protocol you follow to adjust those settings? Or is this based on a visual inspection of images?
Also, have you trained RealESRGAN on your custom degradation settings? It seems like the original RealESRGAN without retraining would perform slightly worse, because you now apply less degradation than it was trained for.
They are based on our own visual inspection. After all, our focus is on real-world applications. We did not fine-tune RealESRGAN due to time limitations. You are welcome to try it if you would like to. We are not sure whether our current settings would improve the performance of RealESRGAN or not. Our intuition is simply that overly heavy degradation may not be necessary for real-world cases. BTW, there is no standard degradation pipeline at the moment: even BSRGAN and RealESRGAN use different degradation pipelines, and previous papers usually do not fine-tune baselines because of the huge workload. We do not think the slight changes to the degradations are the main reason for the gap in visual results between StableSR and RealESRGAN.
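To put rough numbers on how much lighter the adjusted settings are, one can compare the midpoints of the noise and JPEG ranges quoted earlier in the thread. This is only an illustrative sketch: it assumes, purely for the sake of the comparison, that each value is drawn uniformly from its range, and the names `range_midpoint`, `RANGES`, and `midpoint_table` are hypothetical.

```python
def range_midpoint(value_range):
    """Midpoint of a [low, high] range, i.e. the mean under uniform sampling."""
    low, high = value_range
    return (low + high) / 2


# (StableSR range, RealESRGAN range), copied from the configs quoted in this thread.
RANGES = {
    "noise_range": ([1, 15], [1, 30]),
    "jpeg_range": ([60, 95], [30, 95]),
    "noise_range2": ([1, 12], [1, 25]),
    "jpeg_range2": ([60, 100], [30, 95]),
}


def midpoint_table(ranges):
    """Map each parameter name to (StableSR midpoint, RealESRGAN midpoint)."""
    return {
        name: (range_midpoint(ours), range_midpoint(theirs))
        for name, (ours, theirs) in ranges.items()
    }


if __name__ == "__main__":
    for name, (ours, theirs) in midpoint_table(RANGES).items():
        print(f"{name}: StableSR~{ours} RealESRGAN~{theirs}")
```

Under that assumption, the average noise level is roughly halved and the average JPEG quality is noticeably higher with the adjusted settings. This says nothing about how either model behaves on real-world inputs.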
Okay, thanks for your clarifications! I'll try fine-tuning RealESRGAN with the same train data and degradation as yours, and report here if there are significant differences.
Thanks for your help. Looking forward to your results :)