StableSR
The parameter setting for dec_w when obtaining cfw training data.
I have tested the model you provided and the results are very impressive. I would like to train the model with my own data and have a question. When using this script (scripts/sr_val_ddpm_text_T_vqganfin_old.py) to obtain training data (latent and sample) for CFW, what value should the parameter dec_w (vq_model.decoder.fusion_w) be set to? Should it be 0, 1, or another value? Thank you! (https://github.com/IceClear/StableSR/blob/e820f7a3766c8a4184f4f62a5bae7a5def645b25/scripts/sr_val_ddpm_text_T_vqganfin_old.py#L298C6-L298C16)
Hi~ thanks for your interest in our work. For data generation we do not have CFW yet, so we actually set w=0. During training, you just need to set w=1.
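If it helps, the role of dec_w can be pictured as a residual blend between the plain decoder features and the CFW correction. The sketch below is only an illustration of why w=0 yields pure decoder output (as used when generating the CFW training data) and w=1 applies the full correction; the function and arrays are hypothetical, not actual StableSR code:

```python
import numpy as np

def fuse_features(decoder_feat, cfw_feat, w):
    """Blend decoder features with a CFW-style correction.

    Simplified stand-in for the fusion controlled by
    vq_model.decoder.fusion_w: w=0 drops the CFW branch entirely,
    w=1 adds the full correction.
    """
    return decoder_feat + w * cfw_feat

decoder_feat = np.array([1.0, 2.0, 3.0])
cfw_feat = np.array([0.5, -0.5, 0.25])

# w = 0: CFW contribution vanishes -> output equals the decoder features
assert np.allclose(fuse_features(decoder_feat, cfw_feat, 0.0), decoder_feat)

# w = 1: the full CFW correction is added on top
print(fuse_features(decoder_feat, cfw_feat, 1.0))
```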
Okay, thank you very much for your response! To ensure better experimental results, I have three more detailed questions.

1. The first question is about training SFT. When training SFT, are the ground truths (GTs) randomly cropped from the original HR images at a size of either 512x512 or 768x768 pixels (depending on the base model), with the original HR images left unmodified, e.g., no resizing?

2. Second, for the high-resolution facial data used in training, is the same procedure followed as in the first point? I am concerned that random cropping may cut off part of a face and affect the experimental results, so I want to confirm this.

3. Third, when training CFW, we need to obtain GTs and inputs, and I have two guesses about how they are acquired. The first guess is that GTs are first cropped from HR images (so multiple GT patches can be cropped from one HR image), and degraded LQ images are then generated from those GT patches as inputs; in this case, the inputs corresponding to one HR image are produced with different degradation parameters. The second guess is that each HR image is first degraded to obtain an LQ image, and GTs and inputs are then cropped from the HR and LQ images, respectively; in this case, the inputs corresponding to one HR image share the same degradation parameters. Which one is the actual practice, and are there already complete scripts available for this part? I'm looking forward to your response.
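To make the first guess concrete, here is a toy sketch of "crop GT patches first, then degrade each patch with independently sampled parameters". The degradation here is a trivial stand-in (strided downsampling plus noise) for whatever pipeline the repo actually uses, and all names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_crop(hr, size):
    """Randomly crop a size x size patch from an HR image of shape (H, W, C)."""
    h, w = hr.shape[:2]
    top = rng.integers(0, h - size + 1)
    left = rng.integers(0, w - size + 1)
    return hr[top:top + size, left:left + size]

def degrade(gt, scale=4):
    """Toy degradation: downsample by striding, then add Gaussian noise.

    The noise level is re-sampled on every call, matching "guess one":
    each cropped GT patch gets its own degradation parameters.
    """
    lq = gt[::scale, ::scale]
    noise_sigma = rng.uniform(0.0, 5.0)  # fresh parameters per patch
    return np.clip(lq + rng.normal(0.0, noise_sigma, lq.shape), 0, 255)

hr = rng.uniform(0, 255, (1024, 1024, 3))
pairs = []
for _ in range(4):  # several GT patches from one HR image
    gt = random_crop(hr, 512)
    pairs.append((gt, degrade(gt)))

print(pairs[0][0].shape, pairs[0][1].shape)  # (512, 512, 3) (128, 128, 3)
```

Under the second guess, degrade() would instead be applied once to the full HR image, and the crops for GT and input would be taken at corresponding (scaled) locations, so every pair from that image would share one set of degradation parameters.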