Question about blurry results after reproducing the training code
We have successfully run the provided training code. However, the output images are noticeably blurry and lack the expected level of detail and sharpness. This issue persists even though we strictly followed the described setup and preprocessing procedures.
We would greatly appreciate any insights or recommendations on how to improve the visual quality of the results. Specifically, we are interested in potential refinements such as tuning model parameters, adjusting the loss function weights, or modifying data normalization strategies that could help enhance image clarity.
For your reference, we have included the relevant portion of our training code below:
import torch
from tqdm import tqdm


def compute_frequency_loss(pred, gt, criterion):
    # Compare prediction and ground truth in the frequency domain by applying
    # the same criterion to the stacked real/imaginary parts of the 2D FFT.
    gt_fft = torch.fft.fft2(gt, dim=(-2, -1))
    gt_fft = torch.stack((gt_fft.real, gt_fft.imag), dim=-1)
    pred_fft = torch.fft.fft2(pred, dim=(-2, -1))
    pred_fft = torch.stack((pred_fft.real, pred_fft.imag), dim=-1)
    return criterion(pred_fft, gt_fft)


def train_one_epoch(loader, model, optimizer, criterion, inp_sub, inp_div, gt_sub, gt_div):
    model.train()
    epoch_loss = 0.0
    pbar = tqdm(loader, desc='train')
    for batch in pbar:
        # Move data to the GPU ('scale' is a scalar, so it does not need to be moved)
        for k, v in batch.items():
            if k != 'scale':
                batch[k] = v.cuda()

        optimizer.zero_grad()

        # Normalize input and ground truth with the dataset statistics
        inp = (batch['inp'] - inp_sub) / inp_div
        gt = (batch['gt'] - gt_sub) / gt_div

        pred = model(inp, batch['scale'])

        # Pixel-domain L1 loss plus a lightly weighted frequency-domain loss
        l1_loss_val = criterion(pred, gt)
        freq_loss_val = compute_frequency_loss(pred, gt, criterion)
        total_loss = l1_loss_val + 0.001 * freq_loss_val

        total_loss.backward()
        optimizer.step()

        epoch_loss += total_loss.item()
        pbar.set_description('Loss: {:.4f}'.format(total_loss.item()))

        # Delete intermediate variables to free GPU memory
        del inp, gt, pred, l1_loss_val, freq_loss_val, total_loss
        torch.cuda.empty_cache()

    return epoch_loss / len(loader)
Thank you for your interest in our work! Did you load the pre-trained model for the backbone during training? In our experiments, we found that loading a pre-trained backbone can have a noticeable impact on the results, because it provides a better initialization. You could try loading the pre-trained backbone weights: HAT-L_SRx4_ImageNet-pretrain.pth for HAT or 001_classicalSR_DF2K_s64w8_SwinIR-M_x4.pth for SwinIR. This may improve the visual quality of the results.
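For example, loading the backbone weights could look roughly like the sketch below. This is only an illustration; the checkpoint key ('params' / 'params_ema') and the backbone attribute model.encoder are assumptions that may need to be adapted to your configuration.

import torch

def load_backbone_pretrain(model, ckpt_path):
    # Initialize only the backbone from an official HAT/SwinIR checkpoint.
    # The weights are often stored under a 'params' or 'params_ema' key;
    # otherwise treat the loaded file as a raw state_dict.
    ckpt = torch.load(ckpt_path, map_location='cpu')
    state_dict = ckpt.get('params_ema', ckpt.get('params', ckpt))
    # strict=False tolerates layers that exist in only one of the two models
    missing, unexpected = model.encoder.load_state_dict(state_dict, strict=False)
    print('missing keys: {}, unexpected keys: {}'.format(len(missing), len(unexpected)))

# e.g. load_backbone_pretrain(model, 'HAT-L_SRx4_ImageNet-pretrain.pth')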
Thank you for your prompt response. We did not load the pre-trained backbone models (e.g., HAT-L_SRx4_ImageNet-pretrain.pth for HAT or 001_classicalSR_DF2K_s64w8_SwinIR-M_x4.pth for SwinIR). Instead, we directly loaded the provided trained weights for fine-tuning. Are there any important tricks or recommendations we might have overlooked in this process?
Did you obtain the LR data from the GT using bicubic downsampling? If other types of degradation are present, such as blur, our pre-trained model may not produce good super-resolution results, because its weights were trained on bicubic paired data. During testing, the LR images should therefore contain only bicubic degradation. I hope this answers your question.
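For reference, generating the bicubic LR input from the GT could look roughly like the minimal sketch below. It uses torchvision's bicubic resize with antialiasing as an illustration; note that MATLAB's imresize, which was used to build many SR benchmarks, is close but not bit-identical to it.

from torchvision.transforms import InterpolationMode
from torchvision.transforms import functional as TF

def make_bicubic_lr(gt, scale):
    # gt: (C, H, W) tensor in [0, 1]; H and W are assumed to be divisible by scale
    _, h, w = gt.shape
    lr = TF.resize(gt, [h // scale, w // scale],
                   interpolation=InterpolationMode.BICUBIC, antialias=True)
    # Bicubic interpolation can slightly overshoot the valid range
    return lr.clamp(0, 1)

# e.g. lr = make_bicubic_lr(gt, scale=4)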