Very low PSNR
I have been following your work on SR, and thank you for such a super-fast architecture!
I trained CCSR-v2 on a dataset that already contains the LR-HR pairs.
So I removed the Real-ESRGAN degradation process used to generate the LR images and rewrote the dataloader to fetch the LR-HR pairs from disk.
My dataloader looks like this:
```python
from PIL import Image
from torch.utils import data
from torchvision import transforms


class CustomPairedDataset(data.Dataset):
    def __init__(
        self,
        gt_file=None,    # Path to the file containing GT image paths
        lq_file=None,    # Path to the file containing LQ image paths
        tokenizer=None,
        gt_ratio=0,      # Probability to use GT as LQ
    ):
        super(CustomPairedDataset, self).__init__()
        self.gt_ratio = gt_ratio
        # Load GT and LQ image paths from text files
        with open(gt_file, 'r') as f:
            self.gt_list = [line.strip() for line in f.readlines()]
        with open(lq_file, 'r') as f:
            self.lq_list = [line.strip() for line in f.readlines()]
        # Ensure GT and LQ lists have the same length
        assert len(self.gt_list) == len(self.lq_list), "GT and LQ lists must have the same length"
        # Image preprocessing pipeline (augmentations disabled for paired data)
        self.img_preproc = transforms.Compose([
            # transforms.RandomCrop((512, 512)),
            # transforms.Resize((512, 512)),
            # transforms.RandomHorizontalFlip(),
        ])
        self.tokenizer = tokenizer

    def tokenize_caption(self, caption=""):
        inputs = self.tokenizer(
            caption, max_length=self.tokenizer.model_max_length,
            padding="max_length", truncation=True, return_tensors="pt"
        )
        return inputs.input_ids

    def __getitem__(self, index):
        # Load GT and LQ images
        gt_path = self.gt_list[index]
        lq_path = self.lq_list[index]
        gt_img = Image.open(gt_path).convert('RGB')
        lq_img = Image.open(lq_path).convert('RGB')
        # Apply preprocessing to both GT and LQ images
        gt_img = self.img_preproc(gt_img)
        lq_img = self.img_preproc(lq_img)
        # Convert images to tensors in [0, 1]
        gt_img = transforms.ToTensor()(gt_img)  # [0, 1]
        lq_img = transforms.ToTensor()(lq_img)  # [0, 1]
        # No caption used
        lq_caption = ''
        # Prepare the output dictionary
        example = dict()
        example["conditioning_pixel_values"] = lq_img.squeeze(0)       # [0, 1]
        example["pixel_values"] = gt_img.squeeze(0) * 2.0 - 1.0        # [-1, 1]
        example["input_caption"] = self.tokenize_caption(caption=lq_caption).squeeze(0)
        return example

    def __len__(self):
        return len(self.gt_list)
```
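One failure mode worth ruling out with pre-paired data is a misaligned path list: if the GT and LQ text files are sorted differently, every LQ image gets supervised by the wrong GT and PSNR collapses. A minimal sanity check (`check_pair_alignment` is a hypothetical helper, and it assumes paired files share a basename):

```python
import os

def check_pair_alignment(gt_list, lq_list):
    """Return the (gt, lq) pairs whose basenames do not match."""
    assert len(gt_list) == len(lq_list), "GT and LQ lists must have the same length"
    return [
        (g, l) for g, l in zip(gt_list, lq_list)
        if os.path.basename(g) != os.path.basename(l)
    ]

# Aligned lists: every GT basename matches its LQ basename
gt = ["/data/hr/img_001.png", "/data/hr/img_002.png"]
lq = ["/data/lr/img_001.png", "/data/lr/img_002.png"]
print(check_pair_alignment(gt, lq))        # [] -> every pair lines up

# A shuffled LQ list is caught immediately
print(check_pair_alignment(gt, lq[::-1]))  # both mismatched pairs reported
```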
The rest of the training setup is the same as the default.
The stage-1 training converged very quickly (< 200 epochs):
The stage-2 training with `num_inference_steps=1` looks like this:
But the PSNR & SSIM are very low on the test data:
My LR-HR pairs (512x512) look like this:
How can I improve the performance?
Do the text prompts for tokenize_caption() make a big difference on microscopic images like these?
Also, during inference I set --added_prompt=None and --negative_prompt=None, but that makes no difference and the PSNR remains the same.
Due to resource constraints I can only run a single step on my GPU.
Thanks for your question. The weights of the GAN loss (lambda_disc) and the LPIPS loss (lambda_lpips) in stage 2 can be reduced for a better PSNR value.
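For intuition, here is an illustrative sketch of how those two weights shape the stage-2 objective; `stage2_loss` and the numeric values are assumptions for illustration, not the actual CCSR code:

```python
# Illustrative only: a stage-2 objective combining a pixel reconstruction
# term with LPIPS and GAN terms weighted by lambda_lpips / lambda_disc.
def stage2_loss(l_pix, l_lpips, l_disc, lambda_lpips=1.0, lambda_disc=0.05):
    # As the perceptual/GAN weights shrink, the pixel term dominates,
    # trading perceptual sharpness for a higher PSNR.
    return l_pix + lambda_lpips * l_lpips + lambda_disc * l_disc

default = stage2_loss(0.02, 0.30, 0.50)
reduced = stage2_loss(0.02, 0.30, 0.50, lambda_lpips=0.1, lambda_disc=0.005)
print(default, reduced)  # the reduced weighting sits much closer to the pixel term
```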
Yes, but for a single step these parameters have no real effect, so there is no improvement.
If I set the VAE & UNet to trainable, it shows no improvement either.
Or should I train the VAE & UNet separately, then include them in CCSR and train stages 1 & 2 again?
Training the VAE separately and then putting it back into CCSR improved the performance a little.
I think if I want to use a single step, the performance gains are very minimal.
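For anyone trying the same route, a minimal sketch of fine-tuning a VAE on its own reconstruction loss before plugging it back in; `ToyVAE` and `finetune_vae` are illustrative stand-ins (the real diffusers AutoencoderKL has a different encode/decode interface):

```python
import torch
import torch.nn as nn

# Toy stand-in autoencoder: only illustrates the shape of the loop.
class ToyVAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc = nn.Conv2d(3, 4, 3, padding=1)
        self.dec = nn.Conv2d(4, 3, 3, padding=1)
    def encode(self, x): return self.enc(x)
    def decode(self, z): return self.dec(z)

def finetune_vae(vae, loader, epochs=1, lr=1e-3):
    # Plain MSE reconstruction objective on HR images in [-1, 1]
    opt = torch.optim.Adam(vae.parameters(), lr=lr)
    mse = nn.MSELoss()
    for _ in range(epochs):
        for hr in loader:
            loss = mse(vae.decode(vae.encode(hr)), hr)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return vae

vae = ToyVAE()
batch = torch.rand(2, 3, 16, 16) * 2 - 1   # fake HR images in [-1, 1]
before = nn.functional.mse_loss(vae.decode(vae.encode(batch)), batch).item()
finetune_vae(vae, [batch], epochs=50)
after = nn.functional.mse_loss(vae.decode(vae.encode(batch)), batch).item()
print(after < before)  # reconstruction error drops on this toy batch
```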