Very low PSNR
I have been following your work on SR, and thank you for such a super-fast architecture!
I trained CCSR-v2 on a dataset that already contains the LR-HR pairs.
So I removed the Real-ESRGAN degradation process used to generate the LR images and rewrote the dataloader to fetch the LR-HR pairs from disk.
My dataloader looks like this:
```python
from PIL import Image
from torch.utils import data
from torchvision import transforms


class CustomPairedDataset(data.Dataset):
    def __init__(
        self,
        gt_file=None,    # Path to the file containing GT image paths
        lq_file=None,    # Path to the file containing LQ image paths
        tokenizer=None,
        gt_ratio=0,      # Probability to use GT as LQ
    ):
        super(CustomPairedDataset, self).__init__()
        self.gt_ratio = gt_ratio
        # Load GT and LQ image paths from text files
        with open(gt_file, 'r') as f:
            self.gt_list = [line.strip() for line in f.readlines()]
        with open(lq_file, 'r') as f:
            self.lq_list = [line.strip() for line in f.readlines()]
        # Ensure GT and LQ lists have the same length
        assert len(self.gt_list) == len(self.lq_list), "GT and LQ lists must have the same length"
        # Image preprocessing pipeline (augmentations disabled for paired data)
        self.img_preproc = transforms.Compose([
            # transforms.RandomCrop((512, 512)),
            # transforms.Resize((512, 512)),
            # transforms.RandomHorizontalFlip(),
        ])
        self.tokenizer = tokenizer

    def tokenize_caption(self, caption=""):
        inputs = self.tokenizer(
            caption, max_length=self.tokenizer.model_max_length,
            padding="max_length", truncation=True, return_tensors="pt"
        )
        return inputs.input_ids

    def __getitem__(self, index):
        # Load GT and LQ images
        gt_path = self.gt_list[index]
        lq_path = self.lq_list[index]
        gt_img = Image.open(gt_path).convert('RGB')
        lq_img = Image.open(lq_path).convert('RGB')
        # Apply preprocessing to both GT and LQ images
        gt_img = self.img_preproc(gt_img)
        lq_img = self.img_preproc(lq_img)
        # Convert images to tensors in [0, 1]
        gt_img = transforms.ToTensor()(gt_img)  # [0, 1]
        lq_img = transforms.ToTensor()(lq_img)  # [0, 1]
        # No caption used
        lq_caption = ''
        # Prepare the output dictionary
        example = dict()
        example["conditioning_pixel_values"] = lq_img.squeeze(0)       # [0, 1]
        example["pixel_values"] = gt_img.squeeze(0) * 2.0 - 1.0        # [-1, 1]
        example["input_caption"] = self.tokenize_caption(caption=lq_caption).squeeze(0)
        return example

    def __len__(self):
        return len(self.gt_list)
```
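One failure mode worth ruling out with pre-paired data is a misaligned path list: if the GT and LQ text files are sorted differently, every LQ image gets supervised by the wrong GT and PSNR collapses. A minimal sanity check (`check_pair_alignment` is a hypothetical helper, and it assumes paired files share a basename):

```python
import os

def check_pair_alignment(gt_list, lq_list):
    """Return the (gt, lq) pairs whose basenames do not match."""
    assert len(gt_list) == len(lq_list), "GT and LQ lists must have the same length"
    return [
        (g, l) for g, l in zip(gt_list, lq_list)
        if os.path.basename(g) != os.path.basename(l)
    ]

# Aligned lists: every GT basename matches its LQ basename
gt = ["/data/hr/img_001.png", "/data/hr/img_002.png"]
lq = ["/data/lr/img_001.png", "/data/lr/img_002.png"]
print(check_pair_alignment(gt, lq))        # [] -> every pair lines up

# A shuffled LQ list is caught immediately
print(check_pair_alignment(gt, lq[::-1]))  # both mismatched pairs reported
```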
The rest of the training setup is the same as the default.
The stage-1 training converged very quickly (< 200 epochs):
The stage-2 training with `num_inference_steps=1` looks like this:
But the PSNR & SSIM are very low on the test data:
My LR-HR pairs (512x512) look like this:
How can I improve the performance?
Do the text prompts for tokenize_caption() make a big difference on microscopic images like these?
Also, during inference I set --added_prompt=None and --negative_prompt=None, but that makes no difference and the PSNR remains the same.
Due to resource constraints I can only run a single step on my GPU.
Thanks for your question. The weights of the GAN loss (lambda_disc) and the LPIPS loss (lambda_lpips) in stage 2 can be reduced for a better PSNR value.
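For intuition, here is an illustrative sketch of how those two weights shape the stage-2 objective; `stage2_loss` and the numeric values are assumptions for illustration, not the actual CCSR code:

```python
# Illustrative only: a stage-2 objective combining a pixel reconstruction
# term with LPIPS and GAN terms weighted by lambda_lpips / lambda_disc.
def stage2_loss(l_pix, l_lpips, l_disc, lambda_lpips=1.0, lambda_disc=0.05):
    # As the perceptual/GAN weights shrink, the pixel term dominates,
    # trading perceptual sharpness for a higher PSNR.
    return l_pix + lambda_lpips * l_lpips + lambda_disc * l_disc

default = stage2_loss(0.02, 0.30, 0.50)
reduced = stage2_loss(0.02, 0.30, 0.50, lambda_lpips=0.1, lambda_disc=0.005)
print(default, reduced)  # the reduced weighting sits much closer to the pixel term
```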
Yes, but for a single step these parameters have no real effect, so there is no improvement.
If I set the VAE & UNet to trainable, it shows no improvement either.
Or should I train the VAE & UNet separately, then include them in CCSR and train stages 1 & 2 again?
Training the VAE separately and then putting it back into CCSR improved the performance a little.
I think if I want to use a single step, the performance gains are very minimal.
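For anyone trying the same route, a minimal sketch of fine-tuning a VAE on its own reconstruction loss before plugging it back in; `ToyVAE` and `finetune_vae` are illustrative stand-ins (the real diffusers AutoencoderKL has a different encode/decode interface):

```python
import torch
import torch.nn as nn

# Toy stand-in autoencoder: only illustrates the shape of the loop.
class ToyVAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc = nn.Conv2d(3, 4, 3, padding=1)
        self.dec = nn.Conv2d(4, 3, 3, padding=1)
    def encode(self, x): return self.enc(x)
    def decode(self, z): return self.dec(z)

def finetune_vae(vae, loader, epochs=1, lr=1e-3):
    # Plain MSE reconstruction objective on HR images in [-1, 1]
    opt = torch.optim.Adam(vae.parameters(), lr=lr)
    mse = nn.MSELoss()
    for _ in range(epochs):
        for hr in loader:
            loss = mse(vae.decode(vae.encode(hr)), hr)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return vae

vae = ToyVAE()
batch = torch.rand(2, 3, 16, 16) * 2 - 1   # fake HR images in [-1, 1]
before = nn.functional.mse_loss(vae.decode(vae.encode(batch)), batch).item()
finetune_vae(vae, [batch], epochs=50)
after = nn.functional.mse_loss(vae.decode(vae.encode(batch)), batch).item()
print(after < before)  # reconstruction error drops on this toy batch
```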