Image-Super-Resolution-via-Iterative-Refinement

Support for upscaling 16-bit grayscale images

Open VladStojDev opened this issue 2 years ago • 12 comments

Hi,

I would like to upscale PNG images. These images are 16-bit, single-channel grayscale, and I would ideally like to upscale them from 512x512 to 1024x1024.

I have created a custom JSON configuration file, set the number of input channels to 1, and modified LRHR_dataset.py to load images in Pillow's "L" mode instead of "RGB".
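
Concretely, the Pillow change I made amounts to this (a minimal sketch; the real edit lives in the image loading code of data/LRHR_dataset.py, and "example.png" is a placeholder):

```python
from PIL import Image

# Original behaviour: force every image to 3-channel RGB
img_rgb = Image.open("example.png").convert("RGB")

# My change: load as grayscale instead
# (note: Pillow's "L" mode is 8-bit; mode "I"/"I;16" keeps the full 16-bit depth)
img_gray = Image.open("example.png").convert("L")
```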

However, when I attempt training, I get this error:

" return F.conv2d(input, weight, bias, self.stride, RuntimeError: Given groups=1, weight of size [64, 6, 3, 3], expected input[1, 2, 1024, 1024] to have 6 channels, but got 2 channels instead "

that gets invoked here:

"/sr3_modules/unet.py", line 244, in forward x = layer(x)"

Could you point me in the right direction (or let me know what I need to modify in the code) to support training the model for upscaling 16-bit grayscale images in the PNG file format?

Thank you in advance!

VladStojDev avatar Sep 28 '22 09:09 VladStojDev

Sorry for this error.

  1. You should set the input channel to 2, since the input is a concatenation of [data, condition_data] (see the sketch below).
  2. That means a few related settings need to change as well, e.g. https://github.com/Janspiry/Image-Super-Resolution-via-Iterative-Refinement/blob/ef9b943b573328d7a5ddb1a0c2abd168b91610dc/config/sr_sr3_64_512.json#L77
  3. Make sure your changes to the config file actually take effect.
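
For reference, the relevant part of the config would then look roughly like this for single-channel data (a sketch based on sr_sr3_64_512.json; all other fields stay as they are):

```
"unet": {
    "in_channel": 2,   // [data, condition_data] concatenated along the channel axis
    "out_channel": 1   // single-channel prediction
},
"diffusion": {
    "channels": 1      // sample channels
}
```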

Janspiry avatar Oct 03 '22 13:10 Janspiry

Thank you for your reply! I will try this and get back to you :)

VladStojDev avatar Oct 04 '22 08:10 VladStojDev

> Thank you for your reply! I will try this and get back to you :)

How is it going? I'm training on grayscale images too, but I still get noisy outputs even after 300 epochs.

ElliotQi avatar Oct 15 '22 07:10 ElliotQi

Hey all,

Yes, thank you for the input. So far I am not sure how well the model is working: at 180k iterations, some of the _sr images in the experiments validation folder are still completely noisy, while others are not. It would be helpful if you could briefly explain how the model is validated during training.

So to recap, I want to upscale 512x512 16-bit grayscale PNGs to 1024x1024 16-bit grayscale PNGs. This is my current config (redacted):

{ "name": "distributed_high_sr_test", "phase": "train", // train or val "gpu_ids": [ 0,1 ], "path": { //set the path "log": "logs", "tb_logger": "tb_logger", "results": "results", "checkpoint": "checkpoint", "resume_state": null // "resume_state": "experiments/distributed_high_sr_ffhq_210901_121212/checkpoint/I830000_E32" //pretrain model or training state }, "datasets": { "train": { "name": "FFHQ", "mode": "HR", // whether need LR img "dataroot": "redacted", "datatype": "img", //lmdb or img, path of img files "l_resolution": 512, // low resolution need to super_resolution "r_resolution": 1024, // high resolution "batch_size": 4, //Defualt is 2, try increasing this to 16, if you have enough VRAM "num_workers": 8, //default is 8 "use_shuffle": true, "data_len": -1 // -1 represents all data used in train }, "val": { "name": "CelebaHQ", "mode": "LRHR", "dataroot": "redacted", "datatype": "img", //lmdb or img, path of img files "l_resolution": 512, "r_resolution": 1024, "data_len": 577 } }, "model": { "which_model_G": "sr3", // use the ddpm or sr3 network structure "finetune_norm": false, "unet": { "in_channel": 2, //defualt is 6, use 2 for grayscale 16bit "out_channel": 1, //default is 3, use 1 for grayscale 16bit "inner_channel": 64, "norm_groups": 16, "channel_multiplier": [ 1, 2, 4, 8, //8, //By defualt this is commented out //16, //By default this is commented out 16 ], "attn_res": [ // 16 ], "res_blocks": 1, //default is 1, try changing this to 4 "dropout": 0 }, "beta_schedule": { // use manual beta_schedule for acceleration "train": { "schedule": "linear", "n_timestep": 2000, "linear_start": 1e-6, "linear_end": 1e-2 }, "val": { "schedule": "linear", "n_timestep": 2000, "linear_start": 1e-6, "linear_end": 1e-2 } }, "diffusion": { "image_size": 1024, "channels": 1, //sample channel - default is 3 "conditional": true // unconditional generation or unconditional generation(super_resolution) } }, "train": { "n_iter": 500001, //Default is 1000000, for general case use 100001 "val_freq": 10000, //Defualt is 10000 "save_checkpoint_freq": 10000, //Default is 1e4 "print_freq": 1, //default is 10 "optimizer": { "type": "adam", "lr": 3e-6 //Default is 3e-6 }, "ema_scheduler": { // not used now "step_start_ema": 5000, "update_ema_every": 1, "ema_decay": 0.9999 } }, "wandb": { "project": "distributed_high_sr_test" } }

Current validation results:

22-10-15 03:09:45.666 - INFO: <epoch: 2, iter: 10,000> psnr: 6.0010e+00
22-10-15 07:34:42.770 - INFO: <epoch: 3, iter: 20,000> psnr: 2.8180e+01
22-10-15 12:02:12.168 - INFO: <epoch: 4, iter: 30,000> psnr: 1.8110e+01
22-10-15 16:29:42.753 - INFO: <epoch: 5, iter: 40,000> psnr: 1.8095e+01
22-10-15 20:57:13.824 - INFO: <epoch: 6, iter: 50,000> psnr: 3.5393e+01
22-10-16 01:24:44.860 - INFO: <epoch: 7, iter: 60,000> psnr: 1.9494e+01
22-10-16 05:52:15.450 - INFO: <epoch: 8, iter: 70,000> psnr: 1.7423e+01
22-10-16 10:19:45.456 - INFO: <epoch: 9, iter: 80,000> psnr: 1.7837e+01
22-10-16 14:47:16.042 - INFO: <epoch: 10, iter: 90,000> psnr: 2.7558e+01
22-10-16 19:14:43.075 - INFO: <epoch: 11, iter: 100,000> psnr: 3.2787e+01
22-10-16 23:42:11.231 - INFO: <epoch: 12, iter: 110,000> psnr: 2.6647e+01
22-10-17 04:09:40.038 - INFO: <epoch: 13, iter: 120,000> psnr: 2.7375e+01
22-10-17 08:37:09.742 - INFO: <epoch: 14, iter: 130,000> psnr: 9.2383e+00
22-10-17 13:04:26.851 - INFO: <epoch: 15, iter: 140,000> psnr: 2.0560e+01
22-10-17 17:31:51.866 - INFO: <epoch: 16, iter: 150,000> psnr: 2.9518e+01
22-10-17 21:59:15.709 - INFO: <epoch: 17, iter: 160,000> psnr: 2.1418e+01
22-10-18 02:26:40.988 - INFO: <epoch: 19, iter: 170,000> psnr: 2.6690e+01
22-10-18 06:54:09.004 - INFO: <epoch: 20, iter: 180,000> psnr: 1.8943e+01

So any input/ideas/perspectives on how to get the best high-resolution results would be highly appreciated. My training data is approx. 55k images with a 70/30 split.

VladStojDev avatar Oct 18 '22 07:10 VladStojDev

@VladS-PCH
Hi, thanks for the feedback. I'm also working on medical images. As for your problems, I think I can solve some of them.

'I still get _sr images that are completely noisy, but others that are not': according to your config file, you set 'need_LR' to False, so xxx_lr and xxx_inf are both interpolated low-resolution images with the same dimensions as the high-resolution image, and xxx_hr is the ground truth. These are generated when you create the dataset, so they are never noisy. xxx_sr, however, is the output of your model during validation; if those are still pure noise, the model has not learned well yet. I find the PSNR score is strongly correlated with noisy outputs: I get noisy images whenever the score is below 12. Could you show the output of your model at epoch 11? It should be very close to the high-resolution image.
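
For a quick check against that threshold, PSNR on the 16-bit PNGs can be computed along these lines (a minimal sketch; the file names are placeholders, and the value is only roughly comparable to the one in the training log):

```python
import numpy as np
from PIL import Image

def psnr_16bit(sr_path, hr_path):
    # Load both images without any mode conversion so the 16-bit values survive.
    sr = np.asarray(Image.open(sr_path), dtype=np.float64)
    hr = np.asarray(Image.open(hr_path), dtype=np.float64)
    mse = np.mean((sr - hr) ** 2)
    if mse == 0:
        return float("inf")
    peak = 65535.0  # maximum possible value for 16-bit data
    return 10.0 * np.log10(peak ** 2 / mse)

print(psnr_16bit("0_1_sr.png", "0_1_hr.png"))  # placeholder file names
```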

Here are some questions:

  1. Did you apply any normalization to the 16-bit grayscale images? (see the sketch after this list)
  2. Did you choose only some slices for training, or put all slices into training?
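
On item 1, here is one way it could be done (a minimal sketch, assuming the data should end up in the [-1, 1] range that the RGB pipeline typically uses; the function name and file path are placeholders):

```python
import numpy as np
import torch
from PIL import Image

def load_16bit_gray_normalized(path):
    # 16-bit grayscale PNGs open as Pillow mode "I;16"/"I"; keep the full precision.
    arr = np.array(Image.open(path), dtype=np.float32)
    arr = arr / 65535.0        # scale raw 16-bit values to [0, 1]
    arr = arr * 2.0 - 1.0      # then to [-1, 1]
    return torch.from_numpy(arr).unsqueeze(0)  # shape (1, H, W): channel-first tensor

x = load_16bit_gray_normalized("example_lr.png")  # placeholder path
```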

ElliotQi avatar Oct 18 '22 11:10 ElliotQi

Hi,

Thanks for the input. Unfortunately, I cannot share any output data for this project as it is under strict NDA :(

On a side note, xxx_inf is not low resolution; it is upscaled to the resolution I want (1024x1024) using nearest-neighbour interpolation (as that's how I understood the training setup needs to be). So to clarify, my training inputs are as follows:

xxx_lr: 512x512 16-bit single-channel grayscale image
xxx_hr: 1024x1024 16-bit single-channel grayscale image (high-resolution ground truth)
xxx_inf: 1024x1024 16-bit single-channel grayscale image (cheap upscaling of xxx_lr)
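
For reference, the cheap upscaling that produces xxx_inf is essentially this (a minimal sketch; the file names are placeholders):

```python
import numpy as np
from PIL import Image

# Load the 512x512 16-bit grayscale PNG without touching its bit depth
# (typically this yields a uint16 array).
lr = np.asarray(Image.open("example_lr.png"))

# Nearest-neighbour 2x upscale: repeat every pixel twice along both axes.
inf = np.repeat(np.repeat(lr, 2, axis=0), 2, axis=1)  # now 1024x1024

Image.fromarray(inf).save("example_inf.png")  # written back as a 16-bit PNG
```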

Could it be a problem that xxx_inf has a larger resolution than xxx_lr?

I do not use any normalisation for the input training images.

I don't think I use any slices for training, as I'm not quite sure what that is or how it is used - could you please elaborate?

VladStojDev avatar Oct 18 '22 11:10 VladStojDev

Thanks for the answer. I'm also using a private dataset. That's the correct output for xxx_inf, no problem there. I assumed you were dealing with MRI or CT data, which has many slices per scan, so I was wondering whether you put all the slices into training or only chose some of them, e.g. keeping (512, 512, 40) selected slices out of (512, 512, 192).

ElliotQi avatar Oct 18 '22 11:10 ElliotQi

Hi,

No, the data is not medical data - it is actually closer to GIS-related height-maps (single-channel grayscale height-map images), so there is no need for slicing (I'm not dealing with any kind of volumes).

But yes, I am still training up to 200k iterations; it should be done later today, so I'll post the newly logged PSNR values for reference.

My main concern is that the validation images (stored in the 'val' folder), which I generate every 10k iterations during training, show a sample of 3 different images, and for some iterations the xxx_sr image is of almost OK quality (a bit blurry, but no noise), while at other times the noise is very obvious (though in most cases it is fine-grained noise).

I'm not really an expert with GANs and diffusion models, so I'm just trying to hack this as a black box in order to get the results I want :P

VladStojDev avatar Oct 18 '22 11:10 VladStojDev

Thanks! I hope you get a good result!

ElliotQi avatar Oct 18 '22 12:10 ElliotQi

@VladS-PCH I am also working with some grayscale images. I tried making the changes indicated in your edited config file, but the code still throws the error below:

"input[64, 6, 3, 3], expected input[4, 2, 256, 256] to have 6 channels, but got 2 channels"

I have a different image size than you. Have you made any other changes in the code elsewhere? Thank you in advance.

vedant-gupta-ai avatar Apr 12 '23 07:04 vedant-gupta-ai

@VladS-PCH Hi, I ran into the same problem as vedant-aero-ml. The first time I ran the model with grayscale images, without changing the U-Net's input channels, the code ran successfully. But when I then changed the input channel to 2, the output channel to 1, and the diffusion sample channels to 1, the code showed this error:

"weight of size [64, 2, 3, 3], expected input[6, 6, 128, 128] to have 2 channels, but got 6 channels instead"

So I would like to ask whether you made any other changes to the source code. Thank you!

whatifleave avatar Apr 19 '23 07:04 whatifleave

@VladS-PCH @vedant-aero-ml Hi, I have solved the problem: just edit data/LRHR_dataset.py (path: Image-Super-Resolution-via-Iterative-Refinement-master/data/LRHR_dataset.py), lines 83-91, and delete .convert("RGB").
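
In other words, the loads in that range go from forcing three RGB channels to leaving the image as it was opened (a sketch of the relevant lines; exact attribute names and line numbers may differ between versions):

```python
# before
img_HR = Image.open(self.hr_path[index]).convert("RGB")
img_SR = Image.open(self.sr_path[index]).convert("RGB")

# after: drop the conversion so 16-bit grayscale PNGs keep their single channel
img_HR = Image.open(self.hr_path[index])
img_SR = Image.open(self.sr_path[index])
```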

whatifleave avatar Apr 19 '23 08:04 whatifleave