Image-Super-Resolution-via-Iterative-Refinement
Support for upscaling 16-bit grayscale images
Hi,
I would like to upscale PNG images. These images are 16-bit, single-channel (grayscale), and I would ideally like to upscale them from 512x512 to 1024x1024.
I have created a custom JSON configuration file and set the number of input channels to 1, and also modified LRHR_dataset.py to load images in Pillow's "L" mode instead of "RGB".
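For what it's worth, here is a minimal sketch of that kind of loader change (assuming the dataset currently calls Image.open(path).convert("RGB"); the helper name and the [-1, 1] scaling are my own, not from the repo). One caveat: Pillow's "L" mode is 8-bit, so for 16-bit data it is safer to keep the image's native mode:

```python
import numpy as np
from PIL import Image

def load_gray16(path):
    # Open WITHOUT .convert("RGB"): Pillow then keeps the 16-bit data
    # (mode "I" or "I;16" depending on the Pillow version), whereas
    # .convert("L") would truncate it to 8 bits.
    img = Image.open(path)
    arr = np.asarray(img, dtype=np.float32)
    # Scale the uint16 range [0, 65535] to [-1, 1], the range diffusion
    # models are usually trained on (my assumption, not the repo default).
    return arr / 32767.5 - 1.0
```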
However, when I attempt training, I get this error:
" return F.conv2d(input, weight, bias, self.stride,
RuntimeError: Given groups=1, weight of size [64, 6, 3, 3], expected input[1, 2, 1024, 1024] to have 6 channels, but got 2 channels instead "
that gets invoked here:
"/sr3_modules/unet.py", line 244, in forward
x = layer(x)"
I would like to know if you can point me in the right direction (or let me know what I need to modify in the code) to support training the model for upscaling 16-bit grayscale PNG images.
Thank you in advance!
Sorry for this error.
- You should set the channel to 2, since the input contains [data, condition_data].
- That means some other settings need to change as well, e.g. https://github.com/Janspiry/Image-Super-Resolution-via-Iterative-Refinement/blob/ef9b943b573328d7a5ddb1a0c2abd168b91610dc/config/sr_sr3_64_512.json#L77
- Make sure your changes to the config file actually take effect.
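The point about the channel count can be illustrated with toy NumPy arrays (shapes only; the real concatenation happens inside the repo's diffusion code):

```python
import numpy as np

# One 1024x1024 grayscale sample each.
x_noisy = np.zeros((1, 1, 1024, 1024), dtype=np.float32)  # current diffusion state
x_cond = np.zeros((1, 1, 1024, 1024), dtype=np.float32)   # interpolated LR condition

# The conditional UNet sees [condition_data, data] concatenated along the
# channel axis, so in_channel must be 2 for grayscale (3 + 3 = 6 for RGB).
x_in = np.concatenate([x_cond, x_noisy], axis=1)
print(x_in.shape)  # (1, 2, 1024, 1024)
```

This is exactly why the error above mentions a weight of size [64, 6, 3, 3] (the RGB default) receiving a 2-channel input.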
Thank you for your reply! I will try this and get back to you :)
How's it going? I'm trying grayscale images too, but I still get noisy images even after 300 epochs.
Hey all,
Yes, thank you for the input. So far I am not sure how well the model is working: at 180k iterations, some of the _sr images in the experiments validation folder are still completely noisy, while others are not. It would help if you could briefly explain how the model is validated during training.
So to recap, I want to upscale 512x512 16-bit grayscale PNGs to 1024x1024 16-bit grayscale PNGs. This is my current config (redacted):
{
    "name": "distributed_high_sr_test",
    "phase": "train", // train or val
    "gpu_ids": [0, 1],
    "path": { // set the paths
        "log": "logs",
        "tb_logger": "tb_logger",
        "results": "results",
        "checkpoint": "checkpoint",
        "resume_state": null
        // "resume_state": "experiments/distributed_high_sr_ffhq_210901_121212/checkpoint/I830000_E32" // pretrained model or training state
    },
    "datasets": {
        "train": {
            "name": "FFHQ",
            "mode": "HR", // whether LR img is needed
            "dataroot": "redacted",
            "datatype": "img", // lmdb or img, path of img files
            "l_resolution": 512, // low resolution to super-resolve
            "r_resolution": 1024, // high resolution
            "batch_size": 4, // default is 2; try increasing to 16 if you have enough VRAM
            "num_workers": 8, // default is 8
            "use_shuffle": true,
            "data_len": -1 // -1 means all data is used in training
        },
        "val": {
            "name": "CelebaHQ",
            "mode": "LRHR",
            "dataroot": "redacted",
            "datatype": "img", // lmdb or img, path of img files
            "l_resolution": 512,
            "r_resolution": 1024,
            "data_len": 577
        }
    },
    "model": {
        "which_model_G": "sr3", // use the ddpm or sr3 network structure
        "finetune_norm": false,
        "unet": {
            "in_channel": 2, // default is 6; use 2 for 16-bit grayscale
            "out_channel": 1, // default is 3; use 1 for 16-bit grayscale
            "inner_channel": 64,
            "norm_groups": 16,
            "channel_multiplier": [
                1,
                2,
                4,
                8,
                // 8, // by default this is commented out
                // 16, // by default this is commented out
                16
            ],
            "attn_res": [
                // 16
            ],
            "res_blocks": 1, // default is 1; try changing this to 4
            "dropout": 0
        },
        "beta_schedule": { // use manual beta_schedule for acceleration
            "train": {
                "schedule": "linear",
                "n_timestep": 2000,
                "linear_start": 1e-6,
                "linear_end": 1e-2
            },
            "val": {
                "schedule": "linear",
                "n_timestep": 2000,
                "linear_start": 1e-6,
                "linear_end": 1e-2
            }
        },
        "diffusion": {
            "image_size": 1024,
            "channels": 1, // sample channels; default is 3
            "conditional": true // unconditional generation or conditional generation (super_resolution)
        }
    },
    "train": {
        "n_iter": 500001, // default is 1000000; for the general case use 100001
        "val_freq": 10000, // default is 10000
        "save_checkpoint_freq": 10000, // default is 1e4
        "print_freq": 1, // default is 10
        "optimizer": {
            "type": "adam",
            "lr": 3e-6 // default is 3e-6
        },
        "ema_scheduler": { // not used now
            "step_start_ema": 5000,
            "update_ema_every": 1,
            "ema_decay": 0.9999
        }
    },
    "wandb": {
        "project": "distributed_high_sr_test"
    }
}
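As a quick self-check of the channel settings in a config like the one above, a small hypothetical helper (not part of the repo) can assert the relationships that caused the conv2d error earlier in this thread:

```python
# With "conditional": true, the UNet input is [condition, data], so
# in_channel should be diffusion.channels (the sample) plus the same
# number of condition channels, and out_channel should equal
# diffusion.channels. Values below mirror the grayscale config above.
cfg = {
    "model": {
        "unet": {"in_channel": 2, "out_channel": 1},
        "diffusion": {"channels": 1, "conditional": True},
    }
}

unet = cfg["model"]["unet"]
diff = cfg["model"]["diffusion"]
cond_channels = diff["channels"] if diff["conditional"] else 0

assert unet["in_channel"] == diff["channels"] + cond_channels
assert unet["out_channel"] == diff["channels"]
print("channel settings are consistent")
```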
Current validation results:
22-10-15 03:09:45.666 - INFO: <epoch: 2, iter: 10,000> psnr: 6.0010e+00
22-10-15 07:34:42.770 - INFO: <epoch: 3, iter: 20,000> psnr: 2.8180e+01
22-10-15 12:02:12.168 - INFO: <epoch: 4, iter: 30,000> psnr: 1.8110e+01
22-10-15 16:29:42.753 - INFO: <epoch: 5, iter: 40,000> psnr: 1.8095e+01
22-10-15 20:57:13.824 - INFO: <epoch: 6, iter: 50,000> psnr: 3.5393e+01
22-10-16 01:24:44.860 - INFO: <epoch: 7, iter: 60,000> psnr: 1.9494e+01
22-10-16 05:52:15.450 - INFO: <epoch: 8, iter: 70,000> psnr: 1.7423e+01
22-10-16 10:19:45.456 - INFO: <epoch: 9, iter: 80,000> psnr: 1.7837e+01
22-10-16 14:47:16.042 - INFO: <epoch: 10, iter: 90,000> psnr: 2.7558e+01
22-10-16 19:14:43.075 - INFO: <epoch: 11, iter: 100,000> psnr: 3.2787e+01
22-10-16 23:42:11.231 - INFO: <epoch: 12, iter: 110,000> psnr: 2.6647e+01
22-10-17 04:09:40.038 - INFO: <epoch: 13, iter: 120,000> psnr: 2.7375e+01
22-10-17 08:37:09.742 - INFO: <epoch: 14, iter: 130,000> psnr: 9.2383e+00
22-10-17 13:04:26.851 - INFO: <epoch: 15, iter: 140,000> psnr: 2.0560e+01
22-10-17 17:31:51.866 - INFO: <epoch: 16, iter: 150,000> psnr: 2.9518e+01
22-10-17 21:59:15.709 - INFO: <epoch: 17, iter: 160,000> psnr: 2.1418e+01
22-10-18 02:26:40.988 - INFO: <epoch: 19, iter: 170,000> psnr: 2.6690e+01
22-10-18 06:54:09.004 - INFO: <epoch: 20, iter: 180,000> psnr: 1.8943e+01
So any input/ideas/perspectives on how to get the best high-resolution results would be highly appreciated. My training data is approx. 55k images with a 70/30 train/validation split.
@VladS-PCH
Hi, thanks for the feedback. I'm also working on medical images. As for your problems, I think I can solve some of them.
'I still get _sr images that are completely noisy, but then others that are not.' According to your config file, you set 'need_LR' to false, so xxx_lr and xxx_inf are both the low-resolution image interpolated up to the same dimensions as the high-resolution image, and xxx_hr is the ground truth. These are generated when you create the dataset, so they will never look noisy. xxx_sr, however, is the output of your model during validation; if it is still pure noise, the model has not learned well. I find the PSNR score is highly correlated with noisy outputs: I get a noisy image when the score is below 12. Could you show the output of your model at epoch 11? It should be very close to the high-resolution image.
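For reference, the PSNR logged during validation follows the standard definition, and the "below 12 means a noisy image" rule of thumb is easy to reproduce with synthetic data (a sketch; the repo's own metric code may differ in details):

```python
import numpy as np

def psnr(ref, out, data_range=1.0):
    # Standard definition: 10 * log10(range^2 / MSE).
    mse = np.mean((ref.astype(np.float64) - out.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(data_range ** 2 / mse)

rng = np.random.default_rng(0)
clean = rng.random((64, 64))                          # stand-in ground truth
pure_noise = rng.random((64, 64))                     # what a failed sample looks like
close = clean + 0.01 * rng.standard_normal((64, 64))  # a nearly correct sample

print(psnr(clean, pure_noise))  # single digits: the "noisy image" regime
print(psnr(clean, close))       # roughly 40 dB: close to the ground truth
```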
Here're some questions:
- Did you use some normalization for the 16bit gray-scale image?
- Did you choose some slices for training or put all slices into training?
Hi,
Thanks for the input. Unfortunately, I cannot share any output data for this project as it is under strict NDA :(
On a side note, the xxx_inf is not low resolution: it is upscaled to the resolution I want (1024x1024) using nearest-neighbour interpolation (as that is how I understood the training setup needs to be). So to clarify, my training inputs are as follows:
- xxx_lr: 512x512 16-bit single-channel grayscale image
- xxx_hr: 1024x1024 16-bit single-channel grayscale image (high-resolution ground truth)
- xxx_inf: 1024x1024 16-bit single-channel grayscale image (cheap upscaling of xxx_lr)
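The nearest-neighbour upscaling used to build xxx_inf can be sketched in plain NumPy (the helper name is mine; any interpolation routine that maps 512x512 to 1024x1024 plays the same role):

```python
import numpy as np

def nearest_upscale_2x(img):
    # Duplicate each pixel twice along both axes: 2x nearest-neighbour
    # upscaling, as used to produce xxx_inf from xxx_lr (512 -> 1024).
    return np.repeat(np.repeat(img, 2, axis=0), 2, axis=1)

lr = np.array([[0, 1], [2, 3]], dtype=np.uint16)
hr = nearest_upscale_2x(lr)
print(hr.shape)  # (4, 4)
```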
Is it maybe a problem that xxx_inf has a higher resolution than xxx_lr?
I do not use any normalisation for the input training images.
I don't think I use any slices for training, as I'm not quite sure what that is or how it is used; could you please elaborate further?
Thanks for the answer. I'm also using a private dataset. That output for xxx_inf is correct; no problem there. I assumed you were dealing with MRI or CT data, which has many slices in one image, so I was wondering whether you put all the slices into training or just chose some of them, e.g. selecting (512, 512, 40) out of (512, 512, 192).
Hi,
No, the data is not medical. It is actually closer to GIS-related height-maps (single-channel grayscale height-map images), so there is no need for slicing (I am not dealing with any kind of volumes).
But yes, I am still training it to 200k iterations; it should be done later today, so I'll post the newly logged PSNR values for reference.
My main concern is that the generated validation images (stored in the 'val' folder), which I generate every 10k iterations during training, show a sample of 3 different images: for some iterations the xxx_sr image is almost OK quality (a bit blurry, but without noise), while at other times the noise is very obvious (though in most cases it is fine-grained noise).
I'm not really an expert with GANs and diffusion models, so I'm just trying to hack this as a black box in order to get the results I want :P
Thanks! I hope you get good results!
@VladS-PCH I am also working with some grayscale images. I tried making changes as indicated in your edited config file, but the code still shows the error below:
"weight of size [64, 6, 3, 3], expected input[4, 2, 256, 256] to have 6 channels, but got 2 channels"
I have a different image size than you. Have you made any other changes in the code elsewhere? Thank you in advance.
@VladS-PCH Hi, I came across the same problem as vedant-aero-ml. The first time I ran the model on greyscale images without changing the UNet input channels, the code ran successfully. When I then changed the input and output channels to 2 and 1, and the diffusion sample channels to 1, the code shows this error:
"weight of size [64, 2, 3, 3], expected input[6, 6, 128, 128] to have 2 channels, but got 6 channels instead"
So I would like to ask whether you made any other changes to the source code. Thank you!
@VladS-PCH @vedant-aero-ml Hi, I have solved the problem: in data/LRHR_dataset.py (path: Image-Super-Resolution-via-Iterative-Refinement-master/data/LRHR_dataset.py), lines 83-91, just delete .convert("RGB").
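To see why deleting the convert call (rather than substituting .convert("L")) matters for 16-bit data, here is a small Pillow demonstration (relying on Pillow's documented behaviour of clipping values when converting integer mode "I" down to 8-bit "L"):

```python
import numpy as np
from PIL import Image

# A 16-bit-range pixel value stored in Pillow's 32-bit integer mode "I",
# which is how Pillow represents 16-bit grayscale PNGs in many versions.
img = Image.new("I", (2, 2), 40000)

native = int(np.asarray(img)[0, 0])                # full precision is kept
as_8bit = int(np.asarray(img.convert("L"))[0, 0])  # clipped to the 8-bit maximum

print(native, as_8bit)
```

So dropping the conversion entirely keeps the full 16-bit dynamic range, which is what the fix above achieves.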