image-super-resolution

Questions on images used in SR training

Open ontheway16 opened this issue 6 years ago • 5 comments

Hi, I have had some questions in mind for a while and want to discuss possibilities. LR images for training are obtained by bicubic rescaling of the HR ones, and that's fine. This process also preserves the exact noise found in the HR image; I don't know whether that is an additional benefit or not. I am taking photos of small technical pieces for documentation purposes, so my photography conditions are fixed (same light levels, same background, same object distance, etc.). I have a couple of unusual options in mind for obtaining LR and HR image sets under these conditions.
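For reference, the conventional degradation pipeline described above (deriving each LR training image by bicubically downscaling its HR counterpart) can be sketched with Pillow. This is a minimal illustration, not ISR's actual preprocessing code, and the function name is my own:

```python
from PIL import Image

# Works across Pillow versions (the Resampling enum was added in 9.1)
BICUBIC = getattr(Image, "Resampling", Image).BICUBIC

def make_lr(hr: Image.Image, scale: int = 2) -> Image.Image:
    """Derive an LR frame from an HR frame by bicubic downscaling.

    Crops first so both dimensions divide evenly by `scale`. Note that
    any noise present in the HR frame is carried over (in attenuated
    form) into the LR frame, as discussed above.
    """
    w, h = hr.size
    hr = hr.crop((0, 0, w - w % scale, h - h % scale))
    return hr.resize((hr.width // scale, hr.height // scale), BICUBIC)
```

Capturing real LR/HR pairs with the camera, as proposed here, would replace this synthetic step entirely.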

One is taking two photos of the same object at two different resolutions set directly on the digital camera: one for LR, the other for HR.

The other option is a bit more complicated and will require more work. There is no resolution change this time; instead, a different optical zoom is used to get wide (LR) and zoomed-in (HR) images. With this one especially, it's possible to capture seriously rich detail in the HR images.

Do you think either of these would help the SR perform better, in terms of detail preservation and noise, in the final images?

ontheway16 avatar Apr 03 '19 11:04 ontheway16

Representing the low-resolution distribution more accurately would definitely have some positive effect, the magnitude of which could be determined by experiment.

It is of course crucial that the two images, LR and HR, represent exactly the same object under exactly the same conditions, especially for PSNR-driven training.

For this reason the first solution seems the more feasible approach (I'm assuming the pictures would be taken one after another, without any manual intervention). I would be very interested in seeing the results (or in trying out such a dataset, if you make it available). This would also mean that ISR would effectively simulate a 'camera upgrade', which is pretty cool.

My concern with the second approach is that it would require both manual operations and some post-cropping of the LR image, right? From my non-expert perspective that is very prone to mistakes. Of course, if those turn out not to be issues, it could be very interesting.

cfrancesco avatar Apr 03 '19 12:04 cfrancesco

I am able to take pictures one after another, within seconds (or less, if I set up a macro script), since I control the camera from a computer. So, as you stated, the two photos will be nearly identical, except for the noise distribution. As for the other solution: while it has the potential to deliver much more detail, changing the zoom introduces a different degree of lens barrel distortion in each image, which is one issue. Another is that zooming in optically changes the illumination level; more zoom means a darker frame, so the two images will have different brightness levels.
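If the zoomed-pair route were pursued, the brightness mismatch could be partly compensated by matching mean intensity between the two frames. A minimal NumPy sketch, assuming 8-bit grayscale arrays; the function name and the single-global-gain model are my own assumptions, not part of ISR:

```python
import numpy as np

def match_brightness(lr: np.ndarray, hr: np.ndarray) -> np.ndarray:
    """Scale the LR frame so its mean intensity matches the HR frame.

    A single global gain ignores vignetting and local exposure
    differences; it is only a first-order correction.
    """
    gain = hr.mean() / max(lr.astype(np.float64).mean(), 1e-8)
    return np.clip(lr.astype(np.float64) * gain, 0, 255).astype(np.uint8)
```

It would not address the barrel distortion, which is the harder of the two problems.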

I don't know whether image alignment/warping software could be used to match the corresponding patch in the HR image. It seems it will be difficult to reach pixel-level equality, at least in some portions of the images. At this point I am also wondering how many images are needed to carry out such a test. Also, I am talking about 16 MP LR and 64 MP HR frames, so some resizing will be needed prior to training, I guess. I am limited to 11 GB of GPU RAM here. Since I am not a programmer, I know very little about cloud solutions and the like.

ontheway16 avatar Apr 03 '19 14:04 ontheway16

Resizing should never be done. Rather, crops of the images should be taken, and this is done automatically in ISR (the 'patch_size' parameter in config.yml, under ['session']['training'], determines the size of the crops). As for the number, the more the better, but for instance the dataset I used for training, DIV2K, has only 800 images in it. Ideally the different resolutions you use should lead to image sizes of n×n, 2n×2n, 3n×3n, 4n×4n. Feel free to reach me via email for more detail.

I would not be surprised if the brightness level became a problem, though it could also turn out to be desirable. Most likely the lens distortion would cause problems. I suggest you build the dataset using your first idea, which seems ideal and much easier to achieve.

I would be very excited, if and when you actually create this dataset, to try training some models on it!

cfrancesco avatar Apr 03 '19 16:04 cfrancesco

Perfect, then I will prepare a sample dataset for the purpose. Since my application contains only highly similar objects on a single-color background (slight noise is unavoidable, of course), may I assume that a smaller number of images will be enough for a successful test? If so, is there an estimate of the minimum number of images required? (I just saw your edit; I will get busy preparing a dataset of around 500 images at both resolutions, and I will contact you via email.)

ontheway16 avatar Apr 03 '19 16:04 ontheway16

It would be interesting if the images or the results of those experiments could be shared, to help optimize the model :)

victorca25 avatar Apr 04 '19 09:04 victorca25