
Question about data augmentation

Open yuanshuai220 opened this issue 6 years ago • 13 comments

Thanks for your code, it helps me a lot. But I have some questions about data augmentation. In generate_train_lap_pry.m, you only use downsizing to make more training data, while in the paper the author augments the training data in three ways: scaling, rotation, and flipping. Your performance is better than the paper's, yet your training data has only 7488 examples. I'm confused about this.

yuanshuai220 avatar Sep 28 '17 08:09 yuanshuai220

@yuanshuai220 Hi, I am reproducing the paper's results at the moment. The training data provided here is only a tiny sample; you can collect BSD200, T91, and General100 (391 images in total) as your training dataset using generate_train_lap_pry.m. I get a training dataset of size (11712, 1, 32, 32). After 200 epochs, I get an average PSNR of 31.32 on Set5 for 4x. After several tests, I find that the training dataset plays an important role in the results: the richer the training dataset, the better the result. Data augmentation also matters; you can add scaling, rotation, and flipping to the generate_train_lap_pry.m script yourself.

ZhangDY827 avatar Sep 28 '17 08:09 ZhangDY827
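
For reference, PSNR figures like the 31.32 dB above are typically computed on the Y channel with a border of `scale` pixels cropped before comparison. A minimal sketch under that assumption (the exact cropping convention behind the number above is not stated in this thread):

```python
import numpy as np

def psnr(sr, hr, scale=4):
    """PSNR in dB between two uint8 Y-channel images.

    Crops a `scale`-pixel border first, a common SR evaluation
    convention (assumed here, not confirmed by this repo's eval code).
    """
    sr = sr.astype(np.float64)[scale:-scale, scale:-scale]
    hr = hr.astype(np.float64)[scale:-scale, scale:-scale]
    mse = np.mean((sr - hr) ** 2)
    if mse == 0:
        return float("inf")
    return 10 * np.log10(255.0 ** 2 / mse)
```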

@CasdDesnDR I agree with you. If the training data is not enough, the neural network will overfit the training set, so the performance on the test set is not good. I will add rotation and flipping to generate_train_lap_pry.m.

yuanshuai220 avatar Sep 28 '17 14:09 yuanshuai220

@yuanshuai220 @CasdDesnDR Please refer to https://github.com/twtygqyy/pytorch-SRResNet/blob/master/data/generate_train_srresnet.m for adding flipping and rotation.

twtygqyy avatar Sep 28 '17 14:09 twtygqyy
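
The linked script is MATLAB; purely as an illustration of the same flip/rotation augmentation (not the repo's actual code), a short Python sketch that expands one image into its eight rotation/flip variants:

```python
import numpy as np

def augment(img):
    """Yield the 8 flip/rotation variants of an HxWxC numpy image."""
    for k in range(4):           # 0, 90, 180, 270 degree rotations
        rot = np.rot90(img, k)
        yield rot
        yield np.fliplr(rot)     # plus a horizontal flip of each rotation
```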

@twtygqyy Hi, thank you for sharing your code. I want to know why you convert the RGB images into the YCbCr colour space and use only the Y-channel information. How do the results compare when using all RGB channels directly?

baiyancheng20 avatar Oct 13 '17 06:10 baiyancheng20

Hi @baiyancheng20, I followed the LapSRN paper for the implementation. You can also check https://github.com/twtygqyy/pytorch-SRResNet, where I used RGB images as inputs.

twtygqyy avatar Oct 14 '17 15:10 twtygqyy
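
For reference, extracting the Y channel as discussed above can be done with Pillow; a minimal sketch (illustrative, not this repo's exact preprocessing):

```python
from PIL import Image

def load_y_channel(path):
    """Load an RGB image and return its Y (luma) channel as a PIL 'L' image."""
    img = Image.open(path).convert("YCbCr")
    y, cb, cr = img.split()
    return y  # train and evaluate on this single channel
```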

@twtygqyy Hi, thank you for sharing your LapSRN code. I took your PyTorch code from GitHub and executed it; it works only for grayscale images. I modified lapsrn.py to add support for RGB color images. Then I took just one color image from the Urban100 dataset and performed the augmentations given in your MATLAB code (generate_train_lap_pry.m), which produced around 165 color image patches of size 128x128. Using these patches (as an h5 file) I trained the network for 100 epochs. For testing I modified test.py for color images and gave it, as input, a 32x32 image cropped from the original training image. The results are very poor, and I'm not sure where I'm going wrong. I have attached the modified code and my results. Any technical advice on how to proceed to get correct results would be much appreciated.

sourcefiles.zip

sriprabhar avatar Oct 12 '18 07:10 sriprabhar

@sriprabhar Hi, I understand that you tried to overfit the network on a small dataset. What does the loss look like during training? Did it converge well?

twtygqyy avatar Oct 12 '18 16:10 twtygqyy

Thanks for your response. I took the building image (attached) and extracted several overlapping patches.

Training trial 1

stride = 64; number of patches = 15x11, each of size 128x128. Convergence noticed (attached plot no. 1). Trained for 100 epochs.

Training trial 2

stride = 16; number of patches = 41x57, each of size 128x128. Convergence noticed (attached plot no. 2). Trained for 5 epochs using the trial 1 model as a pre-trained model.

Test images

  1. An image of size 32x32 cropped from the building image.
  2. An image patch taken from the building image used for training, downsampled to 32x32.

I'm not sure how to solve this if it's an overfitting problem. Please have a look at the attachments.

figure_trainingtrial1 figure_trainingtrial2 figure_building figure_patch building

sriprabhar avatar Oct 13 '18 08:10 sriprabhar
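
For reference, a minimal numpy sketch of the stride-based overlapping patch extraction described above (illustrative only; the actual extraction here was done by the MATLAB script):

```python
import numpy as np

def extract_patches(img, size=128, stride=64):
    """Return all size x size patches of an HxWxC image at the given stride."""
    h, w = img.shape[:2]
    patches = []
    for top in range(0, h - size + 1, stride):
        for left in range(0, w - size + 1, stride):
            patches.append(img[top:top + size, left:left + size])
    return np.stack(patches)
```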

Hi, also, for training we have to create the dataset in HDF5 format using the MATLAB code. When creating an h5 file from patches of a single image, the file size is huge: for 165 color patches the h5 file is around 500 MB, and for 57x41 patches it is around 3 GB. If I have a folder containing around 50 images of size 1080x1080 and run the MATLAB code for RGB color images, the system hangs. I'm not sure whether I'm following the correct method for dataset creation and training. Thanks for any help/suggestions.

sriprabhar avatar Oct 13 '18 10:10 sriprabhar
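
One contributor to the file sizes above is that patches are often stored uncompressed. A sketch of writing them with h5py using chunked gzip compression; the dataset names "data" and "label" are an assumption about this repo's convention:

```python
import h5py

def write_h5(path, data, label):
    """Write (N, C, H, W) input/label arrays with chunked gzip compression.

    Dataset names 'data' and 'label' are assumed, not confirmed by the
    repo; gzip compression can shrink the on-disk size considerably.
    """
    with h5py.File(path, "w") as f:
        f.create_dataset("data", data=data, compression="gzip",
                         chunks=(1,) + data.shape[1:])
        f.create_dataset("label", data=label, compression="gzip",
                         chunks=(1,) + label.shape[1:])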

@sriprabhar Hi, I think the way you generate the h5 is correct, but you will probably get too many small patches out of a 1080x1080 image because of its size. (3 GB is not that big, TBH : ) )

A quick fix is to increase the stride when you run the MATLAB generation code. A better fix is to generate multiple h5 files, then create a new generator that takes the folder containing the h5 files as input and fetches data from one h5 file at a time.

twtygqyy avatar Oct 15 '18 17:10 twtygqyy
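
A minimal sketch of the multi-file approach suggested above, as a PyTorch Dataset that opens one h5 file at a time; the dataset names "data"/"label" and the folder layout are assumptions, and a production loader would cache open file handles rather than reopening per item:

```python
import glob
import h5py
import torch
from torch.utils.data import Dataset

class FolderH5Dataset(Dataset):
    """Index patches across several h5 files, reading from one file at a time."""

    def __init__(self, folder):
        self.paths = sorted(glob.glob(folder + "/*.h5"))
        self.lengths = []
        for p in self.paths:
            with h5py.File(p, "r") as f:
                self.lengths.append(f["data"].shape[0])

    def __len__(self):
        return sum(self.lengths)

    def __getitem__(self, idx):
        # Map the global index to (file, local index), then fetch from that file.
        for path, n in zip(self.paths, self.lengths):
            if idx < n:
                with h5py.File(path, "r") as f:
                    x = torch.from_numpy(f["data"][idx])
                    y = torch.from_numpy(f["label"][idx])
                return x, y
            idx -= n
        raise IndexError(idx)
```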

@sriprabhar also, the result you plotted makes sense to me, because the image you tested might not be exactly the same one you used in training. Grab an image from the h5 file you used for training and see if the result looks better.

twtygqyy avatar Oct 15 '18 17:10 twtygqyy

Thank you for your response; I will try with a training patch. Also, one more question: LapSRN works on the Y component alone, so we combined the bicubic-interpolated Cb and Cr channels with the LapSRN super-resolved Y component, and the results were good. If the Y component is sufficient for training and for PSNR measurement, then I would like to know why we would train on RGB images (as in SRResNet).

sriprabhar avatar Oct 17 '18 09:10 sriprabhar
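
For reference, a Pillow sketch of the Y + bicubic-chroma merge described above (illustrative; not the exact test.py modification attached earlier):

```python
from PIL import Image

def merge_sr_y(sr_y, lr_img):
    """Combine a super-resolved Y channel with bicubic-upscaled Cb/Cr.

    sr_y:   PIL 'L' image, the network's super-resolved luma
    lr_img: low-resolution RGB PIL image providing the chroma
    """
    _, cb, cr = lr_img.convert("YCbCr").split()
    cb = cb.resize(sr_y.size, Image.BICUBIC)   # chroma upscaled by bicubic
    cr = cr.resize(sr_y.size, Image.BICUBIC)
    return Image.merge("YCbCr", (sr_y, cb, cr)).convert("RGB")
```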

Hi @sriprabhar, you can have a look at Section 5.1 of the paper "Fast and Accurate Image Super-Resolution Using A Combined Loss". They compared training with Y versus RGB for SR.

twtygqyy avatar Oct 22 '18 16:10 twtygqyy