
Questions about results with my own dataset

Open xhh232018 opened this issue 6 years ago • 8 comments

Hi, Jiahui! After one week of training on a GTX 1080 Ti, I found some interesting results on my own dataset. There are two kinds of images in my dataset. The first kind has clear texture (example image: 7147). The inpainting results for this kind of image are semantically plausible (examples: 7182_ip, demo16, demo17, 7147_ip). The second kind contains more information and structure (example: image01). However, the result for this kind of image from my pre-trained model is quite blurry and bad (examples: 2_ip, 16). Here are my hypotheses:

  1. The second kind of image is a minority in my training set. Should I increase its ratio to 1:1? The total number of images in my dataset is 8,000.
  2. According to #21 and #53, I should fine-tune my model. Could you please give me suggestions on which hyper-parameters to change? Here are the screenshots from Tensorboard: (Tensorboard screenshots: selection_070, selection_069, selection_065, selection_064)

xhh232018 · Jul 05 '18 08:07

Hi, first of all, thanks for your interest in our work and for sharing some of your results. Here are some answers that may help:

  1. The balance of data is important, so it may help if you can increase the number of samples of the second case. You can either collect more examples or use data augmentation like random flipping, rotation, or color adjustment (see the sketch after this list).
  2. To fine-tune a pre-trained model, you do not have to change the hyper-parameters.
  3. More data samples will help, since in your case you only have 8k images. Usually I work with at least 30k images (up to 10 million images).
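
For the augmentation suggested in answer 1, here is a minimal sketch, assuming TensorFlow 1.x (the framework this repo uses); the function name and augmentation strengths are illustrative assumptions, not part of the repo:

```python
import tensorflow as tf

def augment(image):
    """Random flip, 90-degree rotation, and mild color jitter for one image.

    image: [H, W, 3] float tensor; the delta/range values below are
    illustrative, not tuned.
    """
    image = tf.image.random_flip_left_right(image)
    # Rotate by a random multiple of 90 degrees.
    k = tf.random_uniform([], minval=0, maxval=4, dtype=tf.int32)
    image = tf.image.rot90(image, k=k)
    image = tf.image.random_brightness(image, max_delta=0.1)
    image = tf.image.random_saturation(image, lower=0.9, upper=1.1)
    return image
```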

JiahuiYu · Jul 05 '18 21:07

OK, I will try to create more training samples. Also, to clarify: the pre-trained model is one I trained on my own dataset with your default hyper-parameter settings, not the one you provided. Should I change the hyper-parameters if I want to refine it?

xhh232018 · Jul 06 '18 03:07

You don't need to change the hyper-parameters, in my understanding, unless you find failure cases like the ones addressed in issues #53 and #21.

JiahuiYu · Jul 06 '18 05:07

OK, thanks for your help. I'll train a new model on more training samples as soon as possible and will share my latest results in a few days.

xhh232018 · Jul 07 '18 03:07

@JiahuiYu, sorry to bother you again. I want to re-implement DeepFill v2 based on your DeepFill v1, since DeepFill v1 cannot handle irregularly masked images. I want to confirm whether the gated convolution layers are used only in the coarse network. Should I also replace the vanilla convolution layers with gated convolution layers in the refinement network? Thanks for your help.

xhh232018 · Jul 12 '18 08:07

Gated convolutions are used in both networks. I think it is important to use gated convolution in the refinement network as well.
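
For reference, a minimal sketch of a gated convolution as described in the DeepFill v2 paper, where the output is the feature activation gated by a sigmoid; this assumes TensorFlow 1.x, and the function name is illustrative rather than the repo's actual code:

```python
import tensorflow as tf

def gated_conv(x, filters, ksize, stride=1, rate=1, activation=tf.nn.elu):
    """Gated convolution: y = activation(conv_f(x)) * sigmoid(conv_g(x))."""
    # NOTE: tf.layers.conv2d disallows stride > 1 combined with dilation rate > 1.
    features = tf.layers.conv2d(x, filters, ksize, strides=stride,
                                dilation_rate=rate, padding='same')
    gating = tf.layers.conv2d(x, filters, ksize, strides=stride,
                              dilation_rate=rate, padding='same')
    return activation(features) * tf.nn.sigmoid(gating)
```

In practice the two convolutions can be fused into a single one with 2x filters, split along the channel axis, which halves the number of convolution calls.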

JiahuiYu · Jul 12 '18 18:07

@JiahuiYu Thanks for your help. I see you mentioned it in your paper. Sorry to bother you again, but I still need to confirm some changes for the DeepFill v2 implementation:

  1. You said the input is only the masked image and the encoder-decoder structure is the same as DeepFill v1. This is my understanding of the gated convolution layer: (diagram)

     Is that right? If so, I do not need to concatenate the ones and the mask to the input like this: (code screenshot)

  2. My implementation is based on DeepFill v1, and you said all vanilla convolution layers are changed to gated convolution layers. What about the last layer, whose activation function is None?

     (code screenshot) Should I keep it as is or change it to a gated convolution?

  3. In your paper, you said the contextual attention layer is the same as in v1. Should the input to the contextual attention layer therefore include the binary mask? (In my opinion, the mask should be passed into that layer.)

  4. The GAN loss of DeepFill v1 is based on neuralgym, and you use the setting from https://github.com/pfnet-research/sngan_projection/blob/master/updater.py to calculate the GAN loss. Can I define this kind of loss in neuralgym? Thanks for your help, and I look forward to your reply.

xhh232018 · Jul 14 '18 11:07

@xhh232018 Hi, first, thanks for your interest. I can see that you have read the paper and code carefully, which I appreciate. For your questions:

  1. Fig. 3 in the paper shows that masks are also concatenated to the input. I also concatenate the ones; the reason is addressed in issue #40. Sorry, I forgot to mention concatenating the ones in the paper. (See the sketch after this list.)

  2. Keep the last convolution as it is, rather than changing it to a gated convolution (in both the coarse network and the refinement network).

  3. The implementation of the contextual attention layer needs the binary mask to indicate which pixels are missing and need to be reconstructed, so your understanding is correct.

  4. Actually, I have already released the implementation of the SN-GAN loss in the dev branch of neuralgym. :) (A sketch of the hinge loss follows after this list.)
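
As a concrete illustration of answer 1, here is a minimal sketch of the input concatenation, assuming TensorFlow 1.x; the tensor names, channel order, and value ranges are illustrative assumptions, not the repo's actual code:

```python
import tensorflow as tf

def build_input(image, mask):
    """Concatenate image, ones, and mask along the channel axis.

    image: [N, H, W, 3] masked input, assumed scaled to [-1, 1].
    mask:  [N, H, W, 1] binary mask, assumed 1 inside the hole.
    """
    ones = tf.ones_like(mask)  # the constant "ones" channel mentioned above
    return tf.concat([image, ones, mask], axis=3)  # [N, H, W, 5]
```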
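
And for answer 4, a minimal sketch of the SN-GAN hinge loss as used in the linked sngan_projection updater, assuming TensorFlow 1.x; function and variable names are illustrative:

```python
import tensorflow as tf

def hinge_gan_losses(d_real, d_fake):
    """Hinge GAN losses; d_real and d_fake are raw discriminator logits.

    Discriminator: E[relu(1 - D(x_real))] + E[relu(1 + D(x_fake))]
    Generator:     -E[D(x_fake)]
    """
    d_loss = (tf.reduce_mean(tf.nn.relu(1.0 - d_real)) +
              tf.reduce_mean(tf.nn.relu(1.0 + d_fake)))
    g_loss = -tf.reduce_mean(d_fake)
    return d_loss, g_loss
```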

JiahuiYu · Jul 14 '18 19:07