
The qualitative comparison with PUTconv in Figure 7

CyrilCsy opened this issue · 3 comments

I am very curious why there is such a big gap between your results and those of a general CNN-based encoder. A CNN should be able to learn to distinguish the masked region to some extent.

CyrilCsy avatar Sep 01 '22 15:09 CyrilCsy

Hi @CyrilCsy,

Thanks for your interest in our work. A CNN-based encoder can indeed learn good features. However, those features are not suitable for the UQ-Transformer (they are good for reconstruction). The reason is that the masked regions (zero pixels) have a negative impact on the unmasked regions. For PUT, the main artifact is that a patch is easily predicted as black (zero pixels) if: 1) a partially masked patch contains some black pixels; or 2) there are lots of black pixels in the unmasked regions. The CNN's convolutions transfer black pixels from the masked region into the unmasked region, which has a significant negative impact on the inpainted images.
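The contamination described above is easy to demonstrate: because a convolution's receptive field crosses the mask boundary, zeroed pixels change the features of nearby unmasked pixels. A minimal NumPy sketch (a naive averaging convolution, not the actual P-VQVAE encoder):

```python
import numpy as np

def conv2d_same(img, k):
    # Naive "same" 2D convolution with zero padding.
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(img, ((ph, ph), (pw, pw)))
    out = np.zeros_like(img, dtype=float)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * k)
    return out

rng = np.random.default_rng(0)
img = rng.uniform(0.5, 1.0, size=(8, 8))   # an all-bright image
masked = img.copy()
masked[:, :4] = 0.0                        # zero out the left half (the "hole")

k = np.ones((3, 3)) / 9.0                  # simple averaging kernel
feat_full = conv2d_same(img, k)
feat_masked = conv2d_same(masked, k)

# Column 4 is unmasked, but its features change because the 3x3 kernel
# overlaps the zeroed columns:
print(np.allclose(feat_full[:, 4], feat_masked[:, 4]))  # False
# Column 6 is far from the hole, so its features are unchanged:
print(np.allclose(feat_full[:, 6], feat_masked[:, 6]))  # True
```

A patch-wise encoder that processes non-overlapping patches independently avoids exactly this: zeros inside a masked patch cannot leak into the features of unmasked patches.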

By the way, I have also tried to fix this artifact. The next version of PUT is on the way.

liuqk3 avatar Sep 02 '22 02:09 liuqk3

Thanks for clearing up my confusion. I'm trying to train the model on Places. I set batch_size=64 and kept the number of epochs unchanged (100) when training the P-VQVAE, but it shows that training will take more than 20 days. I wonder if this is necessary? If I train it for only 10 epochs, will it make a big difference to the results?

CyrilCsy avatar Sep 08 '22 14:09 CyrilCsy

Hi @CyrilCsy ,

In my experience, P-VQVAE can achieve promising reconstruction capability even when the number of epochs is reduced. But you need to pay attention to some settings, for example the number of warm-up iterations and the iterations at which some losses are introduced (discriminator, LPIPS, etc.). You had better set these iteration counts according to their ratio of the total number of iterations.
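The proportional-rescaling advice above can be sketched as follows. All milestone names and numbers here are made up for illustration, not taken from the released configs:

```python
def scale_milestones(milestones, orig_total_iters, new_total_iters):
    """Rescale iteration-based milestones so they keep the same *ratio*
    of the total training schedule when the schedule is shortened."""
    ratio = new_total_iters / orig_total_iters
    return {name: int(it * ratio) for name, it in milestones.items()}

# Hypothetical original schedule: warm-up ends at 5k iterations and the
# GAN/LPIPS losses kick in at 30k iterations out of 100k total.
orig = {"warmup_end": 5_000, "gan_loss_start": 30_000}

# Cutting training to 1/10 of the iterations (e.g. 100 -> 10 epochs):
print(scale_milestones(orig, orig_total_iters=100_000, new_total_iters=10_000))
# → {'warmup_end': 500, 'gan_loss_start': 3000}
```

The point is to preserve the relative position of each milestone within the schedule, rather than reusing the absolute iteration counts from the longer run.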

liuqk3 avatar Sep 10 '22 16:09 liuqk3