The qualitative comparison with PUTconv in figure 7
I am very curious why there is such a big gap with the general CNN-based encoder in the results. A CNN should be able to learn to distinguish the masked region to a certain extent.
Hi @CyrilCsy,
Thanks for your interest in our work. A CNN-based encoder can indeed learn good features, but those features are suited to reconstruction, not to the UQ-Transformer. The reason is that the masked regions (zero pixels) have a negative impact on the unmasked regions. For PUT, the main artifact is that a patch is easily predicted as black (zero pixels) if: 1) a partially masked patch contains some black pixels; or 2) there are lots of black pixels near the unmasked regions. CNN convolutions propagate the black pixels from the masked region into the unmasked regions, which has a significant negative impact on the inpainted images.
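A toy illustration of this leakage (not the PUT code, just a minimal NumPy sketch): an ordinary convolution mixes masked zeros into the features of neighbouring unmasked pixels as soon as its receptive field touches the mask.

```python
import numpy as np

# Unmasked region = ones, masked region = zeros (right half of the image).
img = np.ones((6, 6))
img[:, 3:] = 0.0

kernel = np.ones((3, 3)) / 9.0  # plain 3x3 averaging filter

# "Valid" convolution computed by hand.
out = np.zeros((4, 4))
for i in range(4):
    for j in range(4):
        out[i, j] = (img[i:i + 3, j:j + 3] * kernel).sum()

# Output columns 1 and 2 are centred on *unmasked* input pixels, yet their
# values are already pulled toward zero by the mask inside their windows.
print(out[0])  # [1.0, 0.667, 0.333, 0.0] (approximately)
```

This is why PUT restricts the encoder so that features of unmasked patches are computed only from unmasked pixels.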
By the way, I have also tried to fix this artifact. The next version of PUT is on the way.
Thanks for clearing up my confusion. I'm trying to train the model on Places. I set batch size to 64 and kept the number of epochs unchanged (100) when training P-VQVAE, but it shows that training will take more than 20 days. I wonder if this is necessary? If I train for only 10 epochs, will it make a big difference in quality?
Hi @CyrilCsy,
In my experience, P-VQVAE can still achieve a promising reconstruction capability when the number of epochs is reduced. But you need to pay attention to some settings, for example the number of warm-up iterations and the iterations at which some losses are introduced (discriminator, LPIPS, etc.). You'd better set these iteration counts in proportion to the total number of iterations.
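The proportional-scaling advice above can be sketched as a small helper. The milestone names and numbers here are hypothetical examples, not the repo's actual config keys:

```python
def scale_milestones(milestones, old_total_iters, new_total_iters):
    """Rescale iteration milestones so their ratio to the total run stays fixed."""
    ratio = new_total_iters / old_total_iters
    return {name: int(it * ratio) for name, it in milestones.items()}

# Hypothetical schedule for the original 100-epoch run:
original = {
    "warmup_end": 10_000,        # end of learning-rate warm-up
    "start_lpips_loss": 30_000,  # iteration at which LPIPS loss is added
    "start_disc_loss": 50_000,   # iteration at which the discriminator kicks in
}

# Shortening the run to 1/10 of the iterations (e.g. 10 epochs instead of 100):
scaled = scale_milestones(original, old_total_iters=500_000, new_total_iters=50_000)
print(scaled)  # {'warmup_end': 1000, 'start_lpips_loss': 3000, 'start_disc_loss': 5000}
```

The point is simply that a loss introduced at 50% of a long run should also be introduced at 50% of a short run, not at the same absolute iteration.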