Effect of VQGAN code randomness
I understand from #258 that there is randomness in the generated VQGAN code sequences because of the Gumbel-Softmax sampling, but the different sequences nevertheless reconstruct to similar-looking images. However, since training is done by predicting the sequence tokens and not by comparing the reconstructed images themselves, I am wondering whether, and how, having different token sequences affects pretraining and downstream performance. Was this investigated to check for consistency in performance across different variations of the generated code sequences?
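To make my reading of the objective concrete, here is a rough sketch (not OFA's actual code; the shapes, codebook size, and tensor names are made up): image infilling is supervised with token-level cross-entropy on discrete code indices, so two encodings of the same image that differ in even a few tokens give different targets, even if both decode to near-identical images.

```python
import torch
import torch.nn.functional as F

batch, seq_len, codebook_size = 2, 256, 8192  # e.g. 16x16 codes per image (illustrative)

logits = torch.randn(batch, seq_len, codebook_size)                  # model's token predictions
target_codes_a = torch.randint(0, codebook_size, (batch, seq_len))   # one VQGAN encoding of the image
target_codes_b = target_codes_a.clone()
target_codes_b[:, :5] = torch.randint(0, codebook_size, (batch, 5))  # a second encoding differing in a few positions

# Token-level losses differ even though both target sequences may reconstruct
# to visually similar images.
loss_a = F.cross_entropy(logits.reshape(-1, codebook_size), target_codes_a.reshape(-1))
loss_b = F.cross_entropy(logits.reshape(-1, codebook_size), target_codes_b.reshape(-1))
print(loss_a.item(), loss_b.item())
```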
A good question. In our preliminary experiments, we found that using different sequences can slightly improve model performance; it seems the randomness in the VQGAN encoding process acts as a kind of data augmentation or label smoothing. But we didn't conduct a more in-depth quantitative study.
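To illustrate where the randomness comes from, here is a minimal sketch assuming a Gumbel-Softmax quantizer in the image tokenizer (the function and logits below are illustrative, not our tokenizer's actual API): the Gumbel noise lets near-tied codebook entries win on different calls, so repeated encodings of the same image can yield slightly different token sequences.

```python
import torch
import torch.nn.functional as F

def sample_codes(encoder_logits, tau=1.0):
    # Stochastic quantization: hard Gumbel-Softmax picks one codebook entry
    # per position, but the choice can change between calls when entries
    # have similar logits.
    one_hot = F.gumbel_softmax(encoder_logits, tau=tau, hard=True, dim=-1)
    return one_hot.argmax(dim=-1)

encoder_logits = torch.randn(1, 256, 8192)   # stand-in for per-position codebook logits
codes_1 = sample_codes(encoder_logits)
codes_2 = sample_codes(encoder_logits)
print((codes_1 != codes_2).float().mean())   # fraction of positions that changed between encodings
```

From the training side, these differing target sequences behave like lightly perturbed labels for the same input, which is why the effect resembles data augmentation or label smoothing.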
I see. So what you're saying is that there is some value in using multiple (slightly different) sequences representing the same image and this could be interpreted as data augmentation on the sequences used for the Image Infilling task. Interesting take. I would like to try and explore this further.