What is the image input for inference?

Open SenHe opened this issue 3 years ago • 1 comments

Thanks for this great work!

After going through the code, I have some questions.

In the first stage, training the discrete VAE, we already learn a codebook. Why don't we reuse it for second-stage training instead of initializing a new codebook for images?
During training, we use the original image as input. During inference, what should the image input be? Is it random noise of size 3x256x256? And how is causal attention handled in the transformer at inference time?
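
To make the second question concrete, here is a minimal sketch of what I understand causal attention and token-by-token generation to mean in DALL-E-style models in general. It is not taken from this repository: `toy_next_token_logits` is a hypothetical stand-in for the trained transformer, and the token IDs, vocabulary size, and sequence length are made up. Under this reading, inference would need no image input at all; image tokens would be sampled one at a time after the text tokens.

```python
import random

def causal_mask(n):
    """n x n lower-triangular mask: position i may attend to j only if j <= i."""
    return [[j <= i for j in range(n)] for i in range(n)]

def toy_next_token_logits(prefix, vocab_size):
    """Hypothetical stand-in for a trained transformer (assumption, not the repo's model).

    Returns one pseudo-random logit per vocabulary entry, seeded by the prefix
    so the example is deterministic.
    """
    rng = random.Random(len(prefix) * 7919 + sum(prefix))
    return [rng.random() for _ in range(vocab_size)]

def generate_image_tokens(text_tokens, num_image_tokens, vocab_size):
    """Extend the text prefix with image tokens, one at a time (greedy decoding)."""
    seq = list(text_tokens)
    for _ in range(num_image_tokens):
        # Each step only sees tokens generated so far, which is what the
        # causal mask enforces during training.
        logits = toy_next_token_logits(seq, vocab_size)
        next_tok = max(range(vocab_size), key=logits.__getitem__)
        seq.append(next_tok)
    return seq[len(text_tokens):]

mask = causal_mask(4)
image_tokens = generate_image_tokens([3, 1, 4], num_image_tokens=16, vocab_size=32)
```

If this is how it works here, the generated token IDs would then be passed to the discrete VAE decoder to produce pixels, rather than starting from a noise image.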

Jun 29 '22 17:06 SenHe

After reading the code, I also don't understand why a second, new codebook is created. Have you figured it out since?

Sep 04 '22 15:09 kingnobro