taming-transformers
Hi, I have some questions about your paper
Hi, in Fig. 4 of your paper, you give an input image as the condition and the model then generates diverse results. I'm curious how it can generate diverse results, since the entire model is fixed.
In your cond_transformer.py, the forward function takes x and c as inputs, where c is the condition image. What image is x? Is it a ground truth?
Since the discrete latent codes can be obtained through the encoder, why do you need a transformer to predict the sequence?
The decoder can directly decode the encoder's own quantized codes, so why does this model need a transformer to predict the sequence?
In the second stage, all latent codes in your codebook are fixed, so how does the transformer exploit its own architecture to achieve autoregressive prediction? And if the target sequence predicted by the transformer is just the quantized latent code sequence produced by the encoder, then what is the significance of this prediction?
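To frame my question, here is my current understanding as a toy sketch in pure Python (all names are hypothetical, not the repo's actual code): at sampling time there is no ground-truth image to encode, so the transformer must produce the code sequence itself, one index at a time, and drawing each index from the predicted distribution (rather than taking the argmax or the encoder's codes) is what would make the outputs diverse. Is this roughly right?

```python
import math
import random

def sample_codes(logits_fn, seq_len, vocab_size, rng, greedy=False):
    """Autoregressively draw a sequence of codebook indices.

    logits_fn(prefix) stands in for the transformer: given the indices
    generated so far, it returns unnormalized scores over the codebook.
    (Hypothetical toy stand-in, not the repo's actual API.)
    """
    seq = []
    for _ in range(seq_len):
        logits = logits_fn(seq)
        # Softmax over the codebook (shifted by the max for stability).
        m = max(logits)
        probs = [math.exp(l - m) for l in logits]
        total = sum(probs)
        probs = [p / total for p in probs]
        if greedy:
            idx = probs.index(max(probs))
        else:
            # Sampling here is what makes each generated sequence differ.
            idx = rng.choices(range(vocab_size), weights=probs, k=1)[0]
        seq.append(idx)
    return seq

# Toy "transformer" with no preference over a 16-entry codebook:
# greedy decoding is deterministic (always index 0), while sampling
# yields a different code sequence on each call.
rng = random.Random(0)
toy = lambda prefix: [0.0] * 16
a = sample_codes(toy, seq_len=8, vocab_size=16, rng=rng)
b = sample_codes(toy, seq_len=8, vocab_size=16, rng=rng)
g = sample_codes(toy, seq_len=8, vocab_size=16, rng=rng, greedy=True)
```

The sampled index sequences would then be looked up in the fixed codebook and passed to the decoder, which is how (I assume) one conditioning image can map to many different outputs.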