
Hi, about your paper, there are some questions

Open meiguoofa opened this issue 4 years ago • 4 comments

Hi, in your paper, Fig. 4 shows an input image given as a condition, and the model then generates diverse results. I'm curious how it can generate diverse results, since your whole model is fixed.
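The fixed weights only define a distribution over the next codebook index at each step; the diversity comes from sampling that distribution rather than taking its argmax. A minimal sketch of top-k multinomial sampling, roughly what the repository's minGPT sampling helper does (names here are illustrative, not the exact code):

```python
import torch

def sample_next_index(logits, top_k=100, temperature=1.0):
    # logits: (B, vocab_size) scores over codebook entries for the next position
    logits = logits / temperature
    v, _ = torch.topk(logits, top_k)
    logits = logits.masked_fill(logits < v[..., [-1]], float("-inf"))  # keep only the top-k entries
    probs = torch.softmax(logits, dim=-1)
    # a stochastic draw: each call can return a different index, hence different images
    return torch.multinomial(probs, num_samples=1)
```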

meiguoofa · Jul 20 '21 07:07

In your cond_transformer.py, the forward takes x and c as inputs, where c is the condition image. What image is x? Is x the ground truth?
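For other readers: x is the target image (the ground truth during training); it is encoded to codebook indices that the transformer learns to predict from the condition's indices. A paraphrased sketch of that forward pass, with hypothetical helper names rather than the exact repository code:

```python
import torch
import torch.nn.functional as F

def forward_sketch(transformer, encode_to_z, encode_to_c, x, c):
    z_indices = encode_to_z(x)   # codes of the target (ground-truth) image, shape (B, Lz)
    c_indices = encode_to_c(c)   # codes of the conditioning image, shape (B, Lc)
    cz = torch.cat((c_indices, z_indices), dim=1)
    logits = transformer(cz[:, :-1])             # predict the next index at every position
    logits = logits[:, c_indices.shape[1] - 1:]  # keep only the positions that predict z
    # cross-entropy against the encoder's own codes = maximum-likelihood training
    loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)), z_indices.reshape(-1))
    return logits, loss
```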

meiguoofa · Jul 20 '21 08:07

Since the discrete latent code can be obtained through the encoder, why do you need a transformer to predict the sequence?

meiguoofa · Jul 21 '21 02:07

The decoder can directly decode its own quantized code, so why does this model need a transformer to predict the sequence?
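The encoder can only produce the code sequence of an image it is already given, so encoder plus decoder amounts to reconstruction. Generating a new image requires a code sequence that no encoder output supplies, and that is what sampling from the transformer provides. A rough sketch of this second-stage generation loop (helper names are assumptions; in the repository, decode_to_img in cond_transformer.py plays roughly the role of the last line):

```python
import torch

def generate_sketch(transformer, decode_to_img, c_indices, z_len, zshape, top_k=100):
    # autoregressively sample a brand-new code sequence, conditioned on c_indices
    idx = c_indices
    for _ in range(z_len):
        logits = transformer(idx)[:, -1, :]      # distribution over the next code
        v, _ = torch.topk(logits, top_k)
        logits = logits.masked_fill(logits < v[..., [-1]], float("-inf"))
        probs = torch.softmax(logits, dim=-1)
        idx = torch.cat((idx, torch.multinomial(probs, 1)), dim=1)
    z_indices = idx[:, c_indices.shape[1]:]      # strip the conditioning codes
    return decode_to_img(z_indices, zshape)      # the VQGAN decoder turns codes into pixels
```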

meiguoofa · Jul 21 '21 02:07

In the second stage, all latent codes in your codebook are fixed, so how does the transformer use its own characteristics to achieve autoregressive prediction? If the sequence target predicted by the transformer is the quantized latent code sequence from the encoder, then what is the significance of this prediction?
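The significance is that the encoder's code sequence is the training target, not the final output: fitting it is how the transformer learns a distribution over code sequences, which is then sampled (with no encoder in the loop) to synthesize new images. Roughly, in the paper's notation:

```latex
p(s \mid c) = \prod_{i} p\big(s_i \mid s_{<i},\, c\big),
\qquad
\mathcal{L}_{\text{Transformer}} = \mathbb{E}_{x \sim p(x)}\big[-\log p(s \mid c)\big]
```

where s is the quantized code sequence of the training image and c the conditioning codes; at sampling time s is drawn from this learned distribution instead of coming from the encoder.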

meiguoofa · Jul 22 '21 13:07