Question about Class-Conditional Image Generation
In Appendix A.2 it is mentioned that the class label is concatenated as another input to the padded feature. How is the class label encoded into a token — is it the output of token_embed? I also noticed that the codebook size is 1024+1000+1; does the 1000 correspond to the class tokens in class-conditional generation? If so, what is the value of the fake_cls_token?
It would be nice if there were code for class-conditional image generation, but I don't see it in the repo.
We do not include class-conditional image generation in this repo. The 1024+1000+1 token_emb is actually deprecated. For class-conditional generation, we use a separate token embedding that embeds the class label (0-999), and concatenate its output with the input to the decoder (the padded feature). We need to keep the fake_cls_token as part of the input to the encoder because the ViT encoder is frozen during class-conditional generation training.
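The concatenation described above can be sketched roughly as follows. This is a minimal illustration, not MAGE's actual code: the dimensions, sequence length, and variable names (`class_embed`, `padded_feature`) are assumptions for the example.

```python
import torch
import torch.nn as nn

# Hypothetical dimensions for illustration; the real values in MAGE may differ.
num_classes = 1000   # class labels 0-999
embed_dim = 768
seq_len = 257        # e.g. 256 patch tokens + 1 fake_cls_token from the frozen encoder

# Separate embedding table for class labels, as described in the answer above.
# This is distinct from the deprecated 1024+1000+1 token_emb codebook.
class_embed = nn.Embedding(num_classes, embed_dim)

# Stand-in for the padded feature produced by the frozen ViT encoder.
padded_feature = torch.randn(2, seq_len, embed_dim)   # (batch, seq_len, embed_dim)
labels = torch.tensor([3, 721])                       # class labels in [0, 999]

# Embed each label into one extra token and prepend it to the decoder input.
cls_token = class_embed(labels).unsqueeze(1)          # (batch, 1, embed_dim)
decoder_input = torch.cat([cls_token, padded_feature], dim=1)

print(decoder_input.shape)  # torch.Size([2, 258, 768])
```

The decoder then attends over the class token together with the padded feature, which is how the label conditions generation.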
Thanks.
Will you release the class-conditional generation code? Thanks.