
Question about Class-Conditional Image Generation

Open LinB203 opened this issue 1 year ago • 3 comments

In Appendix A.2 it is mentioned that the class label is concatenated as another input to the padded feature. How is the class label encoded into a token? Is it the output of token_embed? I noticed that the codebook size is 1024+1000+1; does the 1000 correspond to the class tokens used in class-conditional generation? If so, what is the value of the fake_cls_token? It would be nice if there were code for class-conditional image generation, but I don't seem to see it in this repo.

LinB203 avatar Mar 22 '23 12:03 LinB203

We do not include class-conditional image generation in this repo. The 1024+1000+1 token_emb is actually deprecated. When doing class-conditional generation, we use another token embedding which embeds the class label (0-999) and concatenate it with the input to the decoder (the padded feature). We need to keep the fake_cls_token as part of the input to the encoder because the ViT encoder is fixed during class-conditional generation training.
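A minimal sketch of this scheme (not the repo's actual code): the names `class_emb`, `fake_cls_token`, `build_decoder_input`, and the dimensions are assumptions for illustration only.

```python
import torch
import torch.nn as nn

num_classes = 1000
decoder_embed_dim = 512  # assumed decoder width

# Separate embedding table for the class label (0-999),
# distinct from the deprecated 1024+1000+1 token_emb codebook.
class_emb = nn.Embedding(num_classes, decoder_embed_dim)

# Learnable fake class token kept as part of the encoder input,
# since the ViT encoder stays frozen during class-conditional training.
fake_cls_token = nn.Parameter(torch.zeros(1, 1, decoder_embed_dim))

def build_decoder_input(padded_feature, class_labels):
    """Concatenate the class-label embedding with the padded feature
    before feeding it to the decoder.

    padded_feature: (B, L, D) encoder output padded back to full length
    class_labels:   (B,) integer labels in [0, 999]
    """
    cls_cond = class_emb(class_labels).unsqueeze(1)      # (B, 1, D)
    return torch.cat([cls_cond, padded_feature], dim=1)  # (B, 1+L, D)
```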

LTH14 avatar Mar 22 '23 13:03 LTH14

Thanks.

LinB203 avatar Mar 23 '23 06:03 LinB203

Will you release the class-conditional generation code? Thanks.

LinB203 avatar Jul 10 '23 14:07 LinB203