CogView2
How many tokens are trained during pretraining of the text-to-image generation model in the paper?
Do you mean the max sequence length? We use a sequence length of 512 (400 image tokens + up to 112 text tokens). I think the other hyperparameters can be found in the paper, except for the learning rate, which I need to check in the code after work, but I think any value around 1e-4 with warmup should be okay.
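Just to illustrate the setup described above, here is a minimal sketch (not the official CogView2 training script) of the 512-token sequence layout and a linear warmup schedule around 1e-4; the model, `warmup_steps`, and other values are placeholder assumptions.

```python
import torch

# Sequence layout mentioned above: up to 112 text tokens followed by 400 image tokens.
MAX_TEXT_TOKENS = 112          # assumption based on the reply above
IMAGE_TOKENS = 400             # image tokens per sample
SEQ_LEN = MAX_TEXT_TOKENS + IMAGE_TOKENS   # 512 total

# Placeholder model and optimizer; the real model is the CogView2 transformer.
model = torch.nn.Linear(16, 16)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

warmup_steps = 2000            # hypothetical value, not from the paper

def warmup_lambda(step):
    # Linearly ramp the learning rate from 0 to the base lr over warmup_steps,
    # then hold it constant (any decay schedule could follow).
    return min(1.0, (step + 1) / warmup_steps)

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=warmup_lambda)

for step in range(10):
    optimizer.step()
    scheduler.step()
    print(step, scheduler.get_last_lr())
```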
I mean the training hyperparameters, such as the number of training steps and the tokens per batch. The CogView paper reported these parameters.
Hi, please refer to Section 3.2 of the CogView2 paper~
Thanks a lot! Also, I had a question: when the model is able to generate a recognizable image, roughly what level should the loss be at? Could you share some training logs that I can use as a reference for my experiments?