
How many tokens should the text-to-image generation model be trained on during pretraining in the paper?

Open runzeer opened this issue 2 years ago • 4 comments

runzeer avatar Aug 03 '22 09:08 runzeer

Do you mean the max sequence length? We use a length of 512 (400 image tokens + up to 112 text tokens). I think the other hyperparameters can be found in the paper, except for the learning rate, which I need to check in the code after work, but I think any value around 1e-4 with warmup is okay.
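For concreteness, here is a minimal sketch of that sequence layout and an "around 1e-4 with warmup" schedule. The pad token id, the placement of text before image tokens, the 2,000-step linear warmup, and the AdamW optimizer are all illustrative assumptions, not taken from the CogView2 code.

```python
import torch

MAX_TEXT_TOKENS = 112        # text prefix, truncated/padded to this length
NUM_IMAGE_TOKENS = 400       # image tokens from the image tokenizer
MAX_SEQ_LEN = MAX_TEXT_TOKENS + NUM_IMAGE_TOKENS  # 512
PAD_ID = 0                   # assumed pad token id

def pack_sequence(text_ids: torch.Tensor, image_ids: torch.Tensor) -> torch.Tensor:
    """Build one 512-token training sequence: padded text followed by image tokens."""
    text_ids = text_ids[:MAX_TEXT_TOKENS]
    padding = torch.full((MAX_TEXT_TOKENS - text_ids.numel(),), PAD_ID, dtype=torch.long)
    assert image_ids.numel() == NUM_IMAGE_TOKENS
    return torch.cat([text_ids, padding, image_ids])  # shape: (512,)

# "Around 1e-4 with warmup": linearly ramp up to the base lr, then hold it constant.
model = torch.nn.Linear(8, 8)  # stand-in for the real model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
warmup_steps = 2000            # assumed warmup length
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lambda step: min(1.0, (step + 1) / warmup_steps)
)
```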

Sleepychord avatar Aug 03 '22 17:08 Sleepychord

I mean the training hyperparameters, such as the number of training steps and the number of training tokens per batch. The CogView paper gave us the relevant parameters.

runzeer avatar Aug 04 '22 00:08 runzeer

Hi, please refer to Section 3.2 of the CogView2 paper~

Sleepychord avatar Aug 04 '22 01:08 Sleepychord

Thanks a lot! I also have a question: at roughly what level should the loss be once the model can generate a recognizable image? Could you share some training logs that I can use as a reference for my experiments?

runzeer avatar Aug 05 '22 10:08 runzeer