What training setup did you use?

Open · rom1504 opened this issue 4 years ago · 1 comment

This looks great!

Could you share some information on what setup you used for the training of the transformer model?

  • how many gpu / for how long
  • how many steps
  • what batch size

It would be helpful to have this information to better understand the cost of training DALL-E models.
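
For reference, here is a rough back-of-envelope sketch of how such numbers translate into training cost, using the common ~6·N·D FLOPs heuristic for transformers. All concrete figures below (step count, batch size, sequence length, sustained GPU throughput) are assumptions for illustration only; the actual minDALL-E setup is exactly what this issue is asking about.

```python
# Back-of-envelope transformer training cost estimate.
# The step count, batch size, sequence length, and GPU throughput
# below are ASSUMED values for illustration -- not disclosed figures.

def training_flops(n_params: float, n_steps: int, batch_size: int, seq_len: int) -> float:
    """Approximate total training FLOPs via the ~6*N*D heuristic:
    forward + backward passes cost roughly 6 FLOPs per parameter per token."""
    tokens = n_steps * batch_size * seq_len
    return 6 * n_params * tokens

# Example: a 1.3B-parameter model, with an assumed 1M steps, batch 512,
# and ~320 tokens per sample (e.g. 64 text tokens + 256 image tokens).
flops = training_flops(n_params=1.3e9, n_steps=1_000_000,
                       batch_size=512, seq_len=320)

# Convert to GPU-hours assuming ~100 TFLOP/s sustained per A100
# (a rough figure; real utilization varies widely).
gpu_hours = flops / (100e12 * 3600)
print(f"~{flops:.2e} FLOPs, ~{gpu_hours:,.0f} A100-hours")
```

Under these assumptions this comes out to roughly 1e21 FLOPs and a few thousand A100-hours, which is why the actual step count and batch size matter so much for estimating cost.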

rom1504 · Dec 17 '21

Here is some mild commentary on this in https://github.com/kakaobrain/minDALL-E/issues/6:

Hello @SeungyounShin, thanks for testing zero-shot image-to-image translation.

As you mention, an autoregressive text-to-image generation model can perform unseen tasks in a zero-shot manner, even though the training dataset does not include exactly the same types of text-image pairs! However, zero-shot capability improves as model size and dataset size increase together. Please note that the released minDALL-E is a smaller-scale model (1.3B params, 14M text-image pairs) than OpenAI's original implementation (12B params, 250M text-image pairs).

This limitation should be resolved when a larger model is trained on a larger number of training samples, and we will also release larger-scale models.

afiaka87 · Dec 17 '21