ELLA Training Details

First of all, thanks for the remarkable work! I've a few questions about the training details:

Batch size and steps of training SD1.5 on 512 res
Batch size and steps of training SDXL on 512 & 1024 res
Is 34M training data size necessary? Have you tried training on 0.1M, 1M, 10M oens?
How is the 100K 1024 res data collected?
What's the prompt used for data recaption?

Thanks

Apr 15 '24 11:04 chenbowen

I am training SDXL ELLA so I can answer a few.

Batch size and steps of training SDXL on 512 & 1024 res

As big as you can get ideally.

Is 34M training data size necessary? Have you tried training on 0.1M, 1M, 10M oens?

34 M is actually maybe too small. I have tried training on smaller amounts and if the concept doesn't exist in your training data the adapter may struggle with reproducing it, see: https://github.com/TencentQQGYLab/ELLA/issues/35

What's the prompt used for data recaption?

You can just use any major mllm like llava 1.6 and "Describe this image in detail".

May 07 '24 00:05 AmericanPresidentJimmyCarter

@AmericanPresidentJimmyCarter Hello, I would appreciate it if you could share your training script and code,thanks a lot!

May 09 '24 01:05 DthdZK

@AmericanPresidentJimmyCarter Hello, I would appreciate it if you could share your training script and code,thanks a lot!

looks like training scripts are here: https://github.com/DataCTE/ELLA_Training and https://github.com/TencentQQGYLab/ELLA/pull/27

May 24 '24 14:05 matbeedotcom

@AmericanPresidentJimmyCarter Hello, I would appreciate it if you could share your training script and code,thanks a lot!

If it ever works well I will release the weights. I don't have anything like the results from the paper so far.

May 24 '24 21:05 AmericanPresidentJimmyCarter