ELLA icon indicating copy to clipboard operation
ELLA copied to clipboard

Training Details

Open chenbowen opened this issue 1 year ago • 4 comments

First of all, thanks for the remarkable work! I've a few questions about the training details:

  • Batch size and steps of training SD1.5 on 512 res
  • Batch size and steps of training SDXL on 512 & 1024 res
  • Is 34M training data size necessary? Have you tried training on 0.1M, 1M, 10M oens?
  • How is the 100K 1024 res data collected?
  • What's the prompt used for data recaption?

Thanks

chenbowen avatar Apr 15 '24 11:04 chenbowen

I am training SDXL ELLA so I can answer a few.

Batch size and steps of training SDXL on 512 & 1024 res

As big as you can get ideally.

Is 34M training data size necessary? Have you tried training on 0.1M, 1M, 10M oens?

34 M is actually maybe too small. I have tried training on smaller amounts and if the concept doesn't exist in your training data the adapter may struggle with reproducing it, see: https://github.com/TencentQQGYLab/ELLA/issues/35

What's the prompt used for data recaption?

You can just use any major mllm like llava 1.6 and "Describe this image in detail".

@AmericanPresidentJimmyCarter Hello, I would appreciate it if you could share your training script and code,thanks a lot!

DthdZK avatar May 09 '24 01:05 DthdZK

@AmericanPresidentJimmyCarter Hello, I would appreciate it if you could share your training script and code,thanks a lot!

looks like training scripts are here: https://github.com/DataCTE/ELLA_Training and https://github.com/TencentQQGYLab/ELLA/pull/27

matbeedotcom avatar May 24 '24 14:05 matbeedotcom

@AmericanPresidentJimmyCarter Hello, I would appreciate it if you could share your training script and code,thanks a lot!

If it ever works well I will release the weights. I don't have anything like the results from the paper so far.