About the choice of optimizer
My current work requires me to generate images of dolls as an alternative to photography. The doll itself is nude, and the generated images need to reproduce body details, so the images I provide for training are fairly complex. I found that adamw8bit could not train a FLUX LoRA well: at a low learning rate it would not fit, and at a high learning rate the output images came out blurry. The Prodigy optimizer I used previously when training SDXL gave much better results. Could you add Prodigy to the trainer?
Prodigy is already present
Any hints on how to use it? Do I just replace adamw8bit with Prodigy in the optimizer setting? No extra special params needed?
Thank you!
```yaml
lr: 1
noise_offset: 0.1
lr_scheduler: "cosine"
optimizer: "Prodigy"
optimizer_params:
  decouple: true
  use_bias_correction: False
  betas: [0.9, 0.99]
  weight_decay: 0.05
```

This is what I'm using, around 2000 steps for 10 images.
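For what it's worth, the snippet above is YAML, and the entries under `optimizer_params` are typically unpacked as keyword arguments into the optimizer's constructor. A minimal sketch of loading and inspecting it (assuming PyYAML; the exact plumbing into the trainer's optimizer is an assumption, so check your trainer's docs):

```python
import yaml  # PyYAML

# The Prodigy training config from the post above, as a YAML document.
config_text = """
lr: 1
noise_offset: 0.1
lr_scheduler: "cosine"
optimizer: "Prodigy"
optimizer_params:
  decouple: true
  use_bias_correction: False
  betas: [0.9, 0.99]
  weight_decay: 0.05
"""

config = yaml.safe_load(config_text)

# With Prodigy, lr acts as a multiplier on the internally estimated
# step size, so it stays at 1 instead of a small value like 1e-4.
assert config["lr"] == 1

# A trainer would typically unpack this mapping into the optimizer
# constructor, e.g. Prodigy(model.parameters(), lr=1, **params)
# (hypothetical call, shown only to illustrate the mapping).
params = config["optimizer_params"]
print(params["betas"])         # [0.9, 0.99]
print(params["weight_decay"])  # 0.05
```

Note that `lr: 1` is intentional: Prodigy adapts the effective step size itself, and the cosine scheduler then decays that multiplier over the run.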
Awesome, thanks a lot!
This OOMs for me on a 4090. What rank/alpha are you using with Prodigy?
Coming over from SDXL, I had read some time ago that Prodigy and Adafactor should use an alpha of 1 because of the adaptive learning rate. For network dim I've always run Prodigy at 16-32. Flux is pretty heavy on VRAM, though, and so is Prodigy, and there seem to be some differences in VRAM management from one training program to another. On SDXL with my 4090, I have been using Easy Lora Training Scripts with its kohya backend and running Prodigy at a batch size of 12, but using straight kohya I get OOMs above a batch size of 4, and I have yet to be able to train a full checkpoint. I have a feeling a good portion of Flux is going to remain out of reach of us mere 4090 peasants.
Okay, and how do I select Prodigy in the UI? Does it appear in the optimizer list?