Giyeong Oh
Or pass `--save_state` at the beginning of training from scratch, and use `--resume="STATE_IN_OUTPUT_DIR"` when resuming. These options save and load the optimizer state as well as the weights.
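A minimal sketch of the two invocations, assuming the sd3 branch's `flux_train_network.py` and placeholder paths; only the state-related arguments are shown, the rest of the usual training arguments stay the same:

```bash
# First run from scratch: --save_state writes optimizer state along with the weights,
# so training can be resumed later.
accelerate launch flux_train_network.py \
  --output_dir /path/to/output \
  --dataset_config /path/to/dataset.toml \
  --save_state

# Resuming: point --resume at the state directory written inside the output dir
# ("STATE_IN_OUTPUT_DIR" is a placeholder for that directory's actual name).
accelerate launch flux_train_network.py \
  --output_dir /path/to/output \
  --dataset_config /path/to/dataset.toml \
  --resume "STATE_IN_OUTPUT_DIR"
```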
> The error was not caused by DDP multi-GPU training, but by DeepSpeed... The original DDP multi-GPU training was fine with Flux LoRA training, but as long as you installed...
> > Updated the sd3 branch. Multi-GPU training should now work. Please report again if the issue remains.
>
> I have four A100-40G. Is it feasible to train the flux model...
> @BootsofLagrangian Hi there, I hope I can reach you. I also get this dtype error when training a Flux LoRA with DeepSpeed multi-GPU. Do you have any updates...
> > [@terrificdm](https://github.com/terrificdm) With the RTX 3090 (24GB) and image resolution 1024, the multi-GPU Flux finetuning goes OOM. Have you ever run into the same problem? Thanks a lot! My config script is: ...
It is supported on the sd3 branch. If you run into trouble, please attach the logs.
> > Unfortunately, multi-GPU training of FLUX has not been tested yet. `--split_mode` doesn't seem to work with multi-GPU training.
>
> The current single-card training is indeed...
> Has anyone managed to get it working properly? I have 4 GPUs, but it still only runs on the first GPU.

Is the trainer still in the caching process?
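For reference, a minimal sketch of a multi-GPU launch with Accelerate; the script name and paths are placeholders, and the remaining training arguments follow the usual sd-scripts setup:

```bash
# One process per GPU. Without --multi_gpu / --num_processes (or an equivalent
# `accelerate config`), accelerate starts a single process and only the first GPU is used.
accelerate launch --multi_gpu --num_processes 4 \
  flux_train_network.py \
  --dataset_config /path/to/dataset.toml \
  --output_dir /path/to/output
```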