Leo Jiang
Can you try the change from #9829? I saved memory by implementing this :)
@riflemanl I used bf16 with DeepSpeed and accelerate; it should work. Another reason is that FLUX is a 12B model, so it costs a lot of memory.
24GB is not enough. What's your hardware setup? Try reducing the batch size and resolution.
Batch size 1 with resolution 256 costs around 40GB on my 8-GPU setup. I think you should try offloading to CPU.
Try with DeepSpeed, but I don't think one GPU is enough.
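For reference, CPU offloading under DeepSpeed is usually enabled through a ZeRO config. A minimal sketch of such a config (values here are illustrative, not the exact settings used in this thread):

```json
{
  "bf16": { "enabled": true },
  "zero_optimization": {
    "stage": 2,
    "offload_optimizer": { "device": "cpu", "pin_memory": true }
  },
  "train_micro_batch_size_per_gpu": 1,
  "gradient_accumulation_steps": "auto"
}
```

Passing a file like this to accelerate's DeepSpeed plugin moves optimizer state to CPU RAM, which is the main lever for fitting a 12B model on smaller GPUs.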
It also depends on the size of the dataset
@kopyl Before the modification, it just got stuck at almost the same VRAM when training with one GPU. After the modification, it saves quickly. But I didn't check further.
@kopyl I think this issue is related to the accelerator. My algorithm saves memory during the training process, but your issue is in the final saving step. Can you change `accelerator.is_main_process`...
@kopyl I mean in the saving section: where the script has `if accelerator.is_main_process:`, add `or accelerator.distributed_type == DistributedType.DEEPSPEED`, and don't forget to import `DistributedType` from accelerate.