Leo Jiang
Can you try the change from #9829? I saved memory by implementing this :)
@riflemanl I used bf16 with DeepSpeed and accelerate; it should work. Another reason is that FLUX is a 12B model, so it costs a lot of memory.
24GB is not enough. What's your hardware setup? Try reducing the batch size and resolution.
Batch size 1 with resolution 256 costs around 40GB on my 8-GPU setup. I think you should try offloading to CPU.
Try with DeepSpeed, but I don't think one GPU is enough.
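For reference, CPU offloading under DeepSpeed is usually enabled through a ZeRO config. A minimal sketch of such a config (values here are illustrative, not the exact settings used in this thread):

```json
{
  "bf16": { "enabled": true },
  "zero_optimization": {
    "stage": 2,
    "offload_optimizer": { "device": "cpu", "pin_memory": true }
  },
  "train_micro_batch_size_per_gpu": 1,
  "gradient_accumulation_steps": "auto"
}
```

Passing a file like this to accelerate's DeepSpeed plugin moves optimizer state to CPU RAM, which is the main lever for fitting a 12B model on smaller GPUs.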
It also depends on the size of the dataset
@kopyl Before the modification, it just got stuck at almost the same VRAM when training with one GPU. After the modification, it saves quickly. But I didn't check further.
@kopyl I think this issue is related to the accelerator. My algorithm saves memory during the training process, but your issue is in the final saving step. Can you change `accelerator.is_main_process`...
@kopyl I mean in the saving section: where the script has `if accelerator.is_main_process:`, add `or accelerator.distributed_type == DistributedType.DEEPSPEED`, and don't forget to import `DistributedType` from accelerate.