OOM problem when training flux_train.py on multi-GPU (8x A800 80G)
When I use flux_train.py to full-finetune my FLUX model with --optimizer_type adamw8bit and --batch_size 1, I always hit OOM. However, single-GPU training works fine with --optimizer_type adamw8bit and --batch_size 8, using almost 79 GB of VRAM. How can I fix the multi-GPU OOM problem? Thanks for your reply~
Some features for VRAM optimization do not work very well with multi-GPU setups, such as --fused_backward_pass. However, 80GB of VRAM should at least work with a batch size of 1. Have you tried --gradient_checkpointing?
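For reference, something like the sketch below is what I would try first on 8 GPUs: --gradient_checkpointing on and --fused_backward_pass left out. The flag names and paths are my assumptions based on the flux branch of this repo, so please check them against your checkout.

```
# Rough 8-GPU launch sketch (paths are placeholders; flag names may differ in your version)
accelerate launch --num_processes 8 --multi_gpu --mixed_precision bf16 flux_train.py \
  --pretrained_model_name_or_path /path/to/flux1-dev.safetensors \
  --clip_l /path/to/clip_l.safetensors \
  --t5xxl /path/to/t5xxl.safetensors \
  --ae /path/to/ae.safetensors \
  --dataset_config /path/to/dataset.toml \
  --output_dir /path/to/output \
  --mixed_precision bf16 --full_bf16 --sdpa \
  --optimizer_type adamw8bit \
  --train_batch_size 1 \
  --gradient_checkpointing \
  --cache_latents_to_disk --cache_text_encoder_outputs_to_disk
# note: --fused_backward_pass is intentionally omitted for the multi-GPU case
```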
Yes, I have used --gradient_checkpointing. I also tried different configurations to get it to work, but all of them failed.
Please provide full config
There is a lack of discussion and experiments on this (e.g. finetuning SDXL on 1x 3090 vs. 4x 3090 is a different league).
I have recently explored most of the options provided by both accelerate and this repo, and made my own attempt in PR #1950.
The A100 doesn't support FP8; otherwise it could be used to save even more VRAM.
You can crank the DeepSpeed option further by setting ZeRO Stage-3, or use FullyShardedDataParallel (FSDP), as in the sketch below.
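A rough sketch using accelerate's own launcher flags (the exact values and how they interact with flux_train.py are assumptions on my side; see the PR above for a working setup):

```
# DeepSpeed ZeRO Stage-3: shards optimizer states, gradients and parameters across GPUs.
# Optimizer offload to CPU is optional and trades speed for VRAM.
accelerate launch --num_processes 8 \
  --use_deepspeed --zero_stage 3 \
  --offload_optimizer_device cpu \
  flux_train.py <your training args>

# Alternative: FullyShardedDataParallel (FSDP) with full sharding,
# roughly the PyTorch-native counterpart of ZeRO-3.
# (older accelerate versions expect the sharding strategy as an integer, e.g. 1)
accelerate launch --num_processes 8 \
  --use_fsdp --fsdp_sharding_strategy FULL_SHARD \
  flux_train.py <your training args>
```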
Good luck
If you need more features, you can build flux_train.py from flux_train_network.py by looking at my PR (I haven't explored how much code would need to be modified yet). However, it is only useful for large datasets (> 1M).