Running Flux LoRA training on 2 GPUs
First of all, many thanks for doing this! This is the only repo I'm aware of that allows Flux LoRA training on a 16 GB GPU.
I appreciate that this is new and that the lack of documentation is unavoidable. To set the context, here's what I do:
I follow the guide on the sd3 branch for Flux dev training on a 12 GB GPU:
https://github.com/kohya-ss/sd-scripts/tree/sd3
I was initially baffled by the dataset creation, but thankfully came across https://huggingface.co/kohya-ss/misc-models and just adopted it.
I have to use the 12 GB GPU options because otherwise I get an OOM on the GPU.
Anyhow, the training is working (though I have yet to test the results). It's rather painfully slow, which I guess is expected, so I'm looking for a way to speed it up.
I tried accelerate with 2 GPUs but got an error:
```
[rank0]:     raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")
[rank0]: AttributeError: 'DistributedDataParallel' object has no attribute 'train_blocks'
```
which I think comes from my having to use --split_mode.
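Digging a little, the error looks like generic DDP behaviour rather than anything Flux-specific: `DistributedDataParallel` wraps the model, so custom attributes (like the `train_blocks` that --split_mode apparently sets up) are only reachable on the wrapped model via `.module`. A minimal sketch that reproduces the error; nothing here is actual sd-scripts code, `SplitModel` and its `train_blocks` are just stand-ins:

```python
import os
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

# single-process "cluster", just enough to construct a DDP wrapper
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group("gloo", rank=0, world_size=1)

class SplitModel(nn.Module):  # stand-in, not the actual sd-scripts model
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(4, 4)
        self.train_blocks = ["single"]  # custom attribute, like --split_mode adds

ddp_model = DDP(SplitModel())

try:
    _ = ddp_model.train_blocks  # DDP does not forward custom attributes
except AttributeError as e:
    print(e)  # 'DistributedDataParallel' object has no attribute 'train_blocks'

print(ddp_model.module.train_blocks)  # .module is the original wrapped model
```

So presumably, if the training script went through `accelerator.unwrap_model(...)` (or `.module`) before touching that attribute, the multi-GPU case would get past this point; I haven't dug into where exactly sd-scripts accesses `train_blocks`, though.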
I wonder whether spreading the model across the 2 GPUs is possible, like diffusers does with device_map. This obviously comes with its own inefficiency, since one GPU sits idle while the other is working, but at least it avoids copying data back and forth between main memory and the GPU.
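For illustration, here's roughly what I mean by the diffusers approach. This is a sketch for inference only, assuming a diffusers version that has FluxPipeline and supports device_map="balanced" for pipelines, which (as far as I understand it) places whole components on different GPUs rather than splitting a single model layer-wise:

```python
import torch
from diffusers import FluxPipeline

# "balanced" spreads the pipeline's components (transformer, text encoders,
# VAE) across the visible GPUs instead of putting everything on one device.
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    torch_dtype=torch.bfloat16,
    device_map="balanced",
)
print(getattr(pipe, "hf_device_map", None))  # which component ended up where

image = pipe("a photo of an astronaut", num_inference_steps=28).images[0]
image.save("out.png")
```

The trade-off is exactly the one above: while the GPU holding the transformer is busy denoising, the GPU holding the text encoders sits idle, but nothing gets swapped through main memory.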
Incidentally, I noticed that a torch copy back to the CPU is about twice as slow as a copy to the GPU.
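For what it's worth, this is the kind of micro-benchmark I mean; a rough sketch, where the numbers will vary with hardware, and pinned vs. pageable host memory makes a big difference. My guess is that `.to("cpu")` allocating a fresh pageable tensor on every call accounts for part of the gap:

```python
import torch

def time_copy(src, dst_device, iters=50):
    """Average time in ms to copy src to dst_device, measured with CUDA events."""
    torch.cuda.synchronize()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        src.to(dst_device)
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters

n = 256 * 1024 * 1024 // 4  # 256 MB of float32
cpu_pageable = torch.empty(n)
cpu_pinned = torch.empty(n, pin_memory=True)
gpu = torch.empty(n, device="cuda")

print("H2D pageable:", time_copy(cpu_pageable, "cuda"), "ms")
print("H2D pinned:  ", time_copy(cpu_pinned, "cuda"), "ms")
print("D2H:         ", time_copy(gpu, "cpu"), "ms")
```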