[Feat]: Add support for multi-GPU parallel LoRA training?
Describe your use-case.
So far I haven't found a way to do this in parallel.
What would you like to see as a solution?
I don't know how to implement this feature.
Have you considered alternatives? List them here.
No response
+1
Duplicate of #69. Pull requests adding multi-GPU support are welcome, however.
I might be interested in playing with multi-GPU training, but the costs are currently prohibitive. I'd estimate many hours of 2x A5000 rental initially, and later a limited number of hours of 4-8x A100+ for performance tests.
Feel free to delete this comment if this is considered solicitation. [I'm not interested in multi-GPU dataset preparation through MGDS, only training.]
I've looked into this a bit, and torch.distributed seems much more suitable for integration into the OT codebase than the usual accelerate or torch DDP.
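To illustrate what that could look like, here is a minimal sketch (not OneTrainer code) of training driven by raw torch.distributed primitives: one process per GPU, with gradients averaged by an explicit all_reduce after backward(). The model, optimizer, and loop are placeholder stand-ins; in practice only the LoRA parameters would need syncing.

```python
# Launch with: torchrun --nproc_per_node=N this_script.py
import os
import torch
import torch.distributed as dist

def setup():
    # torchrun sets RANK, LOCAL_RANK and WORLD_SIZE for each process.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)
    return local_rank

def average_gradients(params):
    # Sum gradients across ranks, then divide by the world size.
    world_size = dist.get_world_size()
    for p in params:
        if p.grad is not None:
            dist.all_reduce(p.grad, op=dist.ReduceOp.SUM)
            p.grad /= world_size

if __name__ == "__main__":
    local_rank = setup()
    # Placeholder model/optimizer; a real trainer would sync only the
    # trainable (LoRA) parameters.
    model = torch.nn.Linear(16, 16).cuda(local_rank)
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
    for _ in range(10):
        x = torch.randn(8, 16, device=local_rank)
        loss = model(x).pow(2).mean()
        optimizer.zero_grad()
        loss.backward()
        average_gradients(model.parameters())
        optimizer.step()
    dist.destroy_process_group()
```

The appeal of this approach is that the existing single-GPU training loop stays mostly intact; the distributed pieces are process-group setup and a gradient-sync step, rather than wrapping everything in a DDP module or an accelerate context.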
https://github.com/Nerogar/OneTrainer/pull/816
A draft implementation is now available. Testers are welcome.