
How to train LORA with multiple GPUs

Open martindellavecchia opened this issue 8 months ago • 6 comments

After resolving the missing AVX support issue here: https://github.com/bmaltais/kohya_ss/issues/2582 (thanks @b-fission), I went ahead and kicked off my LoRA training, but it started training on just one of my two available GPUs.

I've tried running setup.bat and configured the accelerator, specifying that it should use all the available GPUs, but that didn't fix it.

Then I went to the web interface, checked "multi GPU", and set it to run two processes, but nothing changed. I got:

11:10:46-280029 INFO Command executed.
[2024-06-10 11:10:52,058] torch.distributed.elastic.multiprocessing.redirects: [WARNING] NOTE: Redirects are currently not supported in Windows or MacOs.
[W socket.cpp:663] [c10d] The client socket has failed to connect to [DESKTOP-413GD2B]:29500 (system error: 10049 - La dirección solicitada no es válida en este contexto.).
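The warning shows the c10d client trying to reach the machine's own hostname on port 29500, the default torch.distributed rendezvous port. As a rough diagnostic only, and not a confirmed fix for kohya_ss, a small Python sketch like the one below could show which addresses a hostname resolves to. The helper name `resolve_rendezvous` is hypothetical and not part of kohya_ss or PyTorch.

```python
import socket


def resolve_rendezvous(host: str, port: int = 29500) -> list[str]:
    """Return the sorted unique addresses `host` resolves to, or [] if lookup fails.

    Hypothetical helper for illustration; 29500 is the default
    torch.distributed master port seen in the log.
    """
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        # Name resolution failed entirely.
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the address.
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    name = socket.gethostname()
    print(name, "->", resolve_rendezvous(name))
```

If the hostname resolves to an address that no local network adapter actually owns, that would be consistent with Windows error 10049 ("The requested address is not valid in its context"), though other causes are possible.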

I read online that I need to add "set CUDA_VISIBLE_DEVICES=1" to my gui.bat, which I did, but it kept training my LoRA on just one GPU.
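For reference, CUDA_VISIBLE_DEVICES is a comma-separated list of device indices that the process is allowed to see, so a value of "1" exposes only the second GPU rather than enabling both. The Python sketch below models that behaviour in simplified form. It assumes integer indices only (real CUDA also accepts GPU UUIDs), and the function name `visible_gpu_ids` is hypothetical.

```python
import os


def visible_gpu_ids(value, total):
    """Simplified model of how CUDA_VISIBLE_DEVICES selects devices.

    value: the variable's contents, or None if it is unset.
    total: number of GPUs physically present.
    Assumption: integer indices only; GPU UUIDs are not handled here.
    """
    if value is None:
        # Variable unset: every GPU is visible.
        return list(range(total))
    ids = []
    for token in value.split(","):
        token = token.strip()
        # An invalid entry hides itself and everything listed after it.
        if not token.isdigit() or int(token) >= total:
            break
        ids.append(int(token))
    return ids


if __name__ == "__main__":
    current = os.environ.get("CUDA_VISIBLE_DEVICES")
    print("visible GPUs:", visible_gpu_ids(current, total=2))
```

Under this model, exposing both cards on a two-GPU machine would correspond to a value of "0,1", or to leaving the variable unset.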

Is there any guide on how to properly tell kohya to use both my 2080 Ti and my 3060? Kohya sees both of them in the system.

Thanks so much!

martindellavecchia, Jun 10 '24 13:06