How to train LoRA with multiple GPUs
After resolving the missing AVX support issue here: https://github.com/bmaltais/kohya_ss/issues/2582 (thanks @b-fission), I went ahead and kicked off my LoRA training, but it started training on just one of my two available GPUs.
I tried running setup.bat and configured the accelerator to use all the available GPUs, but that didn't fix it.
Then I went to the web interface, checked "Multi GPU", and selected two processes, but nothing changed. This is what I got:

11:10:46-280029 INFO Command executed.
[2024-06-10 11:10:52,058] torch.distributed.elastic.multiprocessing.redirects: [WARNING] NOTE: Redirects are currently not supported in Windows or MacOs.
[W socket.cpp:663] [c10d] The client socket has failed to connect to [DESKTOP-413GD2B]:29500 (system error: 10049 - La dirección solicitada no es válida en este contexto.)

(The Spanish system error translates to "The requested address is not valid in this context".)
I also read online that I needed to add "set CUDA_VISIBLE_DEVICES=1" to my gui.bat, which I did, but it kept training my LoRA on just one GPU.
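In case it's relevant, this is the kind of quick check I can run from the kohya_ss venv to see what the training process actually gets with that variable set (plain PyTorch, nothing kohya-specific, just a sketch):

```python
import os
import torch

# CUDA_VISIBLE_DEVICES controls which physical GPUs CUDA exposes to the process;
# with "set CUDA_VISIBLE_DEVICES=1" only the card at index 1 is visible,
# so device_count() comes back as 1 here.
print("CUDA_VISIBLE_DEVICES =", os.environ.get("CUDA_VISIBLE_DEVICES"))
print("visible devices      =", torch.cuda.device_count())
```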
Is there any guide on how to properly tell kohya to use both the 2080 Ti and the 3060 that are connected? Kohya sees both of them in the system.
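Just to be clear about what I mean by that: both cards do show up when I enumerate them from the kohya_ss venv (again plain PyTorch, only meant as a sanity check):

```python
import torch

# Without CUDA_VISIBLE_DEVICES set, both cards should be listed here.
for idx in range(torch.cuda.device_count()):
    print(idx, torch.cuda.get_device_name(idx))
```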
Thanks so much!