sd-scripts
Training Ultra Slow with more than 1 GPU - very likely affecting all users with more than 1 GPU
Hi there, I am running on 4x RTX 4090, and as soon as I use more than 1 GPU the training gets super slow with the newer scripts, starting from 22.x.x.
I think there is a general problem in kohya with multi-GPU training. I tested 3 versions: 24.x.x, 22.x.x, and 21.x.x, all on the same machine with the same CUDA version (11.8), the same GPU driver (536.23), and the same GPUs (4x 4090). I have also tested different driver versions (520 as suggested by CUDA 11.8, 535, and 550 on Ubuntu). On Windows I have tested different versions as well (including more recent ones with fallback to system RAM disabled). kohya 21 takes 1:50, while kohya 22 and 24 take ~28:00. That is about 15 times slower.
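For reference, here is how the "about 15 times" figure falls out of the two timings above. This is only a small arithmetic sketch; `to_seconds` is a helper name made up for this example and is not part of kohya or sd-scripts, and the 28:00 figure is approximate.

```python
def to_seconds(mmss: str) -> int:
    """Convert a 'minutes:seconds' string into total seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)

fast = to_seconds("1:50")   # kohya 21, as reported above
slow = to_seconds("28:00")  # kohya 22 / 24, approximate ("~28:00")

slowdown = slow / fast
print(f"{slowdown:.1f}x slower")  # prints "15.3x slower"
```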
I tested under Ubuntu 22.04, Ubuntu 20.04, and Windows 10. No matter what I do, I cannot get back to the speeds of 21. I also tested with CUDA 12 and CUDA 11, with the same issue. Anyone got any ideas?
I even tested on 2 systems, one with 4 GPUs and one with 2 GPUs, one Intel and one AMD, and tried both gloo and nccl, so I'm quite sure that everybody will run into this issue if they use more than 1 GPU.
I have backported the current version 24 to the same requirements as 21 (torch 2.0.1 and so on), with which 21 gives me the good speed, but no luck.
What also confuses me a lot is that caching latents takes about 10x longer on kohya 22 and 24.
Did you resolve this issue?