sd-scripts

Training Ultra Slow with more than 1 GPU - very likely affecting all users with more than 1 GPU

Open · SteVoit opened this issue 10 months ago · 1 comment

Hi there, I am running on 4x RTX 4090, and as soon as I use more than 1 GPU, training gets super slow with the newer scripts, starting from 22.x.x.

I think there is a general problem in kohya with multi-GPU training. I tested 3 versions (24.x.x, 22.x.x, and 21.x.x) on the same machine, with the same CUDA version (11.8), the same GPU driver (536.23), and the same GPUs (4x 4090). I have also tested different driver versions on Ubuntu (520 as suggested by CUDA 11.8, 535, and 550), and on Windows I tested different versions too, including more recent ones with Fallback to System RAM disabled. Kohya 21 takes 1:50, while kohya 22 and 24 take ~28:00. That's about 15 times slower.
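The "15 times slower" figure follows from converting both run times to seconds. A minimal check, assuming the times above are in minutes:seconds format:

```python
def to_seconds(mm_ss: str) -> int:
    """Convert a 'minutes:seconds' duration string to total seconds."""
    minutes, seconds = mm_ss.split(":")
    return int(minutes) * 60 + int(seconds)


fast = to_seconds("1:50")   # kohya 21
slow = to_seconds("28:00")  # kohya 22 / 24 (approximate)
ratio = slow / fast
print(f"{slow} s / {fast} s = {ratio:.1f}x slower")
# prints "1680 s / 110 s = 15.3x slower"
```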

I tested under Ubuntu 22.04, Ubuntu 20.04, and Windows 10. No matter what I do, I cannot get the speed back to that of version 21. I also tested with CUDA 12 and CUDA 11: same issue. Does anyone have any ideas?

I even tested on 2 systems, one with 4 GPUs and one with 2 GPUs (one Intel, one AMD), and tried both the gloo and NCCL backends, so I'm quite sure that everybody will run into this issue if they use more than 1 GPU.

I have backported the current version 24 to the same requirements as 21 (torch 2.0.1, and so on), the setup where I get the good speed, but no luck.
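Since pinning version 24 to the version 21 requirements did not restore the speed, one possible next step is to diff the complete installed environments, for example the `pip freeze` output captured from each install. The helpers below (`parse_pins`, `diff_pins`) are hypothetical and not part of sd-scripts; the package names in the usage example are placeholders, not the real kohya requirements.

```python
from __future__ import annotations


def parse_pins(text: str) -> dict[str, str]:
    """Map package name -> version for 'name==version' lines.

    Lines without '==' (e.g. 'pkg @ file://...' entries from pip freeze) are skipped.
    """
    pins = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments and whitespace
        if "==" in line:
            name, version = line.split("==", 1)
            pins[name.strip().lower()] = version.strip()
    return pins


def diff_pins(old: dict[str, str], new: dict[str, str]) -> dict[str, tuple]:
    """Return {name: (old_version, new_version)} for packages that differ.

    None means the package is not present on that side.
    """
    names = sorted(set(old) | set(new))
    return {n: (old.get(n), new.get(n)) for n in names if old.get(n) != new.get(n)}


# Placeholder inputs; in practice, paste the pip freeze output of each install.
old = parse_pins("torch==2.0.1\nexample-a==1.0  # comment\n")
new = parse_pins("torch==2.0.1\nexample-a==2.0\nexample-b==0.1\n")
print(diff_pins(old, new))
```

Identical pins (here `torch`) are dropped from the result, so only the packages worth investigating remain.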

What also confuses me a lot is that caching latents takes about 10x longer on kohya 22 and 24.
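Because latent caching also slowed down, timing it separately from the training steps would show whether the regression sits in caching, in the per-step training loop, or in both. Below is a minimal sketch of a hypothetical `phase` timer (not part of sd-scripts); where to place the `with` blocks in the training script is an assumption, and in a multi-GPU run each process would keep its own `timings` dict.

```python
import time
from contextlib import contextmanager

timings: dict = {}  # phase name -> accumulated wall-clock seconds


@contextmanager
def phase(name: str):
    """Accumulate the wall-clock time spent inside the block under `name`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[name] = timings.get(name, 0.0) + time.perf_counter() - start


# Stand-ins for the real steps; replace the sleeps with the actual calls.
with phase("cache_latents"):
    time.sleep(0.01)
with phase("train_step"):
    time.sleep(0.02)

for name, secs in timings.items():
    print(f"{name}: {secs:.3f} s")
```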

[Attached screenshots: kohya_21, kohya_21_1, kohya_22, kohya_22_1, kohya_24, kohya_24_1]

SteVoit, Apr 21 '24 19:04

Did you resolve this issue?

Thehunk1206, May 27 '24 00:05