RTX 3090: every 4th iteration is very slow
THIS IS NOT TECH SUPPORT FOR NEWBIE FAKERS POST ONLY ISSUES RELATED TO BUGS OR CODE
Expected behavior
a smooth training process
Actual behavior
Every 4th iteration/epoch gets stuck for some time. I have narrowed it down to the resolution of the src aligned images: the bigger they are, the slower the 4th iteration is.
There is a detailed thread on this issue: https://mrdeepfakes.com/forums/thread-every-4th-iteration-takes-a-long-time
Also take a look at this video: https://streamable.com/nl07km. It uses the smallest model I could think of: batch size 1, resolution 64, liae-ud.
If I lower the resolution of the src aligned images, the training iterations go faster, but every 4th iteration STILL takes extra time. At 320 resolution it takes up to 13-19 seconds.
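One way to put numbers on the pattern is to note the per-iteration times and look at the gap between the slow ones. This is only a minimal sketch: it assumes the times were copied by hand from the training console into a list, the `find_stall_period` helper is hypothetical (not part of DeepFaceLab), and the sample values are made up for illustration apart from the roughly 13 second stall reported above.

```python
from statistics import median

def find_stall_period(times_ms, factor=3.0):
    """Return (indices of slow iterations, most common gap between them).

    An iteration counts as slow when it takes more than `factor` times
    the median iteration time. `factor` is an arbitrary threshold.
    """
    base = median(times_ms)
    slow = [i for i, t in enumerate(times_ms) if t > factor * base]
    gaps = [b - a for a, b in zip(slow, slow[1:])]
    # Most frequent gap between consecutive slow iterations, if any.
    period = max(set(gaps), key=gaps.count) if gaps else None
    return slow, period

# Illustrative data only: three fast iterations, then one ~13 s stall.
sample_times_ms = [300, 310, 305, 13000] * 4
slow, period = find_stall_period(sample_times_ms)
print("slow iterations:", slow)
print("stall period:", period)
```

Running the same check on logs taken with different src aligned sizes would show whether the stall period really moves with image size.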
Steps to reproduce
I tried a clean install of Windows and followed all the tips. I updated CUDA, cuDNN, and the drivers, and tried both the Studio and Game Ready drivers.
I also tried a clean Windows install with just the drivers.
And yes, "hardware-accelerated GPU scheduling" is ON.
Other relevant information
Very, very interesting:
check this video out: https://streamable.com/fhbix7
This is with a WF 512 src aligned; now every 16th iteration slows down.
So the stall went from every 4th to every 16th iteration because the src aligned was 2048 and is now 512.
But what does that have to do with anything? Only the model parameters should affect the iterations, and we extract bigger src aligned images for (some) future proofing.
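For scale, here is a rough sketch of how much raw data each src aligned size represents once decoded. It assumes the loader decodes every image at full size into 8-bit, 3-channel pixels before resizing to the model resolution; that is an assumption about the pipeline, not something confirmed in this thread. `decoded_mib` is a hypothetical helper.

```python
def decoded_mib(side_px, channels=3, bytes_per_channel=1):
    """Size in MiB of one square image after decoding (assumed 8-bit RGB/BGR)."""
    return side_px * side_px * channels * bytes_per_channel / 2**20

sizes = {}
for side in (512, 1024, 2048):
    # Second value: pixel count relative to a 512 px image.
    sizes[side] = (decoded_mib(side), (side // 512) ** 2)
    print(f"{side} px: {sizes[side][0]} MiB decoded, {sizes[side][1]}x the pixels of 512 px")
```

Under that assumption a 2048 px image is 16 times the decode work of a 512 px one, regardless of model resolution. Whether that is what causes the periodic stall is not established here.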
More info: either it's my CPU (6700K), or maybe DFL/3090 needs more support. It runs better on 1024 images: it gets stuck for a little while on the 16th iteration if 1024 src aligned images are used.
Also, I tried running an old model, trained on a 980 Ti to around 600k iterations, on my new 3090 with a 2048 src aligned. I expected 3 fast iterations and a very slow 4th one, but to my surprise the speed with the 3090 was very slow overall, around 2000 ms per iteration. So I think speed may also drop once a model reaches a high iteration count.
This is being trained on an NVMe M.2 drive, so it's not disk speed that causes this.
- Operating system and version: Windows 10 20H2 build 19042.964
- Python version: 3.5, 3.6.4, ... (if you are not using prebuilt windows binary)
I tried running it with the latest build, DeepFaceLab_NVIDIA_RTX3000_series_build_04_22_2021.
check this video out: https://streamable.com/fhbix7 - normal behavior, but yours is different.
Try with a 2048 src resolution and a 320 model resolution.
Also, which CPU do you have? I have a 6700K.
2048px aligned? u mad?
No, not really. I didn't have any trouble with the 980 Ti.
With the RTX 3000 build and a 3090, the training process seems to get stuck on every 4th iteration.
Src aligned size has nothing to do with model resolution, and I am sure you are aware of this.
But right now it seems that the bigger the src aligned size, the more it gets stuck.
It's running OK with 1024 src aligned.
But 1024 gives little "future proofing", since the actual picture (f/hf/head) resolution would be much smaller.
So in short: bigger aligned = ultra slow.
The model can be 64x64 and still get stuck because the aligned images are big.
Please check out the first video I attached.
Try using a 2048 src aligned and report back. I am very interested in your results. Thank you.
I'm using an RTX3090 on an older build (Dec 22 ish? Because that was when gan still worked okay) and just recently started running an SRC that's 1536 px on a model that's 416 px. I just noticed that it's also slowing down every 4th iteration (tho there are other iterations that are slightly slower too). Also running on a nvme (Samsung 970 pro).
Didn't notice the problem on SRC 1024 px
Issue solved / already answered (or it seems like user error), please close it.
It's 2023 and I still cannot find a solution for my RTX 3090. The iteration time jumps every 4th or every 8th iteration, and the overall run is much slower than on my old 2070. Ryzen 5 3600, 32GB Corsair 3200, Samsung Evo NVMe SSD. Has anyone found a solution?