3090 - every 4th iteration is very slow

Open tamaragordon opened this issue 4 years ago • 8 comments

THIS IS NOT TECH SUPPORT FOR NEWBIE FAKERS POST ONLY ISSUES RELATED TO BUGS OR CODE

Expected behavior

a smooth training process

Actual behavior

Every 4th iteration/epoch gets stuck for some time. I have narrowed it down to this: the higher the resolution of the aligned src images, the slower the 4th iteration is.

There is a detailed thread on this issue: https://mrdeepfakes.com/forums/thread-every-4th-iteration-takes-a-long-time

Also take a look at this video: https://streamable.com/nl07km (recorded with the smallest model I could think of: batch size 1, resolution 64, liae-ud).

If I lower the resolution of the aligned src, the training iterations go faster, but every 4th iteration STILL takes extra time. At 320 resolution it takes up to 13-19 seconds.
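To confirm a pattern like this from logged numbers rather than by eye, a small script can look for the most common gap between slow iterations. This is only a minimal sketch: the function name `find_stall_period`, the `factor` threshold, and the sample timings are all made up for illustration, and it assumes the per-iteration times in milliseconds have already been collected into a list.

```python
from collections import Counter


def find_stall_period(times_ms, factor=3.0):
    """Return the most common gap between slow iterations, or None.

    An iteration counts as slow when it takes more than `factor` times
    the median iteration time. `factor` is an arbitrary threshold.
    """
    if not times_ms:
        return None
    median = sorted(times_ms)[len(times_ms) // 2]
    slow = [i for i, t in enumerate(times_ms) if t > factor * median]
    # Distance (in iterations) between consecutive slow iterations.
    gaps = [b - a for a, b in zip(slow, slow[1:])]
    if not gaps:
        return None
    return Counter(gaps).most_common(1)[0][0]


# Hypothetical timings: three fast iterations, then one long stall.
times = [150, 150, 150, 13000] * 5
print(find_stall_period(times))  # prints 4
```

Running the same check on logs from different src aligned sizes would show whether the stall period really changes with image size.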

Steps to reproduce

I did a clean install of Windows and followed all the tips. I updated CUDA, cuDNN, and the drivers, and tried both the Studio and Game Ready drivers.

I also tried a clean Windows install with just the drivers.

And yes, "hardware-accelerated GPU scheduling" is ON.

Other relevant information

very very interesting

check this video out: https://streamable.com/fhbix7

This is with a WF 512 src aligned set. Now every 16th iteration slows down.

So the stall went from every 4th to every 16th iteration because the src aligned was 2048 and is now 512.

But what does that have to do with anything? Only the model parameters should affect iteration speed, and we extract bigger aligned src images for future-proofing (somewhat).

More info: either it's my CPU (6700K) or maybe DFL/3090 needs more support. It runs better on 1024 images: it gets stuck for a little while on the 16th iteration if 1024 src aligned images are used.

Also, I tried running an old model, previously used with a 980 Ti, on my new 3090. It had around 600k iterations and used a 2048 src aligned set. I expected 3 fast iterations and a very slow 4th one, but to my surprise training on the 3090 was very slow overall, around 2000 ms per iteration. So I think speed may also drop once a model reaches a high iteration count.

This is being trained on an NVMe M.2 drive, so disk speed is not the cause.
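If the disk is ruled out, one remaining suspect is CPU-side sample preparation. A pattern of several fast iterations followed by one stall is what a starved round-robin data loader would produce. The toy simulation below illustrates that idea only. It is not DeepFaceLab code, and nothing in this thread confirms that DeepFaceLab's loader works this way; the worker count, the timings, and the round-robin assumption are all hypothetical.

```python
def simulate(num_workers, load_ms, train_ms, iterations):
    """Return how long the trainer waits for data before each iteration.

    Toy model (an assumption, not DeepFaceLab's actual loader): every
    worker needs `load_ms` to prepare one batch, holds at most one
    finished batch, and the trainer takes batches round-robin.
    """
    ready_at = [load_ms] * num_workers  # all workers start loading at t=0
    now = 0.0
    waits = []
    for i in range(iterations):
        w = i % num_workers
        wait = max(0.0, ready_at[w] - now)
        waits.append(wait)
        now += wait                  # block until this worker's batch is ready
        ready_at[w] = now + load_ms  # worker starts preparing its next batch
        now += train_ms              # GPU training step
    return waits


# Hypothetical numbers: 4 workers, slow loading (large images), fast GPU step.
waits = simulate(num_workers=4, load_ms=2000, train_ms=100, iterations=12)
print([round(w) for w in waits])
# prints [2000, 0, 0, 0, 1600, 0, 0, 0, 1600, 0, 0, 0]
```

In this model the trainer stalls on every 4th iteration whenever one batch takes longer to load than four training steps take to run, regardless of how small the model is. With `load_ms=300` the stalls disappear after the first iteration, which would be consistent with smaller aligned images training smoothly.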

  • Operating system and version: Windows 10 20H2 build 19042.964

  • Python version: 3.5, 3.6.4, ... (if you are not using prebuilt windows binary)

I tried running it with the latest build: DeepFaceLab_NVIDIA_RTX3000_series_build_04_22_2021

tamaragordon avatar May 10 '21 23:05 tamaragordon

check this video out: https://streamable.com/fhbix7 - normal behavior, but yours is different.

zabique avatar May 13 '21 14:05 zabique

Try with a 2048 src resolution and a 320 model resolution.

Also, which CPU do you have? I have a 6700K.

tamaragordon avatar May 14 '21 15:05 tamaragordon

2048px aligned? u mad?

zabique avatar May 14 '21 15:05 zabique

No, not really. I didn't have any trouble with the 980 Ti.

With the RTX 3000 build and a 3090, the training process seems to get stuck on every 4th iteration.

Src aligned size has nothing to do with model resolution, and I am sure you are aware of this.

But right now it seems that the bigger the src aligned size, the more it gets stuck.

It's running OK with 1024 src aligned.

But 1024 gives little "future-proofing", since the resolution of the actual picture (f/hf/head) would be much smaller.

So in short: bigger aligned = ultra slow.

The model can be 64x64 and still get stuck because the aligned images are big.

Please check out the first video I attached.

tamaragordon avatar May 14 '21 20:05 tamaragordon

Try using a 2048 src aligned and report back. I am very interested in your results. Thank you.

tamaragordon avatar May 14 '21 20:05 tamaragordon

I'm using an RTX3090 on an older build (Dec 22 ish? Because that was when gan still worked okay) and just recently started running an SRC that's 1536 px on a model that's 416 px. I just noticed that it's also slowing down every 4th iteration (tho there are other iterations that are slightly slower too). Also running on a nvme (Samsung 970 pro).

Didn't notice the problem on SRC 1024 px

ff7rule avatar May 17 '21 09:05 ff7rule

Issue solved / already answered (or it seems like user error), please close it.

joolstorrentecalo avatar Jun 08 '23 22:06 joolstorrentecalo

It's 2023 and I still cannot find a solution for my RTX 3090. The iteration time jumps every 4th or every 8th iteration, and the overall run is much slower than on my old 2070. Ryzen 5 3600, 32GB Corsair 3200, Samsung EVO NVMe SSD. Has anyone found a solution?

SrakPtak11 avatar Aug 31 '23 09:08 SrakPtak11