sd-webui-roop

Seems to be using CPU to do all the swapping

Open Ninkurk opened this issue 1 year ago • 2 comments

Seems to be using CPU to do all the swapping, do you know how I can switch it to GPU?

Ninkurk, Jun 23 '23 17:06

I want the same thing. It does not use the GPU for me either. Is there some trick or command, or does CUDA need to be installed?

gallojorge, Jun 23 '23 17:06

Seems to be using CPU to do all the swapping, do you know how I can switch it to GPU?

It's not worth it (for now). I tried it and it works, but a lot of extra output appeared in the console and it took more time than the CPU, especially if you run only 1 image in txt2img rather than batches in img2img.

If you still want it, for Windows:

Close the Stable Diffusion console.

Install these:

  1. Visual Studio Community 2022 with these workloads: a) Desktop development with C++; b) Python development (https://visualstudio.microsoft.com/downloads)
  2. Install CUDA 11.8. Install only CUDA itself and skip the Driver, GFE, and PhysX components (https://developer.nvidia.com/cuda-11-8-0-download-archive)
  3. Download cuDNN v8.9.1 (May 5th, 2023), for CUDA 11.x, local installer for Windows (https://developer.nvidia.com/rdp/cudnn-archive)
  4. Extract the cuDNN v8.9.1 zip to C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8 (bin, include, lib, license)

I'm not really sure whether steps 2-4 are necessary, but they are necessary for the roop video version (standalone app).

In \stable-diffusion-webui\extensions\sd-webui-roop\requirements.txt, remove "onnxruntime==1.15.0" and add "onnxruntime-gpu==1.15.0".
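
The requirements.txt edit above could also be scripted. This is only a minimal sketch, assuming the file pins the CPU package as a line starting with `onnxruntime==`; the function name `swap_to_gpu_package` is hypothetical and not part of the extension, and the sample text is made up for illustration.

```python
import re


def swap_to_gpu_package(requirements_text: str) -> str:
    """Rename a pinned `onnxruntime` requirement to `onnxruntime-gpu`, keeping the version."""
    # Match the CPU package name only when it is directly followed by `==`,
    # so an existing `onnxruntime-gpu==...` line is left untouched.
    pattern = re.compile(r"^onnxruntime(?=\s*==)", flags=re.MULTILINE)
    return pattern.sub("onnxruntime-gpu", requirements_text)


if __name__ == "__main__":
    # Illustrative content only; the real requirements.txt lists other packages.
    sample = "somepackage==1.0\nonnxruntime==1.15.0\n"
    print(swap_to_gpu_package(sample))
```

Running the function twice is harmless, because the second pass finds nothing left to rename.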

In \stable-diffusion-webui\extensions\sd-webui-roop\venv\Scripts\, open a console/terminal and type these commands one at a time:

./activate
pip uninstall onnxruntime
pip install onnxruntime-gpu==1.15.0

Close the console.

Edit: also change this in \stable-diffusion-webui\extensions\sd-webui-roop\scripts\swapper.py:

providers = ["CUDAExecutionProvider"]

Save the file, and delete the __pycache__ folder if it exists.

Run Stable Diffusion again.

You should now see it using CUDA in the console while generating and swapping images.
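
Before launching the web UI, it may help to check what ONNX Runtime reports from the activated venv. A small sketch: `onnxruntime.get_available_providers()` is a real ONNX Runtime call, while `describe_providers` is a hypothetical helper name. The import is guarded so the script still runs where the package is not installed.

```python
def describe_providers(available):
    """Return a one-line verdict for a list of ONNX Runtime provider names."""
    if "CUDAExecutionProvider" in available:
        return "CUDA provider is available"
    return "CUDA provider missing; only: " + ", ".join(available)


if __name__ == "__main__":
    try:
        import onnxruntime as ort
    except ImportError:
        print("onnxruntime is not installed in this environment")
    else:
        print("onnxruntime", ort.__version__)
        print(describe_providers(ort.get_available_providers()))
```

A listed provider is not a guarantee that it loads: a session can still fall back to the CPU if the CUDA or cuDNN libraries cannot be found at runtime.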

thesomeotherguy, Jun 24 '23 19:06

Seems to be using CPU to do all the swapping, do you know how I can switch it to GPU?

It's not worth it (for now). I tried it and it works, but a lot of extra output appeared in the console and it took more time than the CPU, especially if you run only 1 image in txt2img rather than batches in img2img.

If you still want it, for Windows:

Close the Stable Diffusion console.

Install these:

  1. Visual Studio Build Tools 2022 with these workloads: a) Desktop development with C++; b) Python development (https://aka.ms/vs/17/release/vs_BuildTools.exe)
  2. Visual Studio Community 2022 (https://visualstudio.microsoft.com/downloads)
  3. Install CUDA 11.8. Install only CUDA itself and skip the Driver, GFE, and PhysX components (https://developer.nvidia.com/cuda-11-8-0-download-archive)
  4. Download cuDNN v8.9.1 (May 5th, 2023), for CUDA 11.x, local installer for Windows (https://developer.nvidia.com/rdp/cudnn-archive)
  5. Extract the cuDNN v8.9.1 zip to C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8 (bin, include, lib, license)

I'm not really sure whether steps 3-5 are necessary, but they are necessary for the roop video version (standalone app).

In \stable-diffusion-webui\extensions\sd-webui-roop\requirements.txt, remove "onnxruntime==1.15.0" and add "onnxruntime-gpu==1.15.0".

In \stable-diffusion-webui\extensions\sd-webui-roop\venv\Scripts\, open a console/terminal and type these commands one at a time:

./activate
pip uninstall onnxruntime
pip install onnxruntime-gpu==1.15.0

Close the console.

Run Stable Diffusion again.

You should now see it using CUDA in the console while generating and swapping images.

I tried roop before the SD version and did most of what you mentioned above, but so far it seems to fail to use the GPU and falls back to the CPU provider, which is why it generates more messages in the console and takes more time to swap a face. Both roop and the SD roop extension failed to use the GPU. FYI, on my 3060 it takes an extra 7 seconds to swap a face with the CPU (13 seconds total for one image generation plus roop), but if I set the CUDA provider it prints a long message in the console before falling back to the CPU and takes an extra 20 seconds. In one YouTube video someone got 8 seconds for one image generation plus roop, but maybe that was just 1+7 seconds using a 3090 and the CPU provider.
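
When comparing CPU and CUDA timings like these, it may help to separate the first call from later ones, since a GPU provider typically pays one-time initialization costs on its first run. A minimal timing sketch using only the standard library: `time_calls` is a hypothetical helper, and `fake_swap` is a placeholder standing in for the real face-swap call.

```python
import time


def time_calls(fn, repeats=5):
    """Time fn() once cold, then `repeats` more times; return (first, mean_of_rest)."""
    start = time.perf_counter()
    fn()
    first = time.perf_counter() - start

    rest = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        rest.append(time.perf_counter() - start)
    return first, sum(rest) / len(rest)


if __name__ == "__main__":
    def fake_swap():
        # Placeholder workload; replace with the real swap call to measure it.
        sum(i * i for i in range(100_000))

    first, steady = time_calls(fake_swap)
    print(f"first call: {first:.4f}s, later calls (mean): {steady:.4f}s")
```

If only the first call turns out to be slow, single txt2img generations would feel the penalty while batches would spread it out. That is a possibility to test, not a confirmed explanation for the numbers above.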

kukalikuk, Jun 26 '23 02:06

I tried roop before the SD version and did most of what you mentioned above, but so far it seems to fail to use the GPU and falls back to the CPU provider, which is why it generates more messages in the console and takes more time to swap a face. Both roop and the SD roop extension failed to use the GPU. FYI, on my 3060 it takes an extra 7 seconds to swap a face with the CPU (13 seconds total for one image generation plus roop), but if I set the CUDA provider it prints a long message in the console before falling back to the CPU and takes an extra 20 seconds. In one YouTube video someone got 8 seconds for one image generation plus roop, but maybe that was just 1+7 seconds using a 3090 and the CPU provider.

JUST 7 SECONDS???? I am running it on RunPod with an A5000 GPU and I have to wait around 4 minutes for 1 swap.

Ninkurk, Jun 26 '23 04:06

I forgot something: also change this in \stable-diffusion-webui\extensions\sd-webui-roop\scripts\swapper.py

providers = ["CUDAExecutionProvider"]

It's located after "from scripts.roop_logging import logger" and before "@dataclass".

Save the file, and delete the __pycache__ folder if it exists.
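
A possible variant of that providers line prefers CUDA but keeps the CPU as an explicit fallback. This is a sketch, not the extension's code: `onnxruntime.InferenceSession(path, providers=[...])` accepts such a list in priority order, and the helper name `choose_providers` is hypothetical.

```python
def choose_providers(available, prefer_gpu=True):
    """Return provider names in priority order, keeping CPU as the last resort."""
    chosen = []
    if prefer_gpu and "CUDAExecutionProvider" in available:
        chosen.append("CUDAExecutionProvider")
    # The CPU provider is always appended so a session can still be created.
    chosen.append("CPUExecutionProvider")
    return chosen


if __name__ == "__main__":
    # Example input; in practice pass onnxruntime.get_available_providers().
    print(choose_providers(["CUDAExecutionProvider", "CPUExecutionProvider"]))
```

After a session is created, calling `get_providers()` on it shows which providers it actually ended up using, which is one way to tell whether it quietly fell back to the CPU.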

FYI, on my 3060 it takes an extra 7 seconds to swap a face with the CPU (13 seconds total for one image generation plus roop), but if I set the CUDA provider it prints a long message in the console before falling back to the CPU and takes an extra 20 seconds.

That is exactly what I said.

It's not worth it (for now). I tried it and it works, but a lot of extra output appeared in the console and it took more time than the CPU, especially if you run only 1 image in txt2img rather than batches in img2img.

Somehow, on CUDA it took much more time than on the CPU, so it's not really worth it.

That's because on GPU/CUDA it shows more iterations in the console, and I don't know what they are.

Same for me: on the CPU it only adds 4-6 seconds, but the GPU is much slower.

thesomeotherguy, Jun 26 '23 05:06

7 secs with the CPU, and it should be faster with the GPU. If it takes 4 mins, then something is definitely wrong with your installation. CMIIW, someone on the roop GitHub achieved 30 frame swaps per second with a 3090/4090. That's why I'm still looking for a solution.

kukalikuk, Jun 27 '23 03:06

@s0md3v why did you close this? Is using the GPU or the CUDA provider supposed to behave the way we reported above?

kukalikuk, Jun 28 '23 03:06

7 secs with the CPU, and it should be faster with the GPU. If it takes 4 mins, then something is definitely wrong with your installation. CMIIW, someone on the roop GitHub achieved 30 frame swaps per second with a 3090/4090. That's why I'm still looking for a solution.

The same issue occurs when I change the dependency library to onnxruntime-gpu 1.15.1 and modify "providers = ["CUDAExecutionProvider"]" in swapper.py. When I execute it, I encounter the following error message: onnxruntime::ProviderLibrary::Get [ONNXRuntimeError] : 1 : FAIL : LoadLibrary failed with error 126. Falling back to ['CUDAExecutionProvider', 'CPUExecutionProvider'] and retrying. Although it produces output, the worst part is that it is even slower compared to the CPU version, taking approximately 7-8 seconds longer. My GPU is a 4080, and if I directly use Roop, the speed can reach 30-35 frames per second. Why is there such a significant difference?
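
Windows error 126 generally means that a DLL, or one of its dependencies, could not be found, which would fit a CUDA or cuDNN library missing from PATH. A sketch that searches the PATH directories for given file names: `find_on_path` is a hypothetical helper, and the DLL names listed are assumptions for CUDA 11.x with cuDNN 8 that should be checked against the actual install.

```python
import os


def find_on_path(names, path_value=None):
    """Map each file name to the first PATH directory containing it, or None."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    dirs = [d for d in path_value.split(os.pathsep) if d]

    found = {}
    for name in names:
        found[name] = next(
            (d for d in dirs if os.path.isfile(os.path.join(d, name))), None
        )
    return found


if __name__ == "__main__":
    # Assumed names for CUDA 11.x / cuDNN 8; adjust to the installed versions.
    wanted = ["cudart64_110.dll", "cublas64_11.dll", "cudnn64_8.dll"]
    for name, where in find_on_path(wanted).items():
        print(f"{name}: {where or 'NOT FOUND on PATH'}")
```

A file reported as not found would point at the CUDA or cuDNN install (or the PATH entry for its bin folder) rather than at swapper.py.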

zZxztxZz, Jul 06 '23 02:07