sd-webui-roop
Seems to be using CPU to do all the swapping
Seems to be using CPU to do all the swapping, do you know how I can switch it to GPU?
I want the same thing: it does not use the GPU. Is there some trick or command, or should CUDA be installed?
Seems to be using CPU to do all the swapping, do you know how I can switch it to GPU?
It's not worth it (for now). I tried it and it works, but a lot more output appeared in the console and it took more time than on CPU, especially if you run only 1 image in txt2img rather than batches in img2img.
If you still want it, for Windows:
Close the Stable Diffusion console.
Install these:
- Visual Studio Community 2022, including: a) Desktop development with C++; b) Python development (https://visualstudio.microsoft.com/downloads)
- CUDA 11.8: in the installer, select only CUDA and deselect Driver, GFE, and PhysX (https://developer.nvidia.com/cuda-11-8-0-download-archive)
- download cuDNN v8.9.1 (May 5th, 2023), for CUDA 11.x, local installer for Windows (https://developer.nvidia.com/rdp/cudnn-archive)
- extract cuDNN v8.9.1 Zip to C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8 (bin, include, lib, license)
I'm not really sure whether items 2-4 are necessary, but they are necessary for the roop video version (standalone app).
In \stable-diffusion-webui\extensions\sd-webui-roop\requirements.txt, remove "onnxruntime==1.15.0" and add "onnxruntime-gpu==1.15.0".
In \stable-diffusion-webui\extensions\sd-webui-roop\venv\Scripts\ , open a console/terminal and type these commands one at a time:
./activate
pip uninstall onnxruntime
pip install onnxruntime-gpu==1.15.0
Close the console.
Edit: also change this in \stable-diffusion-webui\extensions\sd-webui-roop\scripts\swapper.py:
providers = ["CUDAExecutionProvider"]
Save the file and delete the __pycache__ folder if it exists.
Run Stable Diffusion again.
You should now see it using CUDA in the console while generating and swapping an image.
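To check the result without waiting for a generation, something like the following can be run inside the same venv. It is only a sketch: `cuda_status` is a helper name made up for this example, while `onnxruntime.get_available_providers()` and `onnxruntime.get_device()` are regular ONNX Runtime calls. Note that a provider being listed only means the installed package was built with it; a session can still fall back to CPU if the CUDA libraries fail to load.

```python
# Sketch: report whether the installed ONNX Runtime build offers the CUDA provider.
# cuda_status is a hypothetical helper written for this example.

def cuda_status(available):
    """Summarize a provider list such as onnxruntime.get_available_providers()."""
    if "CUDAExecutionProvider" in available:
        return "CUDA provider available"
    return "CPU only: onnxruntime-gpu is missing or not the active package"


if __name__ == "__main__":
    try:
        import onnxruntime as ort
    except ImportError:
        print("onnxruntime is not installed in this environment")
    else:
        # get_device() reports "GPU" or "CPU" for the installed build.
        print(ort.__version__, ort.get_device())
        print(cuda_status(ort.get_available_providers()))
```

If this prints "CPU only", the plain `onnxruntime` package is probably still the one being imported.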
Seems to be using CPU to do all the swapping, do you know how I can switch it to GPU?
It's not worth it (for now). I tried it and it works, but a lot more output appeared in the console and it took more time than on CPU, especially if you run only 1 image in txt2img rather than batches in img2img.
If you still want it, for Windows:
Close the Stable Diffusion console.
Install these:
- Visual Studio Build Tools 2022, including: a) Desktop development with C++; b) Python development (https://aka.ms/vs/17/release/vs_BuildTools.exe)
- Visual Studio Community 2022 (https://visualstudio.microsoft.com/downloads)
- CUDA 11.8: in the installer, select only CUDA and deselect Driver, GFE, and PhysX (https://developer.nvidia.com/cuda-11-8-0-download-archive)
- download cuDNN v8.9.1 (May 5th, 2023), for CUDA 11.x, local installer for Windows (https://developer.nvidia.com/rdp/cudnn-archive)
- extract cuDNN v8.9.1 Zip to C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8 (bin, include, lib, license)
I'm not really sure whether items 3-5 are necessary, but they are necessary for the roop video version (standalone app).
In \stable-diffusion-webui\extensions\sd-webui-roop\requirements.txt, remove "onnxruntime==1.15.0" and add "onnxruntime-gpu==1.15.0".
In \stable-diffusion-webui\extensions\sd-webui-roop\venv\Scripts\ , open a console/terminal and type these commands one at a time:
./activate
pip uninstall onnxruntime
pip install onnxruntime-gpu==1.15.0
Close the console.
Run Stable Diffusion again.
You should now see it using CUDA in the console while generating and swapping an image.
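For anyone who prefers to script the requirements.txt edit instead of doing it by hand, here is a rough sketch. The function name `use_gpu_runtime` and the sample package `somepackage` are made up for illustration; the only thing taken from the steps above is swapping `onnxruntime` for `onnxruntime-gpu` while keeping the version pin.

```python
import re

# Sketch: swap the CPU package name for the GPU one, keeping any version pin.
# use_gpu_runtime is a hypothetical helper, not part of the extension.

def use_gpu_runtime(requirements_text):
    """Return the requirements text with `onnxruntime` renamed to `onnxruntime-gpu`."""
    # Match "onnxruntime" only when it is the full package name at the start of
    # a line, so an existing "onnxruntime-gpu" entry is left untouched.
    pattern = re.compile(r"^onnxruntime(?=\s*(==|>=|<=|~=|$))", re.MULTILINE)
    return pattern.sub("onnxruntime-gpu", requirements_text)


if __name__ == "__main__":
    sample = "somepackage==1.0\nonnxruntime==1.15.0\n"
    print(use_gpu_runtime(sample))
```

To apply it to the real file, read requirements.txt, pass the text through the function, and write it back. The pip uninstall/install step is still needed afterwards, because editing the file does not change what is already installed.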
I tried roop before the SD version and did most of what you mentioned above, but so far it seems to fail to use the GPU and falls back to the CPU provider, which is why it generates more messages in the console and takes more time to swap a face. Both roop and roop SD failed to use the GPU. FYI, on my 3060 it takes +7 secs to swap a face with the CPU (13 secs total for one image generation plus roop), but if I set the CUDA provider it prints a long message in the console before falling back to the CPU, and it takes +20 secs. In one YouTube video someone got 8 secs for one image generation plus roop, but maybe he just did 1+7 secs using a 3090 and the CPU provider.
JUST 7 SECONDS???? I am running it on RunPod with an A5000 GPU and I need to wait around 4 minutes for 1 swap.
I forgot something: also change this in \stable-diffusion-webui\extensions\sd-webui-roop\scripts\swapper.py:
providers = ["CUDAExecutionProvider"]
It's located after "from scripts.roop_logging import logger" and before "@dataclass".
Save the file and delete the __pycache__ folder if it exists.
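A slightly more defensive variant of that providers line is sketched below. It assumes you want CUDA first with an explicit CPU fallback; ONNX Runtime treats the providers list as a priority order. `choose_providers` and `model_path` are names invented for this example, and I have not checked how swapper.py passes the list on.

```python
# Sketch: build a providers list that prefers CUDA and keeps CPU as a fallback.
# choose_providers is a hypothetical helper, not code from the extension.

PREFERRED = ["CUDAExecutionProvider", "CPUExecutionProvider"]


def choose_providers(available):
    """Keep only the preferred providers that this ONNX Runtime build offers."""
    return [name for name in PREFERRED if name in available]


if __name__ == "__main__":
    try:
        import onnxruntime
    except ImportError:
        print("onnxruntime is not installed in this environment")
    else:
        providers = choose_providers(onnxruntime.get_available_providers())
        print(providers)
        # With a real model file (model_path is a placeholder), the providers a
        # session actually ended up with can be inspected like this:
        # session = onnxruntime.InferenceSession(model_path, providers=providers)
        # print(session.get_providers())
```

If `session.get_providers()` shows only the CPU provider even though CUDA was requested, the session fell back, which would match the slow runs reported in this thread.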
FYI, on my 3060 it takes +7 secs to swap a face with the CPU (13 secs total for one image generation plus roop), but if I set the CUDA provider it prints a long message in the console before falling back to the CPU, and it takes +20 secs.
That is exactly what I said.
It's not worth it (for now). I tried it and it works, but a lot more output appeared in the console and it took more time than on CPU, especially if you run only 1 image in txt2img rather than batches in img2img.
Somehow, on CUDA it took much more time than on CPU, so it's not really worth it.
Because somehow, on GPU/CUDA, it shows more iterations in the console, and I don't know what they are.
Same for me: on CPU it only adds 4-6 seconds, but GPU is much slower.
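To compare CPU and GPU runs with numbers rather than impressions, a small timing helper like this might help. It is a sketch under one assumption that nobody in this thread has confirmed: that part of the GPU slowdown could be one-off start-up cost on the first call. Timing the first call separately from later calls would show whether that is the case. `time_calls` is a made-up helper, and the function you pass in (for example a wrapper around one face swap) is up to you.

```python
import statistics
import time

# Sketch: time the first call separately from repeated calls.
# time_calls is a hypothetical helper; fn is any zero-argument callable.

def time_calls(fn, repeats=5):
    """Return the first-call time and the median of the following calls, in seconds."""
    start = time.perf_counter()
    fn()
    first_call = time.perf_counter() - start

    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)

    return {
        "first_call": first_call,
        "median": statistics.median(samples),
        "samples": samples,
    }


if __name__ == "__main__":
    # Stand-in workload; replace with a call that performs one swap.
    result = time_calls(lambda: sum(range(100_000)), repeats=3)
    print(f"first call: {result['first_call']:.4f}s, median after: {result['median']:.4f}s")
```

If the first call is slow but the median is fast, batches should benefit from the GPU while single images will not. If every call is slow, the session is probably falling back to CPU each time.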
7 secs with CPU, and it should be faster with GPU. If it takes 4 mins, then something is definitely wrong with your installation. CMIIW, someone on the roop GitHub achieved 30 frame swaps per second with a 3090/4090. That's why I'm still looking for a solution.
@s0md3v why did you close this? Should using the GPU or the CUDA provider behave the way we reported above?
The same issue occurs when I change the dependency library to onnxruntime-gpu 1.15.1 and modify "providers = ["CUDAExecutionProvider"]" in swapper.py. When I execute it, I encounter the following error message: onnxruntime::ProviderLibrary::Get [ONNXRuntimeError] : 1 : FAIL : LoadLibrary failed with error 126. Falling back to ['CUDAExecutionProvider', 'CPUExecutionProvider'] and retrying.
Although it produces output, the worst part is that it is even slower compared to the CPU version, taking approximately 7-8 seconds longer. My GPU is a 4080, and if I directly use Roop, the speed can reach 30-35 frames per second. Why is there such a significant difference?
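About the "LoadLibrary failed with error 126" message: 126 is the Windows error code for "The specified module could not be found", which usually means a DLL or one of its dependencies cannot be located. The sketch below scans the folders on PATH for the CUDA and cuDNN DLLs. The DLL names in `ASSUMED_DLLS` are my assumption for a CUDA 11.x plus cuDNN 8.x install and may not match what your onnxruntime-gpu build needs; `missing_dlls` is a helper name made up for this example.

```python
import os
from pathlib import Path

# Assumed DLL names for CUDA 11.x + cuDNN 8.x; adjust to your own install.
ASSUMED_DLLS = [
    "cudart64_110.dll",
    "cublas64_11.dll",
    "cublasLt64_11.dll",
    "cufft64_10.dll",
    "curand64_10.dll",
    "cudnn64_8.dll",
]


def missing_dlls(names, search_dirs):
    """Return the names that are not present in any of the given directories."""
    found = set()
    for directory in search_dirs:
        if not directory:
            continue  # skip empty PATH entries
        path = Path(directory)
        try:
            if path.is_dir():
                # Compare case-insensitively, as Windows file names are.
                found.update(entry.name.lower() for entry in path.iterdir())
        except OSError:
            continue  # unreadable directory, ignore it
    return [name for name in names if name.lower() not in found]


if __name__ == "__main__":
    dirs = os.environ.get("PATH", "").split(os.pathsep)
    print("Not found on PATH:", missing_dlls(ASSUMED_DLLS, dirs))
```

If any of these show up as missing, check that the CUDA bin folder (for example the v11.8 path mentioned earlier in the thread) is on PATH and that the cuDNN files were extracted into it.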