stable-diffusion-webui
Possible Nvidia driver issues
Discussed in https://github.com/AUTOMATIC1111/stable-diffusion-webui/discussions/11062
Originally posted by w-e-w, June 7, 2023:
Some users have reported issues related to the latest Nvidia drivers:
- nVidia drivers change in memory management (vladmandic#1285)
- https://github.com/AUTOMATIC1111/stable-diffusion-webui/issues/11050#issuecomment-1578731478
If you have been experiencing generation slowdowns or getting stuck, consider downgrading to driver version 531 or below: NVIDIA Driver Downloads
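If you are not sure which driver you are currently on, the nvidia-smi tool bundled with the driver prints the version in its header; a quick check, assuming a default install where nvidia-smi is on the PATH:

    rem Run from any command prompt; the first line of the output table shows
    rem "Driver Version: xxx.xx" plus the CUDA version that driver supports.
    nvidia-smi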
Funny, I am randomly getting the issue where an output gets stuck at 50% for an hour, and I am on 531.41 with an NVIDIA 3060 12GB model.
Strangely mine seems to go at normal speed for the first gen on a checkpoint, or if I change the clip on a checkpoint, but subsequent gens go muuuch slower. Annoyingly Diablo won't run on 531.
I can confirm this bug. I was getting results (as expected) before I installed the latest Titan RTX drivers. I will try installing a previous build.
Strangely mine seems to go at normal speed for the first gen on a checkpoint, or if I change the clip on a checkpoint, but subsequent gens go muuuch slower. Annoyingly Diablo won't run on 531.
Yeah, that's exactly how it is for me. When I tried inpainting, the first gen runs through just fine, but any subsequent ones have massive hang-ups, necessitating a restart of the commandline window and rerunning webui-user.bat.
I wasn't sure if there was a problem with the drivers, so I reinstalled WebUI, but the problem didn't go away. Everything generates fine like before, but once the High Res Fix starts and finishes, there is what looks like a minute-long pause. Edit: confirmed. Downgraded to 531.68 and now everything is back to how it was.
If you are stuck with a newer Nvidia driver version, downgrading to Torch 1.13.1 seems to work too (a hedged webui-user.bat sketch follows these steps):
- Add the following to webui-user.bat:
    set TORCH_COMMAND=pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 --extra-index-url https://download.pytorch.org/whl/cu117
- Remove the <webui-root>/venv directory
- (Re)start WebUI
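A minimal sketch of what the full webui-user.bat could look like after that edit, assuming the stock file layout from the repository (the empty PYTHON/GIT/VENV_DIR/COMMANDLINE_ARGS lines are just the defaults, not something this workaround requires):

    @echo off
    rem Hypothetical webui-user.bat mirroring the stock layout; only the
    rem TORCH_COMMAND line is the change suggested above. After deleting
    rem <webui-root>/venv, the launcher rebuilds the venv with this Torch pin.
    set PYTHON=
    set GIT=
    set VENV_DIR=
    set TORCH_COMMAND=pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 --extra-index-url https://download.pytorch.org/whl/cu117
    set COMMANDLINE_ARGS=
    call webui.bat

On the next launch, the installer should pick up TORCH_COMMAND and install the pinned Torch build into the freshly recreated venv.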
I am having the opposite issue where on the newer drivers my first image generation is slow because of some clogged memory on my GPU which frees itself as soon as it gets to the second one.
Downgrading Torch didn't seem to help at all. Downgrading from 536.23 to 531.79 fixes the problem instantly.
Is this problem still relevant for anyone?
I haven't tried with the latest drivers, so I don't know if this issue is still ongoing.
Extremely slow for me. Downgraded PyTorch and had a whole lot of new problems. What usually took 4h is taking 10+.
Please tell me there is a fix in the pipeline?
For pro graphics cards (at least for my A4000), 531 is not going to eliminate the issue; you need to downgrade to at least 529 to get rid of the shared memory usage. And 529 / 531 / 535 / 536 from the Production Branch all work much worse than 531 from the New Feature Branch (which still uses shared VRAM, but with a much smaller footprint for some reason).
Can confirm this is still an issue, I have a RTX 3080 TI and downgrading to 531.68 solved it for me.
I'm using a 3070, torch: 2.0.1+cu118, and can confirm that this is still an issue with the 536.40 driver. Using highres.fix in particular makes everything break once you reach 98% progress on an image.
It got a tiny bit better here: torch 1.13.1+cu117, driver 531.79, CUDA compilation tools release 12.0, V12.0.76.
Still having issues with the duration of the generations. Usually 200 frames took 4h, and now it is taking 10 (720x1280, 30 steps, 2~3 ControlNets). I don't know how to fix it properly; every other fix I tried severely damaged the quality of the images. I now know that I was using version 1.2.1 of the WebUI and that Torch was not 2.0; the rest of the settings I do not remember. Now I have everything written down somewhere hahahah
536.67 fixed this? or not?
I did not try it. A lot of wasted time already hahaha
536.67 fixed it for me.
536.67 also worked for me, somewhat: it still seems to drop to shared memory, but not as aggressively (the latest versions seem to start using shared memory at 10GB rather than fully maxing out all 12GB available, which matters).
The 536.67 driver release notes still reference shared memory, and I recently started getting the "hanging at 50%" bug again today after updating some plugins, which prompted me to dig a bit deeper for solutions.
I often use 2 or 3 ControlNet 1.1 models + Hi-res Fix upscaling on a 12GB card, which is what triggers it: if I watch my Performance tab, I can see the GPU begin to use shared CPU memory.
The ideal fix would be finding some way to create a --never-use-migraine-inducing-shared-memory flag, but after some light research I assume this would rely on a driver or operating system API becoming available, as there doesn't seem to be a way to "block" a specific process from using shared memory.
However, the good news: I was able to massively reduce this >12GB memory usage without resorting to --medvram, with the following steps:
Initial environment baseline
- Check your CLI to make sure you don't have any "using old xformers" WARN message (not sure if this is actually related, but it was part of the process, so it makes sense to include it)
- Add set PYTORCH_CUDA_ALLOC_CONF=garbage_collection_threshold:0.9,max_split_size_mb:512 to webui-user.bat
- I assume here that 12GB users are already running the --xformers and --opt-split-attention flags (a combined webui-user.bat sketch follows this list)
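A minimal sketch of a webui-user.bat combining the baseline settings above, assuming the stock file layout; the allocator setting and the two flags are the ones from the list, everything else is placeholder:

    @echo off
    rem Hypothetical webui-user.bat for the baseline described above.
    rem PYTORCH_CUDA_ALLOC_CONF tunes PyTorch's caching allocator (garbage
    rem collection threshold and maximum split size); the two flags enable
    rem xformers and split-attention optimisations in the webui.
    set PYTHON=
    set GIT=
    set VENV_DIR=
    set PYTORCH_CUDA_ALLOC_CONF=garbage_collection_threshold:0.9,max_split_size_mb:512
    set COMMANDLINE_ARGS=--xformers --opt-split-attention
    call webui.bat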
Biggest improvement
Assuming your environment already looks similar to the above, by far the biggest VRAM drop I found was switching from the 1.4GB unpruned .pth ControlNet 1.1 models to the 750MB pruned .safetensors versions: https://civitai.com/models/38784
Hope this helps anyone in a similar frustrating position 😁
From my understanding ComfyUI might've done something with CUDA's malloc to fix this. https://github.com/comfyanonymous/ComfyUI/commit/1679abd86d944521cad8a94a09d30fd5e238ae22
Looks like a lot of cards also don't support this though: https://github.com/search?q=repo%3Acomfyanonymous%2FComfyUI+malloc&type=commits&s=author-date&o=desc
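For anyone who wants to try a comparable experiment with this webui, recent PyTorch builds expose the asynchronous CUDA allocator through the same environment variable used above; this is an assumption about an equivalent setting, not a webui feature, and as noted it is not supported on every card or driver:

    rem One-off experiment before launching (PyTorch 2.0+): switch PyTorch's
    rem CUDA caching allocator to the cudaMallocAsync backend, which is roughly
    rem what the ComfyUI change linked above enables on supported cards.
    set PYTORCH_CUDA_ALLOC_CONF=backend:cudaMallocAsync
    call webui.bat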
536.67 also did not fix this, according to the release notes.
https://us.download.nvidia.com/Windows/536.67/536.67-win11-win10-release-notes.pdf
3.2 Open Issues in Version 536.67 WHQL
This driver implements a fix for creative application stability issues seen during heavy memory usage. We’ve observed some situations where this fix has resulted in performance degradation when running Stable Diffusion and DaVinci Resolve. This will be addressed in an upcoming driver release. [4172676]
I updated the drivers without thinking this might happen and now I can't go back. I have tried removing the drivers with "Display Driver Uninstaller" and then installing v531.68 and v528.49, but it still doesn't go as fast as before. RTX 4080 (Laptop) 12GB. I seem to be missing something.
Edit: in the end, my problem seems to be with the laptop itself. Yesterday I tested 536.67 and 536.99 on my desktop with an RTX 3080 with no problems.
After downgrading to 531.79 I noticed it was using very slightly less RAM, but was slower. So I downgraded to 531.18, but I can't see any difference from 536.67 other than the aforementioned lower RAM usage.
Win10 latest, sd-webui 1.5.1, model: SDXL 1.0, image size 1024x1024.
My experience yesterday with Nvidia 531.79: 4 images generated in under a minute on a 3090.
My experience today with Nvidia 536.57: 1 image: 23 sec; 2 images: 8 minutes; 3 images: 8 minutes; 4 images: 8 minutes.
Going to uninstall 536.57 and install 531.79.
536.99 was just released, with the open issue mentioned previously still there, but the mention of Stable Diffusion has seemingly vanished. (It was given reference number 4172676, as mentioned here.)
https://us.download.nvidia.com/Windows/536.99/536.99-win11-win10-release-notes.pdf
3.2 Open Issues in Version 536.99 WHQL
[DaVinci Resolve] This driver implements a fix for creative application stability issues seen during heavy memory usage. We’ve observed some situations where this fix has resulted in performance degradation when running DaVinci Resolve. This will be addressed in an upcoming driver release. [4172676]
Has anyone tried 536.99?
I have tried 536.99. It has the same higher VRAM usage; that's all I can say, since I don't share you people's issues with the newer versions. After rolling back to 531 I am getting the freezing, but it's pretty sporadic; 536 works just fine, and I do xyz plots of several hundred generations. I have an RTX 4070 Ti.
Has anyone tried 536.99?
I just installed 536.99 using RTX 3080 and so far it's working fine.
I have tried 536.99. It has the same higher VRAM usage; that's all I can say, since I don't share you people's issues with the newer versions. After rolling back to 531 I am getting the freezing, but it's pretty sporadic; 536 works just fine, and I do xyz plots of several hundred generations. I have an RTX 4070 Ti.
You mention "rolling back to 531". Nvidia advises against using roll back; they say to uninstall the updated driver, and then download and install the old driver.
Win10 latest, sd-webui 1.5.1, model: SDXL 1.0, image size 1024x1024.
My experience yesterday with Nvidia 531.79: 4 images generated in under a minute on a 3090.
My experience today with Nvidia 536.57: 1 image: 23 sec; 2 images: 8 minutes; 3 images: 8 minutes; 4 images: 8 minutes.
Going to uninstall 536.57 and install 531.79.
After uninstalling 536 and installing 531, I am back to the speeds I had before.