stable-diffusion-webui-amdgpu

[Bug]: PC crashes when creating an image

Open Sunneyred101 opened this issue 2 years ago • 23 comments

Is there an existing issue for this?

  • [X] I have searched the existing issues and checked the recent builds/commits

What happened?

I followed this guide to download and get SD running: https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Install-and-Run-on-AMD-GPUs

I can open and use the webui, but when I attempt to create an image, my PC immediately turns off once generation reaches 20-30%. This happens almost every time. When it doesn't turn off, I get this error: "RuntimeError: Could not allocate tensor with 4915840 bytes. There is not enough GPU video memory available!"

PC specs: 16GB RAM, RX 5700 XT, Ryzen 7 1800X, 750W Gold PSU

Steps to reproduce the problem

  1. Install using https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Install-and-Run-on-AMD-GPUs
  2. Run webui-user.bat
  3. Enter a prompt and run
  4. Within seconds the computer crashes

What should have happened?

Image should have generated normally

Commit where the problem happens

3e8555242

What platforms do you use to access the UI ?

Windows

What browsers do you use to access the UI ?

Google Chrome

Command Line Arguments

set COMMANDLINE_ARGS=--medvram --precision full --no-half --opt-sub-quad-attention --opt-split-attention-v1 --disable-nan-check

List of extensions

No

Console logs

venv "E:\stable-diffusion-webui-directml-master\venv\Scripts\Python.exe"
Python 3.10.6 (tags/v3.10.6:9c7b4bd, Aug  1 2022, 21:53:49) [MSC v.1932 64 bit (AMD64)]
Commit hash: <none>
Installing requirements for Web UI
Launching Web UI with arguments: --medvram --precision full --no-half --opt-sub-quad-attention --opt-split-attention-v1 --disable-nan-check
Interrogations are fallen back to cpu. This doesn't affect on image generation. But if you want to use interrogate (CLIP or DeepBooru), check out this issue: https://github.com/lshqqytiger/stable-diffusion-webui-directml/issues/10
Warning: caught exception 'Torch not compiled with CUDA enabled', memory monitor disabled
No module 'xformers'. Proceeding without it.
Loading weights [fe4efff1e1] from E:\stable-diffusion-webui-directml-master\models\Stable-diffusion\model.ckpt
Creating model from config: E:\stable-diffusion-webui-directml-master\configs\v1-inference.yaml
LatentDiffusion: Running in eps-prediction mode
DiffusionWrapper has 859.52 M params.
Applying sub-quadratic cross attention optimization.
Textual inversion embeddings loaded(0):
Model loaded in 6.3s (load weights from disk: 4.1s, create model: 0.9s, apply weights to model: 1.2s).
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.

Additional information

No response

Sunneyred101 avatar Mar 01 '23 19:03 Sunneyred101

Hmm, do you have updated AMD drivers for your machine? Did you try different sets of arguments? It runs in float16 for me, so I do not need --precision full --no-half.
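For a rough sense of why the precision flags matter for VRAM: the console log above reports 859.52 M parameters for DiffusionWrapper. The sketch below only multiplies that count by the bytes per parameter. It ignores activations, the VAE, the text encoder and any DirectML overhead, so treat it as a lower bound, not a measurement. The helper name is made up for illustration.

```python
# Rough lower bound on the memory needed just to hold the model weights.
PARAMS = 859.52e6  # "DiffusionWrapper has 859.52 M params." from the console log

def weights_gib(num_params: float, bytes_per_param: int) -> float:
    """Memory for the weights alone, in GiB (hypothetical helper)."""
    return num_params * bytes_per_param / 2**30

fp32 = weights_gib(PARAMS, 4)  # float32 weights, which --no-half keeps
fp16 = weights_gib(PARAMS, 2)  # float16 weights
print(f"fp32: {fp32:.2f} GiB, fp16: {fp16:.2f} GiB")
```

On an 8GB card the difference between the two is a noticeable share of the total, which may be why dropping --no-half is worth trying where the card supports it.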

ClashSAN avatar Mar 02 '23 18:03 ClashSAN

I have seen a similar auto turn-off on my PC, but only for 'larger' resolutions. If I run 512x512, everything is fine. If I run 512x768, sometimes my PC turns off by itself just before the image is generated (near 100%), as if someone pulled the plug. But sometimes it doesn't turn off and I get the generated image. The randomness is really annoying.

Specs for my PC: 16GB RAM, RX 6650 XT, Ryzen 5 3500X, 500W PSU

ARGS: --precision full --no-half --opt-sub-quad-attention --disable-nan-check

I thought the culprit was my 500W PSU, which might be fake and not really 500W. But now I'm confused, seeing that you have a 750W PSU and still get the auto turn-off. I will buy a new PSU later and double check.

zhangjun-qcqi avatar Mar 03 '23 03:03 zhangjun-qcqi

I had this problem and the wattage of my PSU seemed all right. Eventually I replaced it and the problem went away. I guess it was just a bit too worn out for such workloads.

Miraihi avatar Mar 03 '23 15:03 Miraihi

I had considered opening this issue myself after researching, but someone beat me to it, so I'll just add more information here. In issue #20, at least TheAkashicTraveller and I have this problem. I felt this problem happened exclusively to RX 5700 XT users, but zhangjun-qcqi has a different model and describes the problem happening in a similar way.

Other than Windows reporting the error as event ID 6008, the GPU driver reports event ID 411 (Kernel-PnP), which did not help me pinpoint the problem.

I suspect a driver problem, but I've tried 22.112 and 20.11.2 and the problem still persists in the same way, so I doubt any driver in between would fix this.

Inspired by the above comment, I figured I'd share my PC specs as well: 16GB RAM running @ 3200MHz dual channel, RX 5700 XT, Ryzen 9 5900X, with bequiet pure rock2 850W Gold PSU, +1 SATA SSD, +1 fan (I open the case cover when I do heavy work like this).

ARGS: --medvram --precision full --no-half --opt-sub-quad-attention --opt-split-attention-v1 --disable-nan-check

I'm really tempted to try @Miraihi's suggestion of upgrading the PSU, but this could take some time (and money).

gogogame avatar Mar 03 '23 16:03 gogogame

So I added --lowvram and it's been working. Super slowly, but at least it's working. I doubt it's the PSU. I think it's our card, because from what I've seen, people who have the 5700 XT have been having this issue across the board. We should be able to run at --medvram since this card has 8GB, but I'm just not sure why it isn't working. @gogogame

Sunneyred101 avatar Mar 04 '23 06:03 Sunneyred101

On my 6900 XT, after following the installation guide above, I can't seem to finish generating a 512x512 image,

even with the parameter --opt-split-attention-v1.

I get to the final step then i receive the following error:

NotImplementedError: No operator found for memory_efficient_attention_forward with inputs:
     query : shape=(1, 4096, 1, 512) (torch.float16)
     key : shape=(1, 4096, 1, 512) (torch.float16)
     value : shape=(1, 4096, 1, 512) (torch.float16)
     attn_bias : <class 'NoneType'>
     p : 0.0
cutlassF is not supported because:
    device=privateuseone (supported: {'cuda'})
flshattF is not supported because:
    device=privateuseone (supported: {'cuda'})
    max(query.shape[-1] != value.shape[-1]) > 128
tritonflashattF is not supported because:
    device=privateuseone (supported: {'cuda'})
    max(query.shape[-1] != value.shape[-1]) > 128
    triton is not available
smallkF is not supported because:
    device=privateuseone (supported: {'cuda', 'cpu'})
    dtype=torch.float16 (supported: {torch.float32})
    max(query.shape[-1] != value.shape[-1]) > 32
    unsupported embed per head: 512

The outputs folder does not even get created. It just fails there.

The image generation process is working though, as I can see the previews of the steps running in the interface.

GitGudGandalf avatar Mar 04 '23 15:03 GitGudGandalf

I ran Windows 11 in Boot Camp on my Mac Pro (2019) with a 16-core Xeon + W5700X. With or without those args, it hard reboots as soon as it tries to generate an image. The PSU is 1400W, so I don't think it's PSU related.

x43x61x69 avatar Mar 05 '23 00:03 x43x61x69

I just bought a new 600W Gold PSU and my issue disappeared. Of course it still complains about not enough video memory sometimes, but I never see the auto turn-off any more.

zhangjun-qcqi avatar Mar 07 '23 02:03 zhangjun-qcqi

I have the same problem when generating many AI images

mengdalida avatar Mar 10 '23 14:03 mengdalida

Same problem with one single image, RX 580 here.

DerRehberg avatar Mar 14 '23 15:03 DerRehberg

I get the same problem. The 1st image generates normally, then the PC shuts down on the 2nd generation. If I generate only once after starting the web UI and then restart it, it generates normally. The VRAM is ~52xx-53xx.
My PC: Ryzen 5 3600 + Vega 64 + 32GB RAM
COMMANDLINE_ARGS=--lowvram --precision full --no-half --no-half-vae --opt-sub-quad-attention --opt-split-attention-v1 --opt-split-attention-invokeai --autolaunch

lovecode1001 avatar Mar 18 '23 03:03 lovecode1001

Same issue here.

FSP HEXA+ 500W PSU, RX 580 8GB VRAM, Ryzen 2200G, 16GB RAM

This is actually not the first use case that causes the shutdown issue. I've also had it after launching War Thunder or Splitgate on any setting, or when there are too many particles in a Unity game. I think it might be that my PSU doesn't really like the RX 580 doing certain things.

L3-NR avatar Mar 20 '23 16:03 L3-NR

I think I might have found a solution.

I've since investigated further, and it seems the issue is indeed with power. So if you are having this problem only here and do not wish to buy a new PSU, I recommend going into AMD Software-->Performance-->Tuning-->Custom, turning on Power Tuning, and dragging the slider to the left a bit.

I started with -30% power, but I think it's a bit 'underkill' for my setup, so I might give it more power. Currently it's running very stable, never even once crashing the PC and shutting down. This means generations are a bit slower, but I think it's a worthy tradeoff if it means that the image is actually generated and the computer doesn't lose power suddenly, which is always risky for a multitude of reasons, both to your software and hardware.

If you are not experiencing the same kind of problem with other applications, for example while gaming, you can just switch back to 'Default' in the AMD performance tuning settings when not generating images.
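To make the slider arithmetic concrete, here is a minimal sketch of what a percentage offset means in watts, assuming the slider scales the card's default power limit. The 225 W default is only an assumed example value, not a figure from this thread, and the function name is made up. Check your own card's default in AMD Software.

```python
def capped_power(default_watts: float, offset_percent: float) -> float:
    """Power limit after applying a slider offset such as -30 (percent)."""
    return default_watts * (1 + offset_percent / 100)

DEFAULT_W = 225.0  # assumed example value, not taken from this thread

for offset in (-30, -15):
    print(f"{offset:+d}% -> {capped_power(DEFAULT_W, offset):.1f} W")
```

Starting with a larger reduction and then giving power back in small steps, as described above, is one way to find the highest setting that stays stable.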

L3-NR avatar Mar 20 '23 19:03 L3-NR

My PC gives me a black screen, so it probably can't be the PSU. I still hear sounds.

DerRehberg avatar Mar 21 '23 14:03 DerRehberg

I've been getting the same on my RX 580. It will black screen the monitor and go to 100% fans, but things will seem to be running (except the generation fails). It usually happens when I'm running a big image or a larger batch size. I can reset and it will come back up, but with no graphics. Then I can safely power off and it will come back with the generic graphics driver, with the GPU marked as disabled in Device Manager. Then I just have to re-enable it and reboot again to get it back up.

But I'm also undervolting, so it may be something to do with that. I can game for hours with no issues, but something about Stable Diffusion or this directml version makes it angry. I also ran a GPU RAM test with OCCT at 95% utilization (the max you can set) for an hour, just to make sure it wasn't a GPU VRAM failure.

TeutonJon78 avatar Mar 21 '23 17:03 TeutonJon78

It's not only directml. I had the same issue with the older ONNX-based Stable Diffusion and upscaling software (Topaz Video AI). Yes, the symptoms were the same: black screen, with the PC seemingly still running. I also tried undervolting, to little result. I was also reluctant to replace the PSU, but eventually did it and it fixed the problem. Looks like the older models cannot handle loads of such scale.

Miraihi avatar Mar 21 '23 17:03 Miraihi

WTF I don't wanna replace my PSU. I'll just use Shark, it doesn't crash

DerRehberg avatar Mar 24 '23 17:03 DerRehberg

I tried L3-NR's power tuning suggestion above with -15%, and it works normally.

lovecode1001 avatar Mar 25 '23 03:03 lovecode1001

@x43x61x69 Did you ever solve this?

simplesisu avatar May 14 '23 16:05 simplesisu

Got a similar issue here. The directml version works fine (but slow) for me. However, the ROCm version doesn't: it crashes when generating larger images (larger than 768x768, but even at 768x768 it sometimes still crashes).

System: Linux Mint (basically Ubuntu)
GPU: Radeon VII (16GB VRAM)
PSU: 500W

So at first I thought it would be a PSU issue, but in Windows, when running the directml version, my GPU works fine even at a constant 250W max power. In Linux, even if I set the max power limit to 70W, the computer still crashes after a while (albeit later than with the power setting left at default). --lowvram or --medvram does seem to help a little, but not much. I still experience reboots. When I check the system log after a reboot, I see lots of messages saying "amdgpu 0000:12:00.0: amdgpu: Failed to send message 0x28, response 0xffffffff".
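For anyone who wants to check whether their own crashes leave the same trace, here is a minimal sketch that scans log lines for the message quoted above. The regular expression is derived only from that one quoted line, and the "kernel: " prefix in the sample is an assumption about how the log is formatted. The function name is hypothetical.

```python
import re

# Matches lines like the one quoted above; the PCI address varies per system.
SEND_FAIL = re.compile(
    r"amdgpu (?P<pci>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-9a-f]): amdgpu: "
    r"Failed to send message (?P<msg>0x[0-9a-f]+), response (?P<resp>0x[0-9a-f]+)"
)

def find_send_failures(lines):
    """Return (pci, msg, resp) tuples for every matching log line."""
    hits = []
    for line in lines:
        m = SEND_FAIL.search(line)
        if m:
            hits.append((m["pci"], m["msg"], m["resp"]))
    return hits

sample = [
    # The "kernel: " prefix is an assumed log format, not from the thread.
    "kernel: amdgpu 0000:12:00.0: amdgpu: Failed to send message 0x28, response 0xffffffff",
    "kernel: some unrelated line",
]
print(find_send_failures(sample))
```

On a systemd-based distribution, the kernel messages from the previous boot can usually be fed into this with `journalctl -k -b -1`, provided the journal is persistent.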

fgdfgfthgr-fox avatar Jun 11 '23 03:06 fgdfgfthgr-fox

Thanks for the power tuning hint above. I at last got my RX 580 working with a clocked-down speed of 820 MHz, in order to keep the GPU below 80 deg C so that it won't black out from overheating.

In my case I am sure it is the heat and not the power, because I do not have to limit the power, and of course it only draws some <60W when running at 820MHz.

For the time being I guess I have to live with this manual setting...

toshioson avatar Sep 09 '23 12:09 toshioson

I have the same issue with the black screen after a couple of generated images, as described above. My specs:
GPU: RX 580, 8GB
PSU: 630 Watt, BeQuiet
CPU: Intel Core i7-6700K
RAM: 32GB

I'll try the recommendations from above (I'll experiment):

  • reduce the GPU 'Max Frequency' by 15%;
  • reduce the 'Power Limit' by 15%.

VirtualMaestro avatar Sep 26 '23 07:09 VirtualMaestro

I saw the same behavior under Arch Linux running kernel 6.17.7-zen (AMD R7 7800 X3D, 32GB DDR5, AMD RX 6950 XT). After extensive testing, it seems like I managed to fix the crashes by reducing the power limit and removing the undervolt on my GPU using LACT. I am running fixed GPU core and memory clocks: GPU 2300MHz, VRAM 2200MHz, with the power limit set to 272W and the GPU voltage offset at 0.

Hope it helps somebody.
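For reference, on Linux the amdgpu driver exposes the power cap through hwmon as a power1_cap file whose value is in microwatts. The sketch below only reads the current cap and converts a watt value into that unit. It does not write anything, since changing the cap needs root and a tool such as LACT can handle that for you. The function names and the default search root are illustrative assumptions.

```python
from pathlib import Path

def watts_to_microwatts(watts: float) -> int:
    """Convert watts to the microwatt unit used by hwmon power1_cap."""
    return int(round(watts * 1_000_000))

def find_power_cap_files(root: str = "/sys/class/drm"):
    """List hwmon power1_cap files under DRM card devices, if any exist."""
    return sorted(Path(root).glob("card*/device/hwmon/hwmon*/power1_cap"))

def read_caps_watts(root: str = "/sys/class/drm"):
    """Return {path: watts} for every readable power1_cap file."""
    caps = {}
    for cap in find_power_cap_files(root):
        try:
            caps[str(cap)] = int(cap.read_text().strip()) / 1_000_000
        except (OSError, ValueError):
            continue  # skip files that cannot be read or parsed
    return caps

for path, watts in read_caps_watts().items():
    print(f"{path}: {watts:.0f} W")

# The 272 W limit mentioned above, expressed in microwatts.
print(watts_to_microwatts(272))
```

On a machine without an amdgpu device the loop simply prints nothing.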

xknex avatar Nov 10 '25 10:11 xknex