
[Bug]: Increased VRAM usage

Open 6andro opened this issue 1 year ago • 10 comments

Is there an existing issue for this?

  • [X] I have searched the existing issues and checked the recent builds/commits

What happened?

I just updated my webui and now hires fix runs out of memory far quicker. I have a 3080 Ti with 12 GB, and I used to be able to render at 640 x 940 and then upscale x2.6 to 1664x2496, which is 4'153'344 pixels (other resolutions with the same pixel count worked as well). Now I can't anymore. It seems that PyTorch does not allocate enough VRAM: the lower the upscale ratio, the lower the allocation, despite there being enough. The maximum factor that works now is x2.

Tried to allocate 4.02 GiB (GPU 0; 12.00 GiB total capacity; 6.06 GiB already allocated; 2.27 GiB free; 7.45 GiB reserved in total by PyTorch)
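
For reference, a minimal sketch of the pixel-count arithmetic above (a quick check; the resolution is the one quoted in this report):

    # The hires-fix cost is driven by the output pixel count, so any
    # resolution with the same product hits the same VRAM budget.
    w, h = 1664, 2496
    print(w * h)  # 4153344, the pixel count quoted above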

Steps to reproduce the problem

Generate an image at 640 x 940, then upscale by more than x2 (hires fix).

What should have happened?

Allocate the available VRAM and render.

Commit where the problem happens

20ae71f

What Python version are you running on?

Python 3.10.x

What platforms do you use to access the UI?

Windows

What device are you running WebUI on?

Nvidia GPUs (RTX 20 series and above)

What browsers do you use to access the UI?

Google Chrome

Command Line Arguments

Nothing other than default.

List of extensions

Config-Presets a1111-sd-webui-tagcomplete sd-dynamic-prompts sd-dynamic-thresholding sd-webui-additional-networks sd-webui-controlnet stable-diffusion-webui ultimate-upscale-for-automatic1111

Console logs

Relevant part:

Traceback (most recent call last):
  File "E:\__Midjourney\StableDiffusion\automatic1111\stable-diffusion-webui\modules\call_queue.py", line 57, in f
    res = list(func(*args, **kwargs))
  File "E:\__Midjourney\StableDiffusion\automatic1111\stable-diffusion-webui\modules\call_queue.py", line 37, in f
    res = func(*args, **kwargs)
  File "E:\__Midjourney\StableDiffusion\automatic1111\stable-diffusion-webui\modules\txt2img.py", line 57, in txt2img
    processed = processing.process_images(p)
  File "E:\__Midjourney\StableDiffusion\automatic1111\stable-diffusion-webui\modules\processing.py", line 611, in process_images
    res = process_images_inner(p)
  File "E:\__Midjourney\StableDiffusion\automatic1111\stable-diffusion-webui\extensions\sd-webui-controlnet\scripts\batch_hijack.py", line 42, in processing_process_images_hijack
    return getattr(processing, '__controlnet_original_process_images_inner')(p, *args, **kwargs)
  File "E:\__Midjourney\StableDiffusion\automatic1111\stable-diffusion-webui\modules\processing.py", line 729, in process_images_inner
    samples_ddim = p.sample(conditioning=p.c, unconditional_conditioning=p.uc, seeds=p.seeds, subseeds=p.subseeds, subseed_strength=p.subseed_strength, prompts=p.prompts)
  File "E:\__Midjourney\StableDiffusion\automatic1111\stable-diffusion-webui\modules\processing.py", line 1032, in sample
    samples = self.sd_model.get_first_stage_encoding(self.sd_model.encode_first_stage(decoded_samples))
  File "E:\__Midjourney\StableDiffusion\automatic1111\stable-diffusion-webui\modules\sd_hijack_utils.py", line 17, in <lambda>
    setattr(resolved_obj, func_path[-1], lambda *args, **kwargs: self(*args, **kwargs))
  File "E:\__Midjourney\StableDiffusion\automatic1111\stable-diffusion-webui\modules\sd_hijack_utils.py", line 28, in __call__
    return self.__orig_func(*args, **kwargs)
  File "E:\__Midjourney\StableDiffusion\automatic1111\stable-diffusion-webui\venv\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "E:\__Midjourney\StableDiffusion\automatic1111\stable-diffusion-webui\repositories\stable-diffusion-stability-ai\ldm\models\diffusion\ddpm.py", line 830, in encode_first_stage
    return self.first_stage_model.encode(x)
  File "E:\__Midjourney\StableDiffusion\automatic1111\stable-diffusion-webui\repositories\stable-diffusion-stability-ai\ldm\models\autoencoder.py", line 83, in encode
    h = self.encoder(x)
  File "E:\__Midjourney\StableDiffusion\automatic1111\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "E:\__Midjourney\StableDiffusion\automatic1111\stable-diffusion-webui\repositories\stable-diffusion-stability-ai\ldm\modules\diffusionmodules\model.py", line 536, in forward
    h = self.mid.attn_1(h)
  File "E:\__Midjourney\StableDiffusion\automatic1111\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "E:\__Midjourney\StableDiffusion\automatic1111\stable-diffusion-webui\modules\sd_hijack_optimizations.py", line 620, in sdp_no_mem_attnblock_forward
    return sdp_attnblock_forward(self, x)
  File "E:\__Midjourney\StableDiffusion\automatic1111\stable-diffusion-webui\modules\sd_hijack_optimizations.py", line 612, in sdp_attnblock_forward
    out = torch.nn.functional.scaled_dot_product_attention(q, k, v, dropout_p=0.0, is_causal=False)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 4.02 GiB (GPU 0; 12.00 GiB total capacity; 6.06 GiB already allocated; 2.27 GiB free; 7.45 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
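
The max_split_size_mb hint in the error message is a PyTorch allocator knob rather than a webui setting; a minimal sketch of applying it (the 512 MB value is the one tried later in this thread, not a recommendation):

    # Hedged sketch: PYTORCH_CUDA_ALLOC_CONF must be set before the first
    # CUDA allocation. webui users would typically put the equivalent line
    #   set PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512
    # in webui-user.bat instead.
    import os
    os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:512"
    import torch  # imported only after the variable is set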

Additional information

No response

6andro • May 27 '23

Can confirm this. Hires fix is now more sensitive than before; I have to dial down the upscale factor to avoid running out of VRAM, a problem I didn't have on the previous version.

version: v1.3.0 • python: 3.10.6 • torch: 2.0.1+cu118 • xformers: 0.0.20

EDIT: it seems reinstalling a1111 from scratch into a new folder fixed it for me.

thesomeotherguy • May 28 '23

I think it's not just hires fix; even normal txt2img generation requires more memory in the newest version. I am running Stable Diffusion on Kaggle, using a P100 GPU with 15.9 GB of VRAM. Before the 1.3.0 update, I was able to do 832x960 images with batch size 3 at commit efac2cf. After the 1.3.0 update, at commit 20ae71f, generating 832x960 with even batch size 1 runs out of memory.

21x2-42 • May 28 '23

Can you try different Cross attention optimization options on the Optimizations settings page? If you had none specified before, that's Doggettx now.

AUTOMATIC1111 • May 28 '23
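
For reference, the same optimizations can also be forced from the command line (flag names as of webui v1.3.x; they go in COMMANDLINE_ARGS in webui-user.bat):

    --opt-split-attention        Doggettx
    --xformers                   xformers (requires the xformers package)
    --opt-sdp-attention          PyTorch 2.x scaled dot product attention
    --opt-sdp-no-mem-attention   sdp without the memory-efficient path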

@AUTOMATIC1111 Just set Doggettx; this enables higher resolutions for me again! Does this come with any downsides?

6andro • May 28 '23

@AUTOMATIC1111 I was too quick: at about 97% it ran out of memory, which never happened before. Usually it would run out when reaching the start of the hires fix pass (50%), and if that pass started, it would complete. Now it started but failed at 97%. Tried to allocate 1014.00 MiB (GPU 0; 12.00 GiB total capacity; 10.35 GiB already allocated.

6andro • May 28 '23

Doggettx seems to be the only one that even starts, but it seems bugged: it is extremely slow and just randomly runs out of memory, this time at 70%, which took several minutes to reach. I have nothing else running that uses VRAM or is stealing GPU focus.

6andro • May 28 '23

I have nothing else running that uses VRAM or is stealing GPU focus.

Not even a browser? :) I've noticed shutting down Chrome helps with VRAM pressure, but I'm a tab hoarder... :)

akx • May 28 '23

Can you try different Cross attention optimization options on the Optimizations settings page? If you had none specified before, that's Doggettx now.

Can confirm. This did indeed fix it for me. I didn't use xformers before, so it defaulted to Doggettx. I guess now it's a must to select it in the new optimization options.

Zotikus1001 • May 28 '23

I have nothing else running that uses VRAM or is stealing GPU focus.

Not even a browser? :) I've noticed shutting down Chrome helps with VRAM pressure, but I'm a tab hoarder... :)

I do run it in Chrome, but that is usually no issue. In the past week I've rendered every day, even while playing games, including hires fix, and it never failed. The games sometimes had FPS drops and rendering took longer, but it worked. I'm now trying it in Firefox with everything else closed.

6andro • May 28 '23

Can you try different Cross attention optimization options on the Optimizations settings page? If you had none specified before, that's Doggettx now.

It seems reinstalling a1111 from scratch into a new folder fixed it for me; I am using --xformers now.

Using Doggettx gives me VRAM out of memory,

while using --xformers is now fine.

I don't know what happened, but reinstalling the whole webui helped.

EDIT: it's not working, I still get VRAM OOM. Going back to v1.2.1 at hash 89f9faa, hires fix works as intended.

thesomeotherguy • May 28 '23

I have this same issue, though I use an AMD GPU, so I get a slightly different error message:

OutOfMemoryError: HIP out of memory. Tried to allocate 8.00 GiB (GPU 0; 11.98 GiB total capacity; 10.36 GiB already allocated; 1.54 GiB free; 10.38 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_HIP_ALLOC_CONF

I was able to consistently repro this, so I ran a git bisect and can verify that the root cause is at 2582a0f.

I can also verify that setting Doggettx as the optimization fixes this problem, and the generated image appears identical to before.

initialxy • May 28 '23
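
For anyone who wants to reproduce the bisect, a minimal sketch using the good/bad commits named in this thread:

    git bisect start
    git bisect bad 20ae71f    # failing commit from the report above
    git bisect good 89f9faa   # v1.2.1, reported working earlier in the thread
    # launch webui at each checkout, retry the failing hires-fix job,
    # then mark it with `git bisect good` or `git bisect bad`
    git bisect reset          # return to the original checkout when done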

I have the same issue on an RTX 4090: https://github.com/AUTOMATIC1111/stable-diffusion-webui/discussions/10822

Switching to Doggettx partially fixes the issue. I can now upscale again, but when I tried to upscale by 2.5, my last attempt got to tile 42/42 and then got stuck: the timer just kept counting up instead of down and nothing was happening. After several minutes, I got the CUDA out of memory error again.

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 3.19 GiB (GPU 0; 23.99 GiB total capacity; 15.07 GiB already allocated; 609.09 MiB free; 20.08 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

I'm on a fresh install trying to fix this, so all my settings are default aside from changing the optimizer to Doggettx.

Switching my optimizer to --xformers almost worked.

Applying optimization: xformers... done.████████████████████████████████████           | 60/72 [03:47<02:17, 11.45s/it]
100%|██████████████████████████████████████████████████████████████████████████████████| 48/48 [00:08<00:00,  5.74it/s]
100%|██████████████████████████████████████████████████████████████████████████████████| 24/24 [00:36<00:00,  1.51s/it]
Total progress: 100%|██████████████████████████████████████████████████████████████████| 72/72 [00:50<00:00,  1.42it/s]
100%|██████████████████████████████████████████████████████████████████████████████████| 48/48 [00:08<00:00,  5.47it/s]
Total prTile 1/4267%|████████████████████████████████████████████                      | 48/72 [00:08<00:03,  6.24it/s]
        Tile 2/42
        Tile 3/42
        Tile 4/42
        Tile 5/42
        Tile 6/42
        Tile 7/42
        Tile 8/42
        Tile 9/42
        Tile 10/42
        Tile 11/42
        Tile 12/42
        Tile 13/42
        Tile 14/42
        Tile 15/42
        Tile 16/42
        Tile 17/42
        Tile 18/42
        Tile 19/42
        Tile 20/42
        Tile 21/42
        Tile 22/42
        Tile 23/42
        Tile 24/42
        Tile 25/42
        Tile 26/42
        Tile 27/42
        Tile 28/42
        Tile 29/42
        Tile 30/42
        Tile 31/42
        Tile 32/42
        Tile 33/42
        Tile 34/42
        Tile 35/42
        Tile 36/42
        Tile 37/42
        Tile 38/42
        Tile 39/42
        Tile 40/42
        Tile 41/42
        Tile 42/42
100%|██████████████████████████████████████████████████████████████████████████████████| 24/24 [01:23<00:00,  3.48s/it]
Error completing request███████████████████████████████████████████████████████████████| 72/72 [01:35<00:00,  3.49s/it]
Arguments: ('task(kf4kxi58t657xac)', '((best quality)), ((masterpiece)), (detailed) fighting female bandit, action scene, revealing bandit armor, small rusty dagger on hip, attacking, (fantasy illustration:1.3),  fierce eyes, tattered and torn leather armor, chainmail, rusted armor, angry, covered in dirt, (Luis Royo:1.2), (Yoshitaka Amano:1.1), mountain background, rock background,  cliffside background, copper eyes, glowing eyes, volumetric lighting, ambient occlusion, chromatic aberration, ray tracing, hasselblad, beautiful highly detailed, 4k, highres, intricate, (high-resolution:1.2)', 'nsfw, easynegative, (worst quality, low quality:1.4), (flat background:1.2), (plain background:1.2), text, bad anatomy, bad hands, error, extra digits, extra fingers, cropped, out of focus, letterboxed, big nipples, soft, plain background, extra ears, text, watermark, bad proportions, greyscale, monochrome, (lowres:1.1), multiple tails, blurry, bad-hands-5, (bad_prompt_version2:0.7), extra arms, extra legs, extra hands, (loli), skirt, shorts, simple background', [], 48, 16, False, False, 1, 1, 7, -1.0, -1.0, 0, 0, 0, False, 960, 1248, True, 0.5, 2.5, 'R-ESRGAN 4x+ Anime6B', 24, 0, 0, 0, '', '', [], 0, False, False, 'positive', 'comma', 0, False, False, '', 1, '', [], 0, '', [], 0, '', [], True, False, False, False, 0) {}
Traceback (most recent call last):
  File "D:\SDlocal\stable-diffusion-webui\modules\call_queue.py", line 57, in f
    res = list(func(*args, **kwargs))
  File "D:\SDlocal\stable-diffusion-webui\modules\call_queue.py", line 37, in f
    res = func(*args, **kwargs)
  File "D:\SDlocal\stable-diffusion-webui\modules\txt2img.py", line 57, in txt2img
    processed = processing.process_images(p)
  File "D:\SDlocal\stable-diffusion-webui\modules\processing.py", line 611, in process_images
    res = process_images_inner(p)
  File "D:\SDlocal\stable-diffusion-webui\modules\processing.py", line 731, in process_images_inner
    x_samples_ddim = [decode_first_stage(p.sd_model, samples_ddim[i:i+1].to(dtype=devices.dtype_vae))[0].cpu() for i in range(samples_ddim.size(0))]
  File "D:\SDlocal\stable-diffusion-webui\modules\processing.py", line 731, in <listcomp>
    x_samples_ddim = [decode_first_stage(p.sd_model, samples_ddim[i:i+1].to(dtype=devices.dtype_vae))[0].cpu() for i in range(samples_ddim.size(0))]
  File "D:\SDlocal\stable-diffusion-webui\modules\processing.py", line 519, in decode_first_stage
    x = model.decode_first_stage(x)
  File "D:\SDlocal\stable-diffusion-webui\modules\sd_hijack_utils.py", line 17, in <lambda>
    setattr(resolved_obj, func_path[-1], lambda *args, **kwargs: self(*args, **kwargs))
  File "D:\SDlocal\stable-diffusion-webui\modules\sd_hijack_utils.py", line 28, in __call__
    return self.__orig_func(*args, **kwargs)
  File "D:\SDlocal\stable-diffusion-webui\venv\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "D:\SDlocal\stable-diffusion-webui\repositories\stable-diffusion-stability-ai\ldm\models\diffusion\ddpm.py", line 826, in decode_first_stage
    return self.first_stage_model.decode(z)
  File "D:\SDlocal\stable-diffusion-webui\repositories\stable-diffusion-stability-ai\ldm\models\autoencoder.py", line 90, in decode
    dec = self.decoder(z)
  File "D:\SDlocal\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\SDlocal\stable-diffusion-webui\repositories\stable-diffusion-stability-ai\ldm\modules\diffusionmodules\model.py", line 637, in forward
    h = self.up[i_level].block[i_block](h, temb)
  File "D:\SDlocal\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\SDlocal\stable-diffusion-webui\repositories\stable-diffusion-stability-ai\ldm\modules\diffusionmodules\model.py", line 132, in forward
    h = nonlinearity(h)
  File "D:\SDlocal\stable-diffusion-webui\venv\lib\site-packages\torch\nn\functional.py", line 2059, in silu
    return torch._C._nn.silu(input)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 7.14 GiB (GPU 0; 23.99 GiB total capacity; 16.49 GiB already allocated; 4.79 GiB free; 16.53 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

It failed after it hit 100% for some reason. Yeah, this sucks :(

Using --xformers and setting max_split_size_mb to 512 almost works too. It completes the image and gets to 100%, but then it still fails AFTER completion for some reason? The completed image is also not added to my output folder, so I have no idea if it just auto-deletes the completed image or what's happening.

Model loaded in 3.7s (load weights from disk: 0.3s, create model: 0.5s, apply weights to model: 0.8s, apply half(): 0.5s, load VAE: 0.2s, move model to device: 0.5s, load textual inversion embeddings: 0.8s).
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:04<00:00,  4.34it/s]
100%|██████████████████████████████████████████████████████████████████████████████████| 24/24 [01:24<00:00,  3.54s/it]
Error completing request███████████████████████████████████████████████████████████████| 44/44 [02:55<00:00,  3.54s/it]
Arguments: ('task(9751p6lne8p6i2t)', '((best quality)), ((masterpiece)), (detailed) fighting female bandit, action scene, revealing bandit armor, small rusty dagger on hip, attacking, (fantasy illustration:1.3),  fierce eyes, tattered and torn leather armor, chainmail, rusted armor, angry, covered in dirt, (Luis Royo:1.2), (Yoshitaka Amano:1.1), mountain background, rock background,  cliffside background, copper eyes, glowing eyes, volumetric lighting, ambient occlusion, chromatic aberration, ray tracing, hasselblad, beautiful highly detailed, 4k, highres, intricate, (high-resolution:1.2)', 'nsfw, easynegative, (worst quality, low quality:1.4), (flat background:1.2), (plain background:1.2), text, bad anatomy, bad hands, error, extra digits, extra fingers, cropped, out of focus, letterboxed, big nipples, soft, plain background, extra ears, text, watermark, bad proportions, greyscale, monochrome, (lowres:1.1), multiple tails, blurry, bad-hands-5, (bad_prompt_version2:0.7), extra arms, extra legs, extra hands, (loli), skirt, shorts, simple background', [], 20, 0, False, False, 1, 1, 7, -1.0, -1.0, 0, 0, 0, False, 960, 1248, True, 0.5, 2.5, 'Latent', 24, 0, 0, 0, '', '', [], 0, False, False, 'positive', 'comma', 0, False, False, '', 1, '', [], 0, '', [], 0, '', [], True, False, False, False, 0) {}
Traceback (most recent call last):
  File "D:\SDlocal\stable-diffusion-webui\modules\call_queue.py", line 57, in f
    res = list(func(*args, **kwargs))
  File "D:\SDlocal\stable-diffusion-webui\modules\call_queue.py", line 37, in f
    res = func(*args, **kwargs)
  File "D:\SDlocal\stable-diffusion-webui\modules\txt2img.py", line 57, in txt2img
    processed = processing.process_images(p)
  File "D:\SDlocal\stable-diffusion-webui\modules\processing.py", line 611, in process_images
    res = process_images_inner(p)
  File "D:\SDlocal\stable-diffusion-webui\modules\processing.py", line 731, in process_images_inner
    x_samples_ddim = [decode_first_stage(p.sd_model, samples_ddim[i:i+1].to(dtype=devices.dtype_vae))[0].cpu() for i in range(samples_ddim.size(0))]
  File "D:\SDlocal\stable-diffusion-webui\modules\processing.py", line 731, in <listcomp>
    x_samples_ddim = [decode_first_stage(p.sd_model, samples_ddim[i:i+1].to(dtype=devices.dtype_vae))[0].cpu() for i in range(samples_ddim.size(0))]
  File "D:\SDlocal\stable-diffusion-webui\modules\processing.py", line 519, in decode_first_stage
    x = model.decode_first_stage(x)
  File "D:\SDlocal\stable-diffusion-webui\modules\sd_hijack_utils.py", line 17, in <lambda>
    setattr(resolved_obj, func_path[-1], lambda *args, **kwargs: self(*args, **kwargs))
  File "D:\SDlocal\stable-diffusion-webui\modules\sd_hijack_utils.py", line 28, in __call__
    return self.__orig_func(*args, **kwargs)
  File "D:\SDlocal\stable-diffusion-webui\venv\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "D:\SDlocal\stable-diffusion-webui\repositories\stable-diffusion-stability-ai\ldm\models\diffusion\ddpm.py", line 826, in decode_first_stage
    return self.first_stage_model.decode(z)
  File "D:\SDlocal\stable-diffusion-webui\repositories\stable-diffusion-stability-ai\ldm\models\autoencoder.py", line 90, in decode
    dec = self.decoder(z)
  File "D:\SDlocal\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\SDlocal\stable-diffusion-webui\repositories\stable-diffusion-stability-ai\ldm\modules\diffusionmodules\model.py", line 637, in forward
    h = self.up[i_level].block[i_block](h, temb)
  File "D:\SDlocal\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\SDlocal\stable-diffusion-webui\repositories\stable-diffusion-stability-ai\ldm\modules\diffusionmodules\model.py", line 132, in forward
    h = nonlinearity(h)
  File "D:\SDlocal\stable-diffusion-webui\venv\lib\site-packages\torch\nn\functional.py", line 2059, in silu
    return torch._C._nn.silu(input)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 7.14 GiB (GPU 0; 23.99 GiB total capacity; 16.49 GiB already allocated; 4.81 GiB free; 16.51 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

Squeezitgirdle • May 29 '23

Because sdp uses more memory than xformers, use xformers if you want to upscale up to 4x; with sdp you can only upscale ~3.5x on the same video card (in this case an RTX 4090).

chrme • May 29 '23
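
The original traceback fails inside sdp_attnblock_forward, i.e. in the VAE's mid-block self-attention, which attends over the 8x-downsampled image as one flat token sequence. A rough back-of-envelope (assuming a single fp16 attention matrix is materialized; actual allocations will differ) shows why the cost explodes with the upscale factor:

    # Rough estimate of the full attention matrix over the VAE mid-block
    # feature map (image downsampled 8x, then flattened to tokens).
    def attn_matrix_gib(width, height, bytes_per_el=2):
        tokens = (width // 8) * (height // 8)
        return tokens * tokens * bytes_per_el / 2**30

    print(attn_matrix_gib(1664, 2496))  # ~7.8 GiB at the resolution from the report
    # Doubling the upscale factor quadruples the tokens, so the matrix grows 16x.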

I can't upscale 4x with xformers. It gets up to 100%, showing completed in CMD, but then fails due to CUDA out of memory.

It does better than the others, but I think torch 2.0.1 is the reason hires fix is no longer working. This was also confirmed by one of the support people on the Stable Diffusion Discord.

Squeezitgirdle • May 30 '23

Same issue here after updating, and it was fixed by AUTOMATIC1111's suggestion above of switching to Doggettx in the Cross attention optimization setting.

I have no xformers since it doesn't yet (or never will) work on 980 Ti 6 GB cards. Many months ago I tried to compile xformers for it and it was actually slower than without it. 😂

macronomicus • May 30 '23

Because sdp uses more memory than xformers, use xformers if you want to upscale up to 4x; with sdp you can only upscale ~3.5x on the same video card (in this case an RTX 4090).

I'm not sure if you were responding to me, but after removing sdp and just using xformers I am still unable to get above 2.5k images. It always works great until hires starts, then drastically slows down and crashes. Doggettx does not resolve it either; with Doggettx I can generate bigger images (painfully slowly), but not as big as I used to.

Additionally, xformers is faster than Doggettx and can generate slightly larger images, but it also crashes with a CUDA out of memory error. PyTorch keeps taking up all of the memory.

I strongly suspect the issue was caused by my PyTorch updating to 2.0.1, since that was the only thing that changed since the last time it worked.

Squeezitgirdle • May 31 '23

The Doggettx optimization solved this for me.

6andro • May 31 '23

I strongly suspect the issue was caused by my PyTorch updating to 2.0.1, since that was the only thing that changed since the last time it worked.

I share your suspicion, and your observations.

I used to be able to scale things much larger; I cannot reach really high resolutions like I used to, even with the Tiled Diffusion and Tiled VAE extensions.

AugmentedRealityCat • Jun 12 '23

Same issue here after updating, and it was fixed by AUTOMATIC1111's suggestion above of switching to Doggettx in the Cross attention optimization setting.

I have no xformers since it doesn't yet (or never will) work on 980 Ti 6 GB cards. Many months ago I tried to compile xformers for it and it was actually slower than without it. 😂

Where do I check this setting?

PsychoGarlic • Jul 04 '23

Same issue here after updating, and it was fixed by AUTOMATIC1111's suggestion above of switching to Doggettx in the Cross attention optimization setting. I have no xformers since it doesn't yet (or never will) work on 980 Ti 6 GB cards. Many months ago I tried to compile xformers for it and it was actually slower than without it. 😂

Where do I check this setting?

It's in the settings page; I can't remember which tab, probably the Optimization tab, or you can just click the display-all button and then search for Cross attention optimization.

macronomicus • Jul 04 '23

Same issue here after updating, and it was fixed by AUTOMATIC1111's suggestion above of switching to Doggettx in the Cross attention optimization setting. I have no xformers since it doesn't yet (or never will) work on 980 Ti 6 GB cards. Many months ago I tried to compile xformers for it and it was actually slower than without it. 😂

Where do I check this setting?

It's in the settings page; I can't remember which tab, probably the Optimization tab, or you can just click the display-all button and then search for Cross attention optimization.

I had to update A1111 to 1.3.0. I will try this setting now. Thank you!

PsychoGarlic • Jul 05 '23