[bug]: OOM when upscaling with SD1.5 (SDXL works)

Open CubeTheThird opened this issue 1 year ago • 6 comments
Is there an existing issue for this problem?

  • [X] I have searched the existing issues

Operating system

Linux

GPU vendor

AMD (ROCm)

GPU model

RX 6700 XT

GPU VRAM

12GB

Version number

5.0.0

Browser

Firefox 130.0.1

Python dependencies

{ "accelerate": "0.30.1", "compel": "2.0.2", "cuda": null, "diffusers": "0.27.2", "numpy": "1.26.4", "opencv": "4.9.0.80", "onnx": "1.15.0", "pillow": "10.4.0", "python": "3.10.13", "torch": "2.2.2+rocm5.7", "torchvision": "0.17.2+rocm5.7", "transformers": "4.41.1", "xformers": null }

What happened

When attempting to upscale an image using any SD1.5 model, the application crashes because it runs out of VRAM.

What you expected to happen

Expected the image to be upscaled.

How to reproduce the problem

Select an image, select an SD1.5 model, try upscaling.

Additional context

The test image has a base resolution of 1800 x 1024. The upscaler is the standard RealESRGAN_x2plus, installed through the model manager. The upscaling output scale was set to 2x (the lowest setting).
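To put the test case in numbers, here is a small sketch of the sizes involved. It assumes the 2x setting simply doubles each side, and that the VAE works on latents downsampled by a factor of 8 per side (true of the standard SD1.5 and SDXL VAEs). The function name and its fields are illustrative only, not part of Invoke.

```python
def upscaled_size(width, height, scale, vae_factor=8):
    """Return output pixel size, megapixels, and latent size for an upscale.

    vae_factor=8 is an assumption matching the usual SD1.5/SDXL VAEs.
    """
    out_w, out_h = width * scale, height * scale
    return {
        "pixels": (out_w, out_h),
        "megapixels": out_w * out_h / 1e6,
        "latent": (out_w // vae_factor, out_h // vae_factor),
    }


info = upscaled_size(1800, 1024, 2)
print(info["pixels"], info["megapixels"], info["latent"])
```

So the final image is roughly 7.4 megapixels, which is a lot to decode in one pass if tiling is not in effect.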

Using an SDXL model, the upscaling completes successfully (though it takes a few minutes).

When observing system monitoring tools, there appear to be 2 main steps that primarily use the GPU. I assume these to be the generator model pass, and then the upscaler pass, though correct me if I'm wrong. With the SD1.5 attempts, the first step seems to finish (a relatively small amount of VRAM is allocated, processing occurs, then this is deallocated). The second step rapidly ramps up the VRAM usage, crashing almost immediately.
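For anyone who wants to pin down which step owns the spike more precisely than a system monitor allows, a minimal sketch follows. It assumes the code can be run inside the same Python process as the pipeline, since PyTorch only reports memory it allocated itself. The `torch.cuda` calls are also what ROCm builds of PyTorch expose. The `peak_vram` helper and the step labels are hypothetical, not part of Invoke.

```python
import contextlib

try:
    import torch
except ImportError:  # lets the sketch run on machines without PyTorch
    torch = None


@contextlib.contextmanager
def peak_vram(label, report):
    """Record the peak PyTorch-allocated VRAM (MiB) for the wrapped block.

    Stores None under `label` when no GPU (or no PyTorch) is available.
    """
    available = torch is not None and torch.cuda.is_available()
    if available:
        torch.cuda.synchronize()
        torch.cuda.reset_peak_memory_stats()
    try:
        yield
    finally:
        if available:
            torch.cuda.synchronize()
            report[label] = torch.cuda.max_memory_allocated() / 2**20
        else:
            report[label] = None


report = {}
with peak_vram("step_1", report):
    pass  # hypothetical: run the first GPU step here
with peak_vram("step_2", report):
    pass  # hypothetical: run the second GPU step here
print(report)
```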

May possibly be related to https://github.com/invoke-ai/InvokeAI/issues/6301

Discord username

cubethethird

CubeTheThird avatar Sep 29 '24 02:09 CubeTheThird

I can confirm that I am having the exact same issue (SD-1.5 upscaling fails by running out of VRAM, SDXL upscales fine) with the same card and the same OS platform. I was having the issue on 4.2.7 and upgraded to 5.0.0, but the issue still persists.

Best workaround for me right now is to generate the image using SD-1.5, and then use a similarish SDXL model to do the upscaling with.

russjr08 avatar Sep 29 '24 20:09 russjr08

Still occurring in Invoke 5.6.0

CubeTheThird avatar Jan 21 '25 22:01 CubeTheThird

Still occurring in Invoke 5.7.2

CubeTheThird avatar Mar 06 '25 18:03 CubeTheThird

Still occurring in Invoke 5.9.0

CubeTheThird avatar Mar 28 '25 12:03 CubeTheThird

@psychedelicious - I anticipate the graph is missing tiled decode

hipsterusername avatar Mar 28 '25 13:03 hipsterusername

The graphs are identical for the VAE decoding, and the same decode node & node settings are used for both. Tiling and tile size are the same.

Here is my VRAM usage for two upscaling runs:

[Image: VRAM usage for the two upscaling runs]

The overall memory trend is similar, but SDXL needs more VRAM overall. Makes sense - it is a bigger model.

However, SD1.5's decoding spike is greater in absolute value than SDXL's spike.

I don't think this is an issue w/ graphs or node settings. But I don't understand how the VAE works well enough to troubleshoot - only make observations.

psychedelicious avatar Mar 28 '25 22:03 psychedelicious

Still an issue in Invoke 5.10.1

CubeTheThird avatar Apr 22 '25 12:04 CubeTheThird

Still an issue in Invoke 5.11

CubeTheThird avatar May 13 '25 14:05 CubeTheThird

Still an issue in Invoke 5.12

CubeTheThird avatar May 25 '25 19:05 CubeTheThird

Still an issue in Invoke 6.0.0

CubeTheThird avatar Jul 09 '25 15:07 CubeTheThird

https://discord.com/channels/1020123559063990373/1149510134058471514/1393395097458184242

@hipsterusername and I tested changing the tiledMultidiffusion from 1024 to 512, and I was then able to upscale the generation.

My 30GB AMD card could not handle tiling at 1024, so this is a huge consumer of VRAM.

message.txt

@hipsterusername said: Think we may just need to expose this as a frontend setting - it’s strange that it’s taking up so much VRAM, but that may just be a consequence of being optimized for CUDA vs ROCm.

heathen711 avatar Jul 12 '25 02:07 heathen711

I can confirm in Invoke 6.2.0 that, while the default tile options do result in OOM, setting the tile size much lower (in the 500s) does allow the upscale to work. I did, however, get a broken output with 512 tiles and 64 overlap, where one of the tiles came out black in the final output.

CubeTheThird avatar Jul 31 '25 20:07 CubeTheThird
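As a rough illustration of the tile-size tradeoff discussed above, the sketch below counts tiles for the 3600 x 2048 output (the 1800 x 1024 test image at 2x) with 64 overlap. It assumes tile size and overlap are measured in pixels of the upscaled image and that tiles are spread evenly along each axis. This is a generic layout, not necessarily how Invoke places its tiles, and `tile_starts` is a made-up helper.

```python
import math


def tile_starts(length, tile, overlap):
    """Evenly spaced start offsets so tiles cover [0, length).

    Neighbouring tiles share about `overlap` pixels or more.
    """
    if tile >= length:
        return [0]
    count = math.ceil((length - overlap) / (tile - overlap))
    step = (length - tile) / (count - 1)
    return [round(i * step) for i in range(count)]


width, height, overlap = 3600, 2048, 64
for tile in (1024, 512):
    cols = tile_starts(width, tile, overlap)
    rows = tile_starts(height, tile, overlap)
    print(f"tile={tile}: {len(cols)} x {len(rows)} = {len(cols) * len(rows)} tiles")

# Each 1024 tile covers 4x the pixels of a 512 tile.
area_ratio = (1024 * 1024) / (512 * 512)
```

Under these assumptions the 1024 setting needs 12 tiles and the 512 setting needs 40, but each 512 tile covers a quarter of the area. Smaller tiles therefore trade more passes for a lower per-tile memory footprint, which is consistent with the reports that 512 avoids the OOM.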