
Vulkan Memory Allocation Failure on Large Buffer Request (ErrorOutOfDeviceMemory)

Open zhouraym opened this issue 5 months ago • 5 comments

I'm encountering a Vulkan memory allocation error when attempting to run sd-v1-4.ckpt with -W 1024 -H 1024. Below is the error log:

ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = Intel(R) Arc(TM) Graphics (Intel Corporation) | uma: 1 | fp16: 1 | warp size: 32 | shared memory: 32768 | int dot: 0 | matrix cores: KHR_coopmat
[INFO ] stable-diffusion.cpp:192  - loading model from 'D:\sd-master-1896b28-bin-win-vulkan-x64\models\sd-v1-4.ckpt'
[INFO ] model.cpp:1001 - load D:\sd-master-1896b28-bin-win-vulkan-x64\models\sd-v1-4.ckpt using checkpoint format
ZIP 0, name = archive/data.pkl, dir = archive/
[INFO ] stable-diffusion.cpp:243  - Version: SD 1.x
[INFO ] stable-diffusion.cpp:277  - Weight type:                 q4_0
[INFO ] stable-diffusion.cpp:278  - Conditioner weight type:     q4_0
[INFO ] stable-diffusion.cpp:279  - Diffusion model weight type: q4_0
[INFO ] stable-diffusion.cpp:280  - VAE weight type:             q4_0
  |==================================================| 1131/1131 - 0.00it/s
[INFO ] stable-diffusion.cpp:558  - total params memory size = 1562.22MB (VRAM 1562.22MB, RAM 0.00MB): clip 191.00MB(VRAM), unet 1276.75MB(VRAM), vae 94.47MB(VRAM), controlnet 0.00MB(VRAM), pmid 0.00MB(VRAM)
[INFO ] stable-diffusion.cpp:562  - loading model from 'D:\sd-master-1896b28-bin-win-vulkan-x64\models\sd-v1-4.ckpt' completed, taking 40.76s
[INFO ] stable-diffusion.cpp:604  - running in eps-prediction mode
[INFO ] stable-diffusion.cpp:2008 - TXT2IMG
[INFO ] stable-diffusion.cpp:738  - Attempting to apply 0 LoRAs
[INFO ] stable-diffusion.cpp:1562 - apply_loras completed, taking 0.00s
[INFO ] stable-diffusion.cpp:1696 - get_learned_condition completed, taking 497 ms
[INFO ] stable-diffusion.cpp:1719 - sampling using Euler A method
[INFO ] stable-diffusion.cpp:1768 - generating image: 1/1 - seed 42
ggml_vulkan: Device memory allocation of size 8767072832 failed.
ggml_vulkan: Requested buffer size exceeds device memory allocation limit: ErrorOutOfDeviceMemory
ggml_gallocr_reserve_n: failed to allocate Vulkan0 buffer of size 8767072832
[ERROR] ggml_extend.hpp:1138 - unet: failed to allocate the compute buffer

zhouraym avatar Jul 21 '25 09:07 zhouraym

My vulkaninfo details are as follows:

maxMemoryAllocationCount = 4,746,648
maxMemoryAllocationSize = 0xffff0000
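
A quick sanity check (a sketch; both numbers are taken verbatim from the error log and the vulkaninfo output) shows the single compute-buffer request is roughly twice the device's per-allocation limit:

```shell
# Compare the failed buffer request against maxMemoryAllocationSize.
requested=8767072832        # "Device memory allocation of size 8767072832 failed."
limit=$(( 0xffff0000 ))     # maxMemoryAllocationSize = 0xffff0000 (~4 GiB)
echo "requested: ${requested} bytes"
echo "limit:     ${limit} bytes"
if [ "${requested}" -gt "${limit}" ]; then
    echo "single allocation exceeds maxMemoryAllocationSize"
fi
```

So even though the GPU shares system memory (uma: 1), no single Vulkan allocation this large can succeed on this device.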

Is it possible to avoid allocating such a large amount of GPU memory at once?

zhouraym avatar Jul 21 '25 09:07 zhouraym

That's a hard limit of the Vulkan backend. I tried to find a way to work around it, but I'm not sure it's even possible. That being said, I don't think there's any sd1.x model that works well at such high resolution anyways.

stduhpf avatar Jul 21 '25 10:07 stduhpf

There is a new llama.cpp PR that seems to address this: ggml-org/llama.cpp#15815

That being said, I don't think there's any sd1.x model that works well at such high resolution anyways.

Indeed, most already struggle around 768x768. Could be nice for low-strength img2img, though.

wbruna avatar Sep 05 '25 13:09 wbruna

That being said, I don't think there's any sd1.x model that works well at such high resolution anyways.

Indeed, most already struggle around 768x768. Could be nice for low-strength img2img, though.

I actually torture sd1 with 768x1024, and throw out ~half.

Green-Sky avatar Sep 05 '25 13:09 Green-Sky

@zhou7510 , could you try again with master-309-35843c7 ? It includes many new options for reducing VRAM usage (--vae-conv-direct, --diffusion-conv-direct, --offload-to-cpu). I was able to generate even larger images with less than 4G VRAM.
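
Rerunning the original 1024x1024 request with those options might look like this (a sketch only: the three memory-reduction flags come from the comment above, while the `sd` binary name and the -m/-p/-W/-H/-o options are assumptions based on the project's usual CLI):

```shell
# Hypothetical invocation combining the suggested VRAM-reduction flags
# with the original 1024x1024 txt2img request.
sd -m models/sd-v1-4.ckpt \
   -p "a photo of a cat" \
   -W 1024 -H 1024 \
   --vae-conv-direct --diffusion-conv-direct \
   --offload-to-cpu \
   -o output.png
```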

wbruna avatar Sep 27 '25 19:09 wbruna