stable-diffusion.cpp CUDA error (again ! :-))

Hi, I've tried SD 1.4 with the CUDA backend on two configurations:

On my personal computer with an RTX 4070, everything works well, thanks to ag2s20150909 and the build https://github.com/ag2s20150909/stable-diffusion.cpp/releases/tag/master-74a21a7
On my work computer, a laptop with an RTX A1000, I still get an error that I don't understand:

ggml_cuda_compute_forward: GET_ROWS failed
CUDA error: no kernel image is available for execution on the device
  current device: 0, in function ggml_cuda_compute_forward at D:\a\stable-diffusion.cpp\stable-diffusion.cpp\ggml\src\ggml-cuda\ggml-cuda.cu:2174

Here is the full log:

D:\Users\braultoli\Desktop\sd-master-9578fdc-bin-win-avx2-x64\inference_tool_CUDA_2025_01_01>"sd.exe" -m "..\StableDiffusion 1.4 F32\sd-v1-4.ckpt" -p "a cute cat" --sampling-method euler --steps 10 -W 512 -H 512 -s 42 -t 20
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA RTX A1000 Laptop GPU, compute capability 8.6, VMM: yes
[INFO ] stable-diffusion.cpp:195 - loading model from '..\StableDiffusion 1.4 F32\sd-v1-4.ckpt'
[INFO ] model.cpp:891 - load ..\StableDiffusion 1.4 F32\sd-v1-4.ckpt using checkpoint format
ZIP 0, name = archive/data.pkl, dir = archive/
[INFO ] stable-diffusion.cpp:242 - Version: SD 1.x
[INFO ] stable-diffusion.cpp:275 - Weight type: f32
[INFO ] stable-diffusion.cpp:276 - Conditioner weight type: f32
[INFO ] stable-diffusion.cpp:277 - Diffusion model weight type: f32
[INFO ] stable-diffusion.cpp:278 - VAE weight type: f32
|==================================================| 1131/1131 - 0.00it/s
[INFO ] stable-diffusion.cpp:516 - total params memory size = 2719.24MB (VRAM 2719.24MB, RAM 0.00MB): clip 469.44MB(VRAM), unet 2155.33MB(VRAM), vae 94.47MB(VRAM), controlnet 0.00MB(VRAM), pmid 0.00MB(VRAM)
[INFO ] stable-diffusion.cpp:520 - loading model from '..\StableDiffusion 1.4 F32\sd-v1-4.ckpt' completed, taking 9.06s
[INFO ] stable-diffusion.cpp:550 - running in eps-prediction mode
[INFO ] stable-diffusion.cpp:682 - Attempting to apply 0 LoRAs
[INFO ] stable-diffusion.cpp:1235 - apply_loras completed, taking 0.00s
ggml_cuda_compute_forward: GET_ROWS failed
CUDA error: no kernel image is available for execution on the device
  current device: 0, in function ggml_cuda_compute_forward at D:\a\stable-diffusion.cpp\stable-diffusion.cpp\ggml\src\ggml-cuda\ggml-cuda.cu:2174
  err
D:\a\stable-diffusion.cpp\stable-diffusion.cpp\ggml\src\ggml-cuda\ggml-cuda.cu:70: CUDA error

Does anybody have an idea?

Thanks a lot in advance

Olivier

Jan 07 '25 12:01 olivbrau

Issue with Prebuilt CUDA SD Binary Size Discrepancy

I've noticed a potential issue with the prebuilt CUDA version of stable-diffusion.cpp:

Current release (after Nov 30):
Download URL: https://github.com/leejet/stable-diffusion.cpp/releases/download/master-dcf91f9/sd-master-dcf91f9-bin-win-cuda12-x64.zip
File size: 20.2MB
Status: Appears incomplete/incorrect

Previous release (Nov 23):
File size: 137MB
Status: Functioned correctly

This significant size reduction (approximately 85% smaller) suggests that the latest prebuilt binary might be missing essential components or was incorrectly packaged. The properly functioning version should be closer to the 137MB size of the November 23rd release.
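The percentage can be checked with a quick calculation. This is a minimal sketch that uses only the two file sizes quoted above; the function name is just for illustration.

```python
def size_reduction_percent(old_mb: float, new_mb: float) -> float:
    """Return how much smaller new_mb is than old_mb, in percent."""
    return (old_mb - new_mb) / old_mb * 100.0


# File sizes quoted above: 137MB (Nov 23) and 20.2MB (after Nov 30).
reduction = size_reduction_percent(137.0, 20.2)
print(f"{reduction:.1f}% smaller")  # prints "85.3% smaller"
```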

Recommendation: Consider using the November 23rd release until this issue is investigated and resolved, or build from source if possible.

Jan 08 '25 02:01 ClarkChin08

Indeed, the Nov 23 release is much bigger than the December releases. I tried it, but I get a different error:

CUDA error: the provided PTX was compiled with an unsupported toolchain.
  current device: 0, in function ggml_cuda_compute_forward at D:\a\stable-diffusion.cpp\stable-diffusion.cpp\ggml\src\ggml-cuda.cu:2326
  err
D:\a\stable-diffusion.cpp\stable-diffusion.cpp\ggml\src\ggml-cuda.cu:102: CUDA error

Moreover, it doesn't explain why the December version works well on my other computer (with the RTX 4070).

(To be precise, it is Flux1Dev that works on that other computer, whereas the error I reported here concerns SD 1.4, since I don't have enough VRAM to run FluxDev. So my comparison is not perfect.)

Jan 08 '25 12:01 olivbrau

The RTX A1000 is an Ampere-architecture GPU. Maybe you need to change the architecture from 89 to 86 (your log reports compute capability 8.6), i.e. change -DCMAKE_CUDA_ARCHITECTURES=89-real to -DCMAKE_CUDA_ARCHITECTURES=86-real, or build it on your local machine.

https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html#virtual-architecture-feature-list
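To make the architecture mismatch concrete, here is a rough sketch of the rule that decides whether a binary contains a usable kernel image for a given GPU. It assumes the usual CUDA compatibility rules: a `-real` entry embeds machine code that only runs on the same major version with an equal or higher minor version, a `-virtual` entry embeds PTX that the driver can JIT-compile for any equal or newer compute capability, and a bare number embeds both. `has_kernel_image` is a hypothetical helper, not part of ggml or stable-diffusion.cpp, and it ignores architecture-specific variants such as `90a`.

```python
def has_kernel_image(device_cc: int, cmake_archs: str) -> bool:
    """Return True if a build with the given CMAKE_CUDA_ARCHITECTURES value
    contains code that can run on a device of compute capability device_cc.

    device_cc is written without the dot, e.g. 86 for compute capability 8.6.
    """
    dev_major, dev_minor = divmod(device_cc, 10)
    for entry in cmake_archs.split(";"):
        number, _, kind = entry.strip().partition("-")
        major, minor = divmod(int(number), 10)
        # Machine code (SASS): same major version, device minor >= built minor.
        if kind in ("", "real") and major == dev_major and minor <= dev_minor:
            return True
        # PTX: the driver can JIT-compile it for any equal or newer device.
        if kind in ("", "virtual") and (major, minor) <= (dev_major, dev_minor):
            return True
    return False


# The log in this thread reports compute capability 8.6 for the RTX A1000.
print(has_kernel_image(86, "89-real"))     # False: no image for 8.6
print(has_kernel_image(86, "86-real"))     # True
print(has_kernel_image(86, "75-virtual"))  # True: PTX can be JIT-compiled
print(has_kernel_image(89, "89-real"))     # True: the RTX 4070 (8.9) case
```

Under these assumptions, a build restricted to `89-real` would run on the RTX 4070 but report "no kernel image is available" on an 8.6 device.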

Jan 11 '25 02:01 ag2s20150909

It seems to be caused by ggml upstream: https://github.com/ggerganov/ggml/commit/77d37f5a4efb9d33f808a5d750d11e928e8387cf The commit https://github.com/leejet/stable-diffusion.cpp/commit/c3eeb669cd9b942ec4b66ed3d0bf28b42b1f9c28 updated ggml but forgot to set CMAKE_CUDA_ARCHITECTURES in the GitHub Action.

Jan 11 '25 02:01 ag2s20150909

It seems to be caused by ggml upstream: ggerganov/ggml@77d37f5 The commit c3eeb66 updated ggml but forgot to set CMAKE_CUDA_ARCHITECTURES in the GitHub Action.

So how should we solve this issue?

Jan 17 '25 13:01 icebearlala

Maybe "86" needs to be added to CMAKE_CUDA_ARCHITECTURES. I had the same error with an RTX 3060, which is how I found this issue.

https://github.com/leejet/stable-diffusion.cpp/blob/master/.github/workflows/build.yml#L166

https://developer.nvidia.com/cuda-gpus
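As a small illustration of how to find the value to add, the sketch below pulls the compute capability out of the `ggml_cuda_init` device line (as it appears in the log at the top of this thread) and turns it into the matching CMake setting. `arch_flag_from_log` is a hypothetical helper name; only the `CMAKE_CUDA_ARCHITECTURES` variable and the log format come from this thread.

```python
import re


def arch_flag_from_log(log_text: str) -> str:
    """Build a -DCMAKE_CUDA_ARCHITECTURES flag from ggml's device listing."""
    # Matches e.g. "compute capability 8.6" and captures major and minor.
    found = re.findall(r"compute capability (\d+)\.(\d+)", log_text)
    if not found:
        raise ValueError("no 'compute capability X.Y' entry found in log")
    # De-duplicate and sort numerically, so 75 comes before 120.
    archs = sorted({int(major + minor) for major, minor in found})
    return "-DCMAKE_CUDA_ARCHITECTURES=" + ";".join(str(a) for a in archs)


log_line = "Device 0: NVIDIA RTX A1000 Laptop GPU, compute capability 8.6, VMM: yes"
print(arch_flag_from_log(log_line))  # -DCMAKE_CUDA_ARCHITECTURES=86
```

With several GPUs in the log, the values are joined with semicolons, so the flag should be quoted when passed on a shell command line.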

May 21 '25 14:05 AonekoSS

Maybe "86" needs to be added to CMAKE_CUDA_ARCHITECTURES.

This works on RTX 30 / Ampere (after building with the required option).

https://github.com/Farori/stable-diffusion.cpp_Ampere_rtx_30/actions/runs/15738564869

Jun 18 '25 17:06 Farori

This should be fixed.

Nov 13 '25 14:11 leejet