stable-diffusion.cpp [Bug] Metal backend fails with GGML_OP_DIAG_MASK_INF

Git commit

stable-diffusion.cpp version: master-387-e4c50f1
ggml submodule version: 55bc9320a4aae82af18e23eefd5de319a755d7b9 (Nov 24, 2025)

Operating System & Version

macOS 15.7

GGML backends

Metal

Command-line arguments used

sd -m sd_xl_turbo_1.0.q8_0.gguf -p "blue house" --steps 1

Steps to reproduce

Build stable-diffusion.cpp with Metal enabled:

git clone --branch master-390-edf2cb3 --depth=1 --recursive --shallow-submodules https://github.com/leejet/stable-diffusion.cpp.git
cd stable-diffusion.cpp
mkdir build
cd build
cmake .. -DSD_METAL=ON
cmake --build . --config Release

Run image generation:

% ./sd -m sd_xl_turbo_1.0.q8_0.gguf -p "blue house" --steps 1
[INFO ] ggml_extend.hpp:69   - ggml_metal_library_init: using embedded metal library
[INFO ] ggml_extend.hpp:69   - ggml_metal_library_init: loaded in 0.012 sec
[INFO ] ggml_extend.hpp:69   - ggml_metal_device_init: GPU name:   Apple M1
[INFO ] ggml_extend.hpp:69   - ggml_metal_device_init: GPU family: MTLGPUFamilyApple7  (1007)
[INFO ] ggml_extend.hpp:69   - ggml_metal_device_init: GPU family: MTLGPUFamilyCommon3 (3003)
[INFO ] ggml_extend.hpp:69   - ggml_metal_device_init: GPU family: MTLGPUFamilyMetal3  (5001)
[INFO ] ggml_extend.hpp:69   - ggml_metal_device_init: simdgroup reduction   = true
[INFO ] ggml_extend.hpp:69   - ggml_metal_device_init: simdgroup matrix mul. = true
[INFO ] ggml_extend.hpp:69   - ggml_metal_device_init: has unified memory    = true
[INFO ] ggml_extend.hpp:69   - ggml_metal_device_init: has bfloat            = true
[INFO ] ggml_extend.hpp:69   - ggml_metal_device_init: use residency sets    = true
[INFO ] ggml_extend.hpp:69   - ggml_metal_device_init: use shared buffers    = true
[INFO ] ggml_extend.hpp:69   - ggml_metal_device_init: recommendedMaxWorkingSetSize  = 11453.25 MB
[INFO ] ggml_extend.hpp:69   - ggml_metal_init: allocating
[INFO ] ggml_extend.hpp:69   - ggml_metal_init: found device: Apple M1
[INFO ] ggml_extend.hpp:69   - ggml_metal_init: picking default device: Apple M1
[INFO ] ggml_extend.hpp:69   - ggml_metal_init: use bfloat         = true
[INFO ] ggml_extend.hpp:69   - ggml_metal_init: use fusion         = true
[INFO ] ggml_extend.hpp:69   - ggml_metal_init: use concurrency    = true
[INFO ] ggml_extend.hpp:69   - ggml_metal_init: use graph optimize = true
[INFO ] stable-diffusion.cpp:227  - loading model from 'sd_xl_turbo_1.0.q8_0.gguf'
[INFO ] model.cpp:382  - load sd_xl_turbo_1.0.q8_0.gguf using gguf format
[INFO ] stable-diffusion.cpp:318  - Version: SDXL
[INFO ] stable-diffusion.cpp:346  - Weight type stat:                      f16: 150  |    q8_0: 2491
[INFO ] stable-diffusion.cpp:347  - Conditioner weight type stat:         q8_0: 713
[INFO ] stable-diffusion.cpp:348  - Diffusion model weight type stat:      f16: 74   |    q8_0: 1606
[INFO ] stable-diffusion.cpp:349  - VAE weight type stat:                  f16: 76   |    q8_0: 172
[WARN ] stable-diffusion.cpp:591  - No VAE specified with --vae or --force-sdxl-vae-conv-scale flag set, using Conv2D scale 0.031
  |==================================================| 2641/2641 - 461.39it/s
[INFO ] model.cpp:1594 - loading tensors completed, taking 5.73s (process: 0.00s, read: 5.44s, memcpy: 0.00s, convert: 0.00s, copy_to_backend: 0.08s)
[INFO ] stable-diffusion.cpp:782  - total params memory size = 3855.08MB (VRAM 3855.08MB, RAM 0.00MB): text_encoders 835.45MB(VRAM), diffusion_model 2925.17MB(VRAM), vae 94.47MB(VRAM), controlnet 0.00MB(VRAM), pmid 0.00MB(VRAM)
[INFO ] stable-diffusion.cpp:896  - running in eps-prediction mode
[INFO ] stable-diffusion.cpp:3169 - sampling using Euler A method
[INFO ] denoiser.hpp:364  - get_sigmas with discrete scheduler
[INFO ] stable-diffusion.cpp:3282 - TXT2IMG
[INFO ] stable-diffusion.cpp:1167 - apply at runtime
[ERROR] ggml_extend.hpp:75   - ggml_metal_op_encode_impl: error: unsupported op 'DIAG_MASK_INF'
~/projects/tmp/stable-diffusion.cpp/ggml/src/ggml-metal/ggml-metal-ops.cpp:201: unsupported op
Don't know how to attach.  Try "help target".
No stack.
The program is not being run.
zsh: abort      ./sd -m sd_xl_turbo_1.0.q8_0.gguf -p "blue house" --steps 1

With debug output added:

METAL UNSUPPORTED OP: DIAG_MASK_INF (op=44)

What you expected to happen

Image generation should complete without an error.

What actually happened

Description

When stable-diffusion.cpp is built with the Metal backend enabled (-DSD_METAL=ON) on macOS, image generation aborts with an "unsupported op" error. The operation GGML_OP_DIAG_MASK_INF (op=44) is defined in ggml but not implemented in the Metal backend.

Root Cause

GGML_OP_DIAG_MASK_INF is defined in ggml (include/ggml.h) but is not implemented in the Metal backend. In src/ggml-metal/ggml-metal-device.m, the ggml_metal_device_supports_op() function has no case for this op, so it falls through to default: return false.

This can be verified by checking https://github.com/ggml-org/ggml/blob/master/src/ggml-metal/ggml-metal-device.m: there is no case GGML_OP_DIAG_MASK_INF: in the supports_op switch statement.
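To illustrate the fall-through described above, here is a minimal sketch. The `Op` enum, `backend_supports_op`, and `encode_op` are hypothetical stand-ins, not ggml's actual types or functions; they only show how an op without its own `case` ends up in the `default` branch and is then rejected at encode time.

```cpp
#include <cassert>

// Hypothetical stand-in for ggml's op enum; only a few ops are listed.
enum class Op { Add, Mul, SoftMax, DiagMaskInf };

// Hypothetical capability check: any op without an explicit case
// falls through to the default branch and is reported as unsupported.
bool backend_supports_op(Op op) {
    switch (op) {
        case Op::Add:
        case Op::Mul:
        case Op::SoftMax:
            return true;
        default:
            return false;  // Op::DiagMaskInf lands here
    }
}

// Mirrors the failure mode in the log: encoding a rejected op aborts
// (in debug builds, where assert is active).
void encode_op(Op op) {
    assert(backend_supports_op(op) && "unsupported op");
    // ... a real backend would dispatch a kernel here ...
}
```

Calling `encode_op(Op::DiagMaskInf)` in this sketch would trip the assertion, which is analogous to the abort shown in the log above.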

Impact

This prevents using Metal acceleration with SD models that use attention masking (like SDXL Turbo).
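For context, the masking this op performs can be sketched on the CPU as follows. This is an illustrative reimplementation based on my understanding of the op's semantics (entries whose column index exceeds `n_past` plus the row index are set to negative infinity, so a later softmax assigns them zero weight). The function name and the flat row-major layout are assumptions for this example, not ggml's actual API.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Illustrative causal mask over a row-major [n_rows x n_cols] score matrix.
// Every entry with col > n_past + row is overwritten with -infinity,
// i.e. each query row can only attend to itself and earlier positions.
void diag_mask_inf(std::vector<float>& scores,
                   std::size_t n_rows,
                   std::size_t n_cols,
                   std::size_t n_past) {
    assert(scores.size() == n_rows * n_cols);
    const float neg_inf = -std::numeric_limits<float>::infinity();
    for (std::size_t row = 0; row < n_rows; ++row) {
        // First masked column for this row is n_past + row + 1.
        for (std::size_t col = n_past + row + 1; col < n_cols; ++col) {
            scores[row * n_cols + col] = neg_inf;
        }
    }
}
```

With a 3x3 matrix and `n_past = 0`, this leaves the lower triangle and diagonal untouched and masks the strict upper triangle.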

Workaround

Disable Metal backend when building:

cmake .. -DSD_METAL=OFF

This falls back to CPU execution, which works correctly but is significantly slower.

Suggested Fix

This issue should likely be reported upstream to https://github.com/ggml-org/ggml to add Metal kernel support for GGML_OP_DIAG_MASK_INF. Alternatively, stable-diffusion.cpp could:

- Check if the model requires unsupported ops before attempting Metal execution
- Fall back gracefully to CPU for unsupported operations
- Document which models/features are compatible with Metal backend
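The first two options above could be sketched roughly as follows. This is a self-contained toy model, not stable-diffusion.cpp or ggml code: `Op`, `Device`, `Backend`, `gpu_supports`, `cpu_supports`, and `plan_graph` are all hypothetical names. The idea is to check each op in the graph up front and assign it to the first backend that reports support, so an unsupported op is routed to the CPU instead of aborting mid-run.

```cpp
#include <cassert>
#include <vector>

// Hypothetical op and device identifiers for this sketch.
enum class Op { Add, SoftMax, DiagMaskInf };
enum class Device { Gpu, Cpu };

// A backend is modelled as a device tag plus a capability predicate.
struct Backend {
    Device device;
    bool (*supports)(Op);
};

// Assumed capabilities: the GPU backend lacks DiagMaskInf, the CPU has everything.
bool gpu_supports(Op op) { return op != Op::DiagMaskInf; }
bool cpu_supports(Op) { return true; }

// Assign each op to the first backend (in priority order) that supports it.
std::vector<Device> plan_graph(const std::vector<Op>& graph,
                               const std::vector<Backend>& backends) {
    std::vector<Device> plan;
    for (Op op : graph) {
        bool placed = false;
        for (const Backend& b : backends) {
            if (b.supports(op)) {
                plan.push_back(b.device);
                placed = true;
                break;
            }
        }
        assert(placed && "no backend supports this op");
    }
    return plan;
}
```

Listing the GPU backend first and the CPU backend last means only the unsupported op is moved to the CPU. A real implementation would also have to account for the cost of copying tensors between devices, which this sketch ignores.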

Note: This is ultimately a ggml issue rather than stable-diffusion.cpp specifically. You may want to file this upstream at https://github.com/ggml-org/ggml/issues as well.

Logs / error messages / stack trace

[ERROR] ggml_extend.hpp:75   - ggml_metal_op_encode_impl: error: unsupported op 'DIAG_MASK_INF'
~/projects/tmp/stable-diffusion.cpp/ggml/src/ggml-metal/ggml-metal-ops.cpp:201: unsupported op
Don't know how to attach.  Try "help target".
No stack.
The program is not being run.
zsh: abort      ./sd -m sd_xl_turbo_1.0.q8_0.gguf -p "blue house" --steps 1


With debug output added:

`METAL UNSUPPORTED OP: DIAG_MASK_INF (op=44)`

Additional context / environment details

- OS: macOS (Apple Silicon M1)
- stable-diffusion.cpp version: `master-387-e4c50f1`
- ggml submodule version: `55bc9320a4aae82af18e23eefd5de319a755d7b9` (Nov 24, 2025)
- Model: `sd_xl_turbo_1.0.q8_0.gguf` (SDXL Turbo)

Dec 03 '25 23:12 shakfu

https://github.com/ggml-org/llama.cpp/pull/16669 This upstream PR would probably fix this issue, but it would first need to be merged in llama.cpp then included in ggml...

Dec 04 '25 00:12 stduhpf

Related: #850 and #857

Dec 04 '25 00:12 wbruna

> ggml-org/llama.cpp#16669 This upstream PR would probably fix this issue, but it would first need to be merged in llama.cpp then included in ggml...

This branch seems to have encountered a conflict

Dec 04 '25 13:12 baozzz1

The root cause turned out to be a missing Metal implementation of GGML_OP_DIAG_MASK_INF in ggml, which made the Metal backend reject this op. A Metal kernel and dispatch for this op have now been implemented in ggml so that CLIP attention runs correctly on Apple GPUs and SDXL images match the CPU path when keep_clip_on_cpu = false.

I’ve opened an upstream PR here: https://github.com/ggml-org/ggml/pull/1395

Dec 07 '25 22:12 taradaidv

With this patch it works for me:

cd stable-diffusion.cpp/ggml
curl -O -L https://patch-diff.githubusercontent.com/raw/ggml-org/ggml/pull/1395.diff
git apply 1395.diff

Dec 08 '25 06:12 calvin2021y
