[Bug] Metal backend fails with GGML_OP_DIAG_MASK_INF - op not implemented for Metal
Git commit
- stable-diffusion.cpp version: master-387-e4c50f1
- ggml submodule version: 55bc9320a4aae82af18e23eefd5de319a755d7b9 (Nov 24, 2025)
Operating System & Version
macOS 15.7
GGML backends
Metal
Command-line arguments used
sd -m sd_xl_turbo_1.0.q8_0.gguf -p "blue house" --steps 1
Steps to reproduce
- Build stable-diffusion.cpp with Metal enabled:
git clone --branch master-390-edf2cb3 --depth=1 --recursive --shallow-submodules https://github.com/leejet/stable-diffusion.cpp.git
cd stable-diffusion.cpp
mkdir build
cd build
cmake .. -DSD_METAL=ON
cmake --build . --config Release
- Run image generation:
% ./sd -m sd_xl_turbo_1.0.q8_0.gguf -p "blue house" --steps 1
[INFO ] ggml_extend.hpp:69 - ggml_metal_library_init: using embedded metal library
[INFO ] ggml_extend.hpp:69 - ggml_metal_library_init: loaded in 0.012 sec
[INFO ] ggml_extend.hpp:69 - ggml_metal_device_init: GPU name: Apple M1
[INFO ] ggml_extend.hpp:69 - ggml_metal_device_init: GPU family: MTLGPUFamilyApple7 (1007)
[INFO ] ggml_extend.hpp:69 - ggml_metal_device_init: GPU family: MTLGPUFamilyCommon3 (3003)
[INFO ] ggml_extend.hpp:69 - ggml_metal_device_init: GPU family: MTLGPUFamilyMetal3 (5001)
[INFO ] ggml_extend.hpp:69 - ggml_metal_device_init: simdgroup reduction = true
[INFO ] ggml_extend.hpp:69 - ggml_metal_device_init: simdgroup matrix mul. = true
[INFO ] ggml_extend.hpp:69 - ggml_metal_device_init: has unified memory = true
[INFO ] ggml_extend.hpp:69 - ggml_metal_device_init: has bfloat = true
[INFO ] ggml_extend.hpp:69 - ggml_metal_device_init: use residency sets = true
[INFO ] ggml_extend.hpp:69 - ggml_metal_device_init: use shared buffers = true
[INFO ] ggml_extend.hpp:69 - ggml_metal_device_init: recommendedMaxWorkingSetSize = 11453.25 MB
[INFO ] ggml_extend.hpp:69 - ggml_metal_init: allocating
[INFO ] ggml_extend.hpp:69 - ggml_metal_init: found device: Apple M1
[INFO ] ggml_extend.hpp:69 - ggml_metal_init: picking default device: Apple M1
[INFO ] ggml_extend.hpp:69 - ggml_metal_init: use bfloat = true
[INFO ] ggml_extend.hpp:69 - ggml_metal_init: use fusion = true
[INFO ] ggml_extend.hpp:69 - ggml_metal_init: use concurrency = true
[INFO ] ggml_extend.hpp:69 - ggml_metal_init: use graph optimize = true
[INFO ] stable-diffusion.cpp:227 - loading model from 'sd_xl_turbo_1.0.q8_0.gguf'
[INFO ] model.cpp:382 - load sd_xl_turbo_1.0.q8_0.gguf using gguf format
[INFO ] stable-diffusion.cpp:318 - Version: SDXL
[INFO ] stable-diffusion.cpp:346 - Weight type stat: f16: 150 | q8_0: 2491
[INFO ] stable-diffusion.cpp:347 - Conditioner weight type stat: q8_0: 713
[INFO ] stable-diffusion.cpp:348 - Diffusion model weight type stat: f16: 74 | q8_0: 1606
[INFO ] stable-diffusion.cpp:349 - VAE weight type stat: f16: 76 | q8_0: 172
[WARN ] stable-diffusion.cpp:591 - No VAE specified with --vae or --force-sdxl-vae-conv-scale flag set, using Conv2D scale 0.031
|==================================================| 2641/2641 - 461.39it/s
[INFO ] model.cpp:1594 - loading tensors completed, taking 5.73s (process: 0.00s, read: 5.44s, memcpy: 0.00s, convert: 0.00s, copy_to_backend: 0.08s)
[INFO ] stable-diffusion.cpp:782 - total params memory size = 3855.08MB (VRAM 3855.08MB, RAM 0.00MB): text_encoders 835.45MB(VRAM), diffusion_model 2925.17MB(VRAM), vae 94.47MB(VRAM), controlnet 0.00MB(VRAM), pmid 0.00MB(VRAM)
[INFO ] stable-diffusion.cpp:896 - running in eps-prediction mode
[INFO ] stable-diffusion.cpp:3169 - sampling using Euler A method
[INFO ] denoiser.hpp:364 - get_sigmas with discrete scheduler
[INFO ] stable-diffusion.cpp:3282 - TXT2IMG
[INFO ] stable-diffusion.cpp:1167 - apply at runtime
[ERROR] ggml_extend.hpp:75 - ggml_metal_op_encode_impl: error: unsupported op 'DIAG_MASK_INF'
~/projects/tmp/stable-diffusion.cpp/ggml/src/ggml-metal/ggml-metal-ops.cpp:201: unsupported op
Don't know how to attach. Try "help target".
No stack.
The program is not being run.
zsh: abort ./sd -m sd_xl_turbo_1.0.q8_0.gguf -p "blue house" --steps 1
With debug output added:
METAL UNSUPPORTED OP: DIAG_MASK_INF (op=44)
What you expected to happen
Image generation should complete without an error.
What actually happened
Description
When building stable-diffusion.cpp with Metal backend enabled (-DSD_METAL=ON) on macOS, image generation fails with an "unsupported op" error. The operation GGML_OP_DIAG_MASK_INF (op=44) is defined in ggml but not implemented in the Metal backend.
Root Cause
GGML_OP_DIAG_MASK_INF is defined in ggml (include/ggml.h) but is not implemented in the Metal backend. In src/ggml-metal/ggml-metal-device.m, the ggml_metal_device_supports_op() function has no case for this op, so it falls through to default: return false.
This can be verified by checking the Metal backend source at https://github.com/ggml-org/ggml/blob/master/src/ggml-metal/ggml-metal-device.m: there is no case GGML_OP_DIAG_MASK_INF: in the supports_op switch statement.
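To make the fall-through concrete, here is a minimal sketch of the pattern. The enum, its values, and the function names are hypothetical stand-ins invented for illustration; this is not ggml's actual code, only the shape of a "supports op" switch that lacks a case for one op.

```cpp
#include <cassert>

// Hypothetical stand-ins for illustration only; these are NOT ggml's real
// enum values or function names.
enum class ToyOp { Add, Mul, SoftMax, DiagMaskInf };

// Mirrors the shape of a backend capability query: every op the backend can
// run has an explicit case, anything else lands in `default`.
constexpr bool toy_backend_supports_op(ToyOp op) {
    switch (op) {
        case ToyOp::Add:
        case ToyOp::Mul:
        case ToyOp::SoftMax:
            return true;
        default:
            // There is no case for ToyOp::DiagMaskInf, so it is reported
            // as unsupported.
            return false;
    }
}

// Small self-check showing the effect of the missing case.
inline void toy_backend_self_check() {
    assert(toy_backend_supports_op(ToyOp::SoftMax));
    assert(!toy_backend_supports_op(ToyOp::DiagMaskInf));
}
```

Adding a `case ToyOp::DiagMaskInf:` alone would not be enough in a real backend: the op also needs a kernel and a dispatch path, otherwise the failure just moves to encode time.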
Impact
This prevents using Metal acceleration with SD models that use attention masking (like SDXL Turbo).
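For readers unfamiliar with the op, the following is a CPU reference sketch of what a diagonal (causal) mask does to one attention score matrix, assuming the op behaves like a standard causal mask with an `n_past` offset. The function name and the flat row-major layout are assumptions made for illustration; ggml's real implementation works on its own tensor type and strides.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Reference sketch (hypothetical helper, not a ggml function).
// `scores` is one row-major n_rows x n_cols matrix of attention scores.
// Entry (row, col) is set to -inf when col > n_past + row, so each query
// position can only attend to keys at or before its own position.
void diag_mask_inf_ref(std::vector<float>& scores, int n_rows, int n_cols, int n_past) {
    assert(n_rows >= 0 && n_cols >= 0 && n_past >= 0);
    assert(scores.size() == static_cast<std::size_t>(n_rows) * static_cast<std::size_t>(n_cols));

    const float neg_inf = -std::numeric_limits<float>::infinity();
    for (int row = 0; row < n_rows; ++row) {
        // Everything strictly to the right of the (shifted) diagonal is masked.
        for (int col = n_past + row + 1; col < n_cols; ++col) {
            scores[static_cast<std::size_t>(row) * n_cols + col] = neg_inf;
        }
    }
}
```

After a softmax over each row, the masked entries contribute zero weight, which is why the op shows up in text-encoder attention.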
Workaround
Disable Metal backend when building:
cmake .. -DSD_METAL=OFF
This falls back to CPU execution which works correctly but is significantly slower.
Suggested Fix
This issue should likely be reported upstream to https://github.com/ggml-org/ggml to add Metal kernel support for GGML_OP_DIAG_MASK_INF. Alternatively, stable-diffusion.cpp could:
- Check if the model requires unsupported ops before attempting Metal execution
- Fall back gracefully to CPU for unsupported operations
- Document which models/features are compatible with Metal backend
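The first two options above could be sketched roughly as follows. Everything here is hypothetical: the graph is reduced to a list of op names, and the capability predicate is passed in as a callback. In a real integration the predicate would presumably wrap the backend's own per-op capability query (ggml exposes `ggml_backend_supports_op` in ggml-backend.h, as far as I know); that wiring is not shown.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical, simplified model of a compute graph: just a list of op names.
using ToyGraph = std::vector<std::string>;
using SupportsOpFn = std::function<bool(const std::string&)>;

// Collects every op in `graph` that the backend predicate rejects, so the
// caller can log them or decide on a fallback before execution starts.
std::vector<std::string> unsupported_ops(const ToyGraph& graph, const SupportsOpFn& supports_op) {
    assert(static_cast<bool>(supports_op));  // the predicate must be callable
    std::vector<std::string> missing;
    for (const auto& op : graph) {
        if (!supports_op(op)) {
            missing.push_back(op);
        }
    }
    return missing;
}

// Chooses the accelerated backend only when every op is supported,
// otherwise falls back to the CPU instead of aborting mid-run.
std::string choose_backend(const ToyGraph& graph, const SupportsOpFn& supports_op) {
    return unsupported_ops(graph, supports_op).empty() ? "metal" : "cpu";
}
```

A whole-graph fallback like this is the bluntest option. Falling back per op is finer grained but involves copying tensors between backends, so it is a design trade-off rather than a clear win.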
Note: This is ultimately a ggml issue rather than stable-diffusion.cpp specifically. You may want to file this upstream at https://github.com/ggml-org/ggml/issues as well.
Logs / error messages / stack trace
[ERROR] ggml_extend.hpp:75 - ggml_metal_op_encode_impl: error: unsupported op 'DIAG_MASK_INF'
~/projects/tmp/stable-diffusion.cpp/ggml/src/ggml-metal/ggml-metal-ops.cpp:201: unsupported op
Don't know how to attach. Try "help target".
No stack.
The program is not being run.
zsh: abort ./sd -m sd_xl_turbo_1.0.q8_0.gguf -p "blue house" --steps 1
With debug output added:
`METAL UNSUPPORTED OP: DIAG_MASK_INF (op=44)`
Additional context / environment details
- OS: macOS (Apple Silicon M1)
- stable-diffusion.cpp version: `master-387-e4c50f1`
- ggml submodule version: `55bc9320a4aae82af18e23eefd5de319a755d7b9` (Nov 24, 2025)
- Model: `sd_xl_turbo_1.0.q8_0.gguf` (SDXL Turbo)
https://github.com/ggml-org/llama.cpp/pull/16669 This upstream PR would probably fix this issue, but it would first need to be merged in llama.cpp then included in ggml...
Related: #850 and #857
The branch for ggml-org/llama.cpp#16669 seems to have run into a merge conflict.
The root cause turned out to be a missing Metal implementation of GGML_OP_DIAG_MASK_INF in ggml, which made the Metal backend reject this op.
A Metal kernel and dispatch for this op have now been implemented in ggml so that CLIP attention runs correctly on Apple GPUs and SDXL images match the CPU path when keep_clip_on_cpu = false.
I’ve opened an upstream PR here: https://github.com/ggml-org/ggml/pull/1395
With this patch it works for me:
cd stable-diffusion.cpp/ggml
curl -O -L https://patch-diff.githubusercontent.com/raw/ggml-org/ggml/pull/1395.diff
git apply 1395.diff