stable-diffusion.cpp

[Bug] Metal backend fails with GGML_OP_DIAG_MASK_INF - op not implemented for Metal

Open shakfu opened this issue 3 weeks ago • 5 comments

Git commit

  • stable-diffusion.cpp version: master-387-e4c50f1
  • ggml submodule version: 55bc9320a4aae82af18e23eefd5de319a755d7b9 (Nov 24, 2025)

Operating System & Version

macOS 15.7

GGML backends

Metal

Command-line arguments used

sd -m sd_xl_turbo_1.0.q8_0.gguf -p "blue house" --steps 1

Steps to reproduce

  1. Build stable-diffusion.cpp with Metal enabled:
git clone --branch master-390-edf2cb3 --depth=1 --recursive --shallow-submodules https://github.com/leejet/stable-diffusion.cpp.git
cd stable-diffusion.cpp
mkdir build
cd build
cmake .. -DSD_METAL=ON
cmake --build . --config Release
  2. Run image generation:
% ./sd -m sd_xl_turbo_1.0.q8_0.gguf -p "blue house" --steps 1
[INFO ] ggml_extend.hpp:69   - ggml_metal_library_init: using embedded metal library
[INFO ] ggml_extend.hpp:69   - ggml_metal_library_init: loaded in 0.012 sec
[INFO ] ggml_extend.hpp:69   - ggml_metal_device_init: GPU name:   Apple M1
[INFO ] ggml_extend.hpp:69   - ggml_metal_device_init: GPU family: MTLGPUFamilyApple7  (1007)
[INFO ] ggml_extend.hpp:69   - ggml_metal_device_init: GPU family: MTLGPUFamilyCommon3 (3003)
[INFO ] ggml_extend.hpp:69   - ggml_metal_device_init: GPU family: MTLGPUFamilyMetal3  (5001)
[INFO ] ggml_extend.hpp:69   - ggml_metal_device_init: simdgroup reduction   = true
[INFO ] ggml_extend.hpp:69   - ggml_metal_device_init: simdgroup matrix mul. = true
[INFO ] ggml_extend.hpp:69   - ggml_metal_device_init: has unified memory    = true
[INFO ] ggml_extend.hpp:69   - ggml_metal_device_init: has bfloat            = true
[INFO ] ggml_extend.hpp:69   - ggml_metal_device_init: use residency sets    = true
[INFO ] ggml_extend.hpp:69   - ggml_metal_device_init: use shared buffers    = true
[INFO ] ggml_extend.hpp:69   - ggml_metal_device_init: recommendedMaxWorkingSetSize  = 11453.25 MB
[INFO ] ggml_extend.hpp:69   - ggml_metal_init: allocating
[INFO ] ggml_extend.hpp:69   - ggml_metal_init: found device: Apple M1
[INFO ] ggml_extend.hpp:69   - ggml_metal_init: picking default device: Apple M1
[INFO ] ggml_extend.hpp:69   - ggml_metal_init: use bfloat         = true
[INFO ] ggml_extend.hpp:69   - ggml_metal_init: use fusion         = true
[INFO ] ggml_extend.hpp:69   - ggml_metal_init: use concurrency    = true
[INFO ] ggml_extend.hpp:69   - ggml_metal_init: use graph optimize = true
[INFO ] stable-diffusion.cpp:227  - loading model from 'sd_xl_turbo_1.0.q8_0.gguf'
[INFO ] model.cpp:382  - load sd_xl_turbo_1.0.q8_0.gguf using gguf format
[INFO ] stable-diffusion.cpp:318  - Version: SDXL
[INFO ] stable-diffusion.cpp:346  - Weight type stat:                      f16: 150  |    q8_0: 2491
[INFO ] stable-diffusion.cpp:347  - Conditioner weight type stat:         q8_0: 713
[INFO ] stable-diffusion.cpp:348  - Diffusion model weight type stat:      f16: 74   |    q8_0: 1606
[INFO ] stable-diffusion.cpp:349  - VAE weight type stat:                  f16: 76   |    q8_0: 172
[WARN ] stable-diffusion.cpp:591  - No VAE specified with --vae or --force-sdxl-vae-conv-scale flag set, using Conv2D scale 0.031
  |==================================================| 2641/2641 - 461.39it/s
[INFO ] model.cpp:1594 - loading tensors completed, taking 5.73s (process: 0.00s, read: 5.44s, memcpy: 0.00s, convert: 0.00s, copy_to_backend: 0.08s)
[INFO ] stable-diffusion.cpp:782  - total params memory size = 3855.08MB (VRAM 3855.08MB, RAM 0.00MB): text_encoders 835.45MB(VRAM), diffusion_model 2925.17MB(VRAM), vae 94.47MB(VRAM), controlnet 0.00MB(VRAM), pmid 0.00MB(VRAM)
[INFO ] stable-diffusion.cpp:896  - running in eps-prediction mode
[INFO ] stable-diffusion.cpp:3169 - sampling using Euler A method
[INFO ] denoiser.hpp:364  - get_sigmas with discrete scheduler
[INFO ] stable-diffusion.cpp:3282 - TXT2IMG
[INFO ] stable-diffusion.cpp:1167 - apply at runtime
[ERROR] ggml_extend.hpp:75   - ggml_metal_op_encode_impl: error: unsupported op 'DIAG_MASK_INF'
~/projects/tmp/stable-diffusion.cpp/ggml/src/ggml-metal/ggml-metal-ops.cpp:201: unsupported op
Don't know how to attach.  Try "help target".
No stack.
The program is not being run.
zsh: abort      ./sd -m sd_xl_turbo_1.0.q8_0.gguf -p "blue house" --steps 1

With debug output added:

METAL UNSUPPORTED OP: DIAG_MASK_INF (op=44)

What you expected to happen

Not to throw an error.

What actually happened

Description

When building stable-diffusion.cpp with Metal backend enabled (-DSD_METAL=ON) on macOS, image generation fails with an "unsupported op" error. The operation GGML_OP_DIAG_MASK_INF (op=44) is defined in ggml but not implemented in the Metal backend.

Root Cause

GGML_OP_DIAG_MASK_INF is defined in ggml (include/ggml.h) but is not implemented in the Metal backend. In src/ggml-metal/ggml-metal-device.m, the ggml_metal_device_supports_op() function has no case for this op, so it falls through to default: return false.

This can be verified in https://github.com/ggml-org/ggml/blob/master/src/ggml-metal/ggml-metal-device.m: there is no case GGML_OP_DIAG_MASK_INF: in the supports_op switch statement.
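The fall-through described above can be illustrated with a minimal sketch. The enum, `metal_supports_op` and `encode` below are hypothetical stand-ins, not the real ggml symbols; they only model the pattern of a capability switch with no case for one op, followed by an encoder that aborts when it meets that op.

```cpp
#include <cassert>

// Hypothetical stand-ins for a few ggml op identifiers (not the real enum).
enum class Op { Add, MulMat, SoftMax, DiagMaskInf };

// Simplified model of a backend capability check: any op without an explicit
// case falls through to `default` and is reported as unsupported.
bool metal_supports_op(Op op) {
    switch (op) {
        case Op::Add:
        case Op::MulMat:
        case Op::SoftMax:
            return true;
        default:
            return false;  // Op::DiagMaskInf ends up here
    }
}

// Simplified model of the encoder: it has no kernel for unsupported ops,
// so reaching one is a hard failure (an abort in a debug build of this sketch).
void encode(Op op) {
    assert(metal_supports_op(op) && "unsupported op");
    // ... dispatch the kernel for `op` here ...
}
```

In this model, calling `encode(Op::DiagMaskInf)` aborts, which mirrors the "unsupported op" abort in the log above.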

Impact

This prevents using Metal acceleration with SD models that use attention masking (like SDXL Turbo).
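For context, the masking op itself is simple. The sketch below is a plain CPU reference of what a causal "diag mask inf" does on a row-major score matrix, assuming the usual convention that row i may only attend to columns up to n_past + i. The function name and signature are illustrative and are not the ggml API.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Reference semantics on a row-major n_rows x n_cols matrix: in row i, every
// column j > n_past + i is set to -infinity, so a following softmax gives
// those positions zero weight.
void diag_mask_inf(std::vector<float>& scores, std::size_t n_rows,
                   std::size_t n_cols, std::size_t n_past) {
    assert(scores.size() == n_rows * n_cols);
    const float neg_inf = -std::numeric_limits<float>::infinity();
    for (std::size_t i = 0; i < n_rows; ++i) {
        for (std::size_t j = n_past + i + 1; j < n_cols; ++j) {
            scores[i * n_cols + j] = neg_inf;
        }
    }
}
```

With a 3x3 matrix and n_past = 0, only the strict upper triangle is overwritten; the diagonal and everything below it keep their original values.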

Workaround

Disable Metal backend when building:

cmake .. -DSD_METAL=OFF

This falls back to CPU execution which works correctly but is significantly slower.

Suggested Fix

This issue should likely be reported upstream to https://github.com/ggml-org/ggml to add Metal kernel support for GGML_OP_DIAG_MASK_INF. Alternatively, stable-diffusion.cpp could:

  1. Check if the model requires unsupported ops before attempting Metal execution
  2. Fall back gracefully to CPU for unsupported operations
  3. Document which models/features are compatible with Metal backend
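Options 1 and 2 could be combined into a pre-flight check along the following lines. This is only a sketch under stated assumptions: `Op`, `Backend` and `pick_backend` are hypothetical types and names invented for illustration, and the real code would query the backend through ggml (as far as I know, ggml_backend_supports_op is the relevant call) rather than through a std::function.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-ins for op identifiers and a backend handle.
enum class Op { Add, MulMat, SoftMax, DiagMaskInf };

struct Backend {
    std::string name;
    std::function<bool(Op)> supports_op;  // capability query for one op
};

// Returns the first candidate (in preference order) that supports every op
// in the graph, or nullptr if none does. The returned pointer refers into
// `candidates`, so that vector must outlive the result.
const Backend* pick_backend(const std::vector<Op>& graph,
                            const std::vector<Backend>& candidates) {
    assert(!candidates.empty());
    for (const Backend& b : candidates) {
        if (std::all_of(graph.begin(), graph.end(), b.supports_op)) {
            return &b;
        }
    }
    return nullptr;
}
```

Listing Metal first and CPU last would make a graph containing the masking op run on the CPU while other graphs stay on Metal. A finer-grained variant would split the graph per op instead of per graph, at the cost of extra transfers between backends.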

Note: This is ultimately a ggml issue rather than stable-diffusion.cpp specifically. You may want to file this upstream at https://github.com/ggml-org/ggml/issues as well.

Logs / error messages / stack trace

[ERROR] ggml_extend.hpp:75   - ggml_metal_op_encode_impl: error: unsupported op 'DIAG_MASK_INF'
~/projects/tmp/stable-diffusion.cpp/ggml/src/ggml-metal/ggml-metal-ops.cpp:201: unsupported op
Don't know how to attach.  Try "help target".
No stack.
The program is not being run.
zsh: abort      ./sd -m sd_xl_turbo_1.0.q8_0.gguf -p "blue house" --steps 1


With debug output added:

METAL UNSUPPORTED OP: DIAG_MASK_INF (op=44)

Additional context / environment details

  • OS: macOS (Apple Silicon M1)
  • stable-diffusion.cpp version: master-387-e4c50f1
  • ggml submodule version: 55bc9320a4aae82af18e23eefd5de319a755d7b9 (Nov 24, 2025)
  • Model: sd_xl_turbo_1.0.q8_0.gguf (SDXL Turbo)

shakfu, Dec 03 '25 23:12

https://github.com/ggml-org/llama.cpp/pull/16669 This upstream PR would probably fix this issue, but it would first need to be merged in llama.cpp then included in ggml...

stduhpf, Dec 04 '25 00:12

Related: #850 and #857

wbruna, Dec 04 '25 00:12

> ggml-org/llama.cpp#16669 This upstream PR would probably fix this issue, but it would first need to be merged in llama.cpp then included in ggml...

This branch seems to have encountered a conflict.

baozzz1, Dec 04 '25 13:12

The root cause turned out to be a missing Metal implementation of GGML_OP_DIAG_MASK_INF in ggml, which made the Metal backend reject this op. A Metal kernel and dispatch for this op have now been implemented in ggml so that CLIP attention runs correctly on Apple GPUs and SDXL images match the CPU path when keep_clip_on_cpu = false.

I’ve opened an upstream PR here: https://github.com/ggml-org/ggml/pull/1395

taradaidv, Dec 07 '25 22:12

With this patch it works for me:

cd stable-diffusion.cpp/ggml
curl -O -L https://patch-diff.githubusercontent.com/raw/ggml-org/ggml/pull/1395.diff
git apply 1395.diff

calvin2021y, Dec 08 '25 06:12