stable-diffusion.cpp
                                
                                 stable-diffusion.cpp copied to clipboard
                                
                                    stable-diffusion.cpp copied to clipboard
                            
                            
                            
                        ggml-metal.m - GGML_ASSERT: ne00 % 4 == 0 when generating images of dimensions 640x640
Seems to be an issue with group_norm on metal, haven't tried with other backends.
./bin/sd -m models/gsdf/Counterfeit-V2.5/Counterfeit-V2.5_pruned.safetensors -p "a cat" --steps 2 -H 640 -W 640
ggml_metal_init: allocating
ggml_metal_init: found device: Apple M2 Max
ggml_metal_init: picking default device: Apple M2 Max
ggml_metal_init: default.metallib not found, loading from source
ggml_metal_init: GGML_METAL_PATH_RESOURCES = nil
ggml_metal_init: loading 'stable-diffusion.cpp/build/bin/ggml-metal.metal'
ggml_metal_init: GPU name:   Apple M2 Max
ggml_metal_init: GPU family: MTLGPUFamilyApple8  (1008)
ggml_metal_init: GPU family: MTLGPUFamilyCommon3 (3003)
ggml_metal_init: GPU family: MTLGPUFamilyMetal3  (5001)
ggml_metal_init: simdgroup reduction support   = true
ggml_metal_init: simdgroup matrix mul. support = true
ggml_metal_init: hasUnifiedMemory              = true
ggml_metal_init: recommendedMaxWorkingSetSize  = 22906.50 MB
[INFO ] stable-diffusion.cpp:142  - loading model from 'models/gsdf/Counterfeit-V2.5/Counterfeit-V2.5_pruned.safetensors'
[INFO ] model.cpp:676  - load models/gsdf/Counterfeit-V2.5/Counterfeit-V2.5_pruned.safetensors using safetensors format
[INFO ] stable-diffusion.cpp:164  - Stable Diffusion 1.x
[INFO ] stable-diffusion.cpp:170  - Stable Diffusion weight type: f32
ggml_backend_metal_buffer_type_alloc_buffer: allocated buffer, size =   469.45 MiB, (  471.33 / 21845.34)
ggml_backend_metal_buffer_type_alloc_buffer: allocated buffer, size =  2155.34 MiB, ( 2626.67 / 21845.34)
ggml_backend_metal_buffer_type_alloc_buffer: allocated buffer, size =    94.47 MiB, ( 2721.14 / 21845.34)
[INFO ] stable-diffusion.cpp:306  - total params memory size = 1408.32MB (clip 469.44MB, unet 2155.33MB, vae 94.47MB, controlnet 0.00MB)
[INFO ] stable-diffusion.cpp:310  - loading model from 'models/gsdf/Counterfeit-V2.5/Counterfeit-V2.5_pruned.safetensors' completed, taking 1.03s
[INFO ] stable-diffusion.cpp:327  - running in eps-prediction mode
[INFO ] stable-diffusion.cpp:1374 - apply_loras completed, taking 0.00s
ggml_backend_metal_buffer_type_alloc_buffer: allocated buffer, size =     1.41 MiB, ( 2722.55 / 21845.34)
ggml_backend_metal_buffer_type_alloc_buffer: allocated buffer, size =     1.41 MiB, ( 2722.55 / 21845.34)
[INFO ] stable-diffusion.cpp:1413 - get_learned_condition completed, taking 69 ms
[INFO ] stable-diffusion.cpp:1429 - sampling using Euler A method
[INFO ] stable-diffusion.cpp:1433 - generating image: 1/1 - seed 42
ggml_backend_metal_buffer_type_alloc_buffer: allocated buffer, size =  1320.61 MiB, ( 3572.30 / 21845.34)
GGML_ASSERT: stable-diffusion.cpp/ggml/src/ggml-metal.m:2034: ne00 % 4 == 0
GGML_ASSERT: stable-diffusion.cpp/ggml/src/ggml-metal.m:2034: ne00 % 4 == 0
zsh: abort      ./bin/sd -m  -p "a cat" --steps 2 -H 640 -W 640
Also asserts for all square image dimensions except for 512x512, 768x768, and 1024x1024 ( haven't tested past 1024x1024).
This seems to be an issue with the implementation of the ggml Metal backend. You can try removing the corresponding assets to see if the issue persists.
Same assert for me, but on v2-1_768-ema-pruned.safetensors model on M1 Pro. No image dimensions work.
Any suggestions on workaround/fix?
Same issue at all dimensions, after commenting out the assert line in ggml-metal.m it just works with no noticeable difference.