
unsupported op 'IM2COL_3D' on Mac

debamitro opened this issue 3 months ago · 15 comments

I tried to run Wan2.2-TI2V-5B on a Mac Mini, and it hits this assert:

ggml/src/ggml-metal/ggml-metal.m:2068: unsupported op 'IM2COL_3D'

Is there any way to avoid this problem? I could run Flux dev successfully, by the way.

debamitro · Sep 22 '25

I believe this is due to IM2COL_3D not being implemented for Metal in GGML yet, similar to what was mentioned in #822. I'm not sure whether anyone is working on it for Metal, though. llama.cpp has implemented it for Vulkan, so at least that should arrive whenever GGML is updated for stable-diffusion.cpp (I'm waiting on that myself).
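For reference, this is roughly the shape of the code that fires that assert (a sketch from memory, not the actual ggml-metal.m source): the Metal graph encoder dispatches on node->op and aborts when it reaches an op it has no kernel for, and Wan presumably needs IM2COL_3D for its 3D convolutions, which Flux doesn't use.

    // Sketch only, not the real ggml-metal.m: the encoder switches on the op
    // type and aborts on anything it has no Metal kernel for.
    switch (node->op) {
        case GGML_OP_IM2COL:
            // the 2D im2col kernel exists, so this op is handled
            break;
        // ... cases for every other op the Metal backend supports ...
        default:
            GGML_ABORT("unsupported op '%s'", ggml_op_name(node->op));
    }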

MrSnichovitch · Sep 23 '25

Great! @MrSnichovitch, do you know what it takes to point this project at the latest ggml?

debamitro · Sep 24 '25

For reference, here is the stack trace I get:

  * frame #0: 0x000000019b7de388 libsystem_kernel.dylib`__pthread_kill + 8
    frame #1: 0x000000019b81788c libsystem_pthread.dylib`pthread_kill + 296
    frame #2: 0x000000019b720a3c libsystem_c.dylib`abort + 124
    frame #3: 0x00000001001bd730 sd`ggml_abort + 160
    frame #4: 0x00000001001bad44 sd`ggml_metal_encode_node + 27288
    frame #5: 0x00000001001b4218 sd`__ggml_backend_metal_set_n_cb_block_invoke + 596
    frame #6: 0x00000001001b3cb0 sd`ggml_backend_metal_graph_compute + 368
    frame #7: 0x00000001001d3684 sd`ggml_backend_graph_compute + 32
    frame #8: 0x000000010009c4bc sd`GGMLRunner::compute(std::__1::function<ggml_cgraph* ()>, int, bool, ggml_tensor**, ggml_context*) + 648
    frame #9: 0x00000001000bb694 sd`WanModel::compute(int, DiffusionParams, ggml_tensor**, ggml_context*) + 204
    frame #10: 0x000000010010cf7c sd`StableDiffusionGGML::sample(ggml_context*, std::__1::shared_ptr<DiffusionModel>, bool, ggml_tensor*, ggml_tensor*, SDCondition, SDCondition, SDCondition, ggml_tensor*, float, sd_guidance_params_t, float, sample_method_t, std::__1::vector<float, std::__1::allocator<float>> const&, int, SDCondition, std::__1::vector<ggml_tensor*, std::__1::allocator<ggml_tensor*>>, bool, ggml_tensor*, ggml_tensor*, float)::'lambda'(ggml_tensor*, float, int)::operator()(ggml_tensor*, float, int) const + 1308
    frame #11: 0x000000010007129c sd`StableDiffusionGGML::sample(ggml_context*, std::__1::shared_ptr<DiffusionModel>, bool, ggml_tensor*, ggml_tensor*, SDCondition, SDCondition, SDCondition, ggml_tensor*, float, sd_guidance_params_t, float, sample_method_t, std::__1::vector<float, std::__1::allocator<float>> const&, int, SDCondition, std::__1::vector<ggml_tensor*, std::__1::allocator<ggml_tensor*>>, bool, ggml_tensor*, ggml_tensor*, float) + 3192
    frame #12: 0x000000010007785c sd`generate_video + 7096
    frame #13: 0x000000010000c2e4 sd`main + 3272
    frame #14: 0x000000019b476b98 dyld`start + 6076

debamitro · Sep 24 '25

I applied the change mentioned in #822 locally and tried the Vulkan backend; now I hit a different error:

* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x4a)
  * frame #0: 0x000000010022b794 sd`ggml_vk_build_graph(ggml_backend_vk_context*, ggml_cgraph*, int, ggml_tensor*, int, bool, bool, bool, bool) + 280
    frame #1: 0x0000000100225668 sd`ggml_backend_vk_graph_compute(ggml_backend*, ggml_cgraph*) + 356
    frame #2: 0x0000000100269d84 sd`ggml_backend_graph_compute + 32
    frame #3: 0x000000010009bcdc sd`GGMLRunner::compute(std::__1::function<ggml_cgraph* ()>, int, bool, ggml_tensor**, ggml_context*) + 648
    frame #4: 0x00000001000ab3fc sd`T5CLIPEmbedder::get_learned_condition_common(ggml_context*, int, std::__1::tuple<std::__1::vector<int, std::__1::allocator<int>>, std::__1::vector<float, std::__1::allocator<float>>, std::__1::vector<float, std::__1::allocator<float>>>, int, bool) + 596
    frame #5: 0x00000001000aaa6c sd`T5CLIPEmbedder::get_learned_condition(ggml_context*, int, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, int, int, int, int, bool) + 168
    frame #6: 0x00000001000769e0 sd`generate_video + 5596
    frame #7: 0x000000010000ba84 sd`main + 3272
    frame #8: 0x000000019b476b98 dyld`start + 6076

debamitro · Sep 24 '25

Bear in mind that I barely know what I'm doing here, so I'm hoping someone with more programming experience can chime in. Can macOS even use Vulkan? I thought Metal was its rough equivalent.

Anyway, I only sort of managed to get Vulkan working on Linux: I downloaded llama.cpp, completely overwrote stable-diffusion.cpp's ggml folder with llama.cpp's, and recompiled. That was the only way to avoid chasing down header errors during compilation.

I have to run sd with the --offload-to-cpu and --vae-on-cpu options to get it to output anything, but the output is an incoherent mess with the WAN 2.2 TI2V 5B Q8_0 GGUF model. WAN 2.1 T2V works much better in Vulkan on my system, but it's slow as hell.

MrSnichovitch · Sep 24 '25

There is a macOS version of Vulkan, which I downloaded. I do have programming experience, but not with this kind of code, so I can only post my observations.

After turning on debug messages, the crash seems to happen because the 'pipeline' in this part of the code is null:

                vk_pipeline pipeline = ggml_vk_op_get_pipeline(ctx, src0, src1, src2, node, node->op);
                ggml_pipeline_request_descriptor_sets(ctx, pipeline, 1);

node->op is GET_ROWS, whatever that means.
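For anyone digging into this, a guard along these lines (a local debugging hack, not upstream code) would at least turn the hard crash into a readable abort naming the failing op:

    // Local debugging hack: fail loudly instead of dereferencing a null
    // pipeline further down in ggml_vk_build_graph.
    vk_pipeline pipeline = ggml_vk_op_get_pipeline(ctx, src0, src1, src2, node, node->op);
    if (!pipeline) {
        GGML_ABORT("Vulkan: no pipeline for op '%s'", ggml_op_name(node->op));
    }
    ggml_pipeline_request_descriptor_sets(ctx, pipeline, 1);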

debamitro · Sep 25 '25

Sounds like missing quant types for GET_ROWS (as in #851). What types do you see in src0->type and src1->type?
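A quick way to check is a temporary print just before the pipeline lookup you quoted, something like this (assuming you're editing ggml-vulkan.cpp locally; src0/src1 are already in scope there):

    // Temporary debug print (remove afterwards): show which tensor types feed
    // the GET_ROWS node that ends up with a null pipeline.
    if (node->op == GGML_OP_GET_ROWS) {
        fprintf(stderr, "GET_ROWS: src0 type = %s, src1 type = %s\n",
                ggml_type_name(src0->type), ggml_type_name(src1->type));
    }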

wbruna · Sep 25 '25

I am getting the same unsupported op 'IM2COL_3D' error on Linux with Vulkan too.

evcharger · Sep 25 '25

@evcharger: Are you running a build from today's source (master-306-2abe945)? I just tested a Vulkan build of it with wan2.1_t2v_1.3B_fp16.safetensors and it's working as expected.

MrSnichovitch · Sep 25 '25

OK, with today's version it works, although the output is gibberish. In fact, with today's version on Ubuntu with Vulkan, most models produce gibberish :(

evcharger · Sep 25 '25

> OK, with today's version it works, although the output is gibberish. In fact, with today's version on Ubuntu with Vulkan, most models produce gibberish :(

This is likely #847, fixed by https://github.com/ggml-org/llama.cpp/commit/9073a73d82a916cea0809de225ef5175c3a86e91 (but still missing from ggml).

wbruna · Sep 25 '25

Yes, the original problem went away after I synced the code to the latest revision. However, when I ran it, my Mac's display started blinking, and I killed the process out of caution.

debamitro · Sep 26 '25

My last comment was too quick: after adding --offload-to-cpu, the run proceeded until it hit the same original problem.

debamitro · Sep 26 '25

https://github.com/CLDawes/ggml/tree/patch-qwen-image

This branch adds Metal support for IM2COL_3D and DIAG_MASK_INF, plus a fix for PAD so it passes the test suite.

I got QuantStack's Qwen-Image-GGUF (Q6_K) running on a Mac Mini M4 Pro so I wouldn't have to spend money on a graphics card to futz about, and that's all I wanted out of this.

It works with master-331-90ef5f8, but I don't know whether Wan relies on any other unimplemented or buggy operations.

CLDawes · Oct 19 '25

> https://github.com/CLDawes/ggml/tree/patch-qwen-image

I guess this would also fix #857?

@CLDawes, ggml changes are usually applied to llama.cpp first, then extracted into the ggml library, and then pulled in by sd.cpp; so I suggest submitting that as a llama.cpp PR.

wbruna · Oct 19 '25