
metal: add ops DIAG_MASK_INF, IM2COL_3D, fix op PAD

Open · CLDawes opened this issue 2 months ago · 1 comment

I wrote this to fix a problem I ran into while working with leejet/stable-diffusion.cpp. It may also fix issues other people have reported there, such as stable-diffusion.cpp #850 and #857.

As a user, I've already solved the issue for the audience that I care about. I'm offering this in hopes that it may be more helpful than merely opening an issue to complain about the missing/broken ops and going back to generating images of people with fish for heads walking down sidewalks. Yippee.

This commit is a (manual) octopus merge of three independent commits (adding DIAG_MASK_INF, adding IM2COL_3D, and fixing PAD). Each could be split into its own PR if this one is too broad.
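For readers unfamiliar with DIAG_MASK_INF: it is the causal mask used before softmax in attention. The sketch below is a plain CPU illustration of the masking rule as we understand it from ggml's CPU implementation, restricted to a single 2-D slice. The function name `diag_mask_inf_ref` is hypothetical, and this is not the Metal kernel added by this PR.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Hypothetical CPU reference for the DIAG_MASK_INF rule on one 2-D slice.
 * x is a row-major n_rows x n_cols matrix of floats, modified in place.
 * Element (row, col) is set to -inf when col > n_past + row, so row r
 * keeps columns 0..n_past+r and masks the rest. ggml applies the same
 * rule to every slice of the higher dimensions; that loop is omitted. */
static void diag_mask_inf_ref(float *x, int n_rows, int n_cols, int n_past) {
    assert(n_rows >= 0 && n_cols >= 0 && n_past >= 0);
    assert(x != NULL || n_rows == 0 || n_cols == 0);
    for (int row = 0; row < n_rows; row++) {
        for (int col = 0; col < n_cols; col++) {
            if (col > n_past + row) {
                x[(size_t)row * n_cols + col] = -INFINITY;
            }
        }
    }
}
```

For example, with `n_rows = 3`, `n_cols = 4` and `n_past = 1`, row 0 keeps columns 0 and 1, row 1 keeps columns 0 to 2, and row 2 keeps all four columns; every other entry becomes `-inf`.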

The following three tests all passed on the machine described by the log below:

test-backend-ops -b Metal -o IM2COL_3D
test-backend-ops -b Metal -o DIAG_MASK_INF
test-backend-ops -b Metal -o PAD

ggml_metal_device_init: GPU name:   Apple M4 Pro
ggml_metal_device_init: GPU family: MTLGPUFamilyApple9  (1009)
ggml_metal_device_init: GPU family: MTLGPUFamilyCommon3 (3003)
ggml_metal_device_init: GPU family: MTLGPUFamilyMetal4  (5002)
ggml_metal_device_init: simdgroup reduction   = true
ggml_metal_device_init: simdgroup matrix mul. = true
ggml_metal_device_init: has unified memory    = true
ggml_metal_device_init: has bfloat            = true
ggml_metal_device_init: use residency sets    = true
ggml_metal_device_init: use shared buffers    = true
ggml_metal_device_init: recommendedMaxWorkingSetSize  = 19069.67 MB
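test-backend-ops checks each op by comparing the tested backend's output against the CPU backend. A minimal sketch of that kind of comparison follows, assuming a normalized mean squared error (NMSE) metric. The function names `nmse_ref` and `outputs_match`, the handling of infinities, and any tolerance you pass in are illustrative assumptions, not the tool's actual code. Infinities need special care here because DIAG_MASK_INF deliberately writes `-inf`, and `-inf - (-inf)` is NaN.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: NMSE between a reference output `ref` and a backend
 * output `out`, both of length n: sum((ref-out)^2) / sum(ref^2).
 * Positions where both values are infinite with the same sign count as
 * exact matches and are skipped; a mismatched infinity returns +inf.
 * A NaN in either input propagates into the result. */
static double nmse_ref(const float *ref, const float *out, size_t n) {
    assert(n == 0 || (ref != NULL && out != NULL));
    double err = 0.0;   /* sum of squared differences */
    double norm = 0.0;  /* sum of squared reference values */
    for (size_t i = 0; i < n; i++) {
        bool ref_inf = isinf(ref[i]);
        bool out_inf = isinf(out[i]);
        if (ref_inf || out_inf) {
            if (ref_inf && out_inf &&
                (signbit(ref[i]) != 0) == (signbit(out[i]) != 0)) {
                continue; /* same-signed infinities agree */
            }
            return INFINITY;
        }
        double d = (double)ref[i] - (double)out[i];
        err += d * d;
        norm += (double)ref[i] * (double)ref[i];
    }
    if (norm == 0.0) {
        return err == 0.0 ? 0.0 : INFINITY;
    }
    return err / norm;
}

/* Returns true when the NMSE is within the caller's tolerance.
 * A NaN result compares false, so it is reported as a mismatch. */
static bool outputs_match(const float *ref, const float *out, size_t n,
                          double max_nmse) {
    return nmse_ref(ref, out, n) <= max_nmse;
}
```

A call such as `outputs_match(cpu_out, metal_out, n, 1e-7)` (buffer names hypothetical) would accept a result only if it is numerically almost identical to the CPU reference, while still tolerating the `-inf` entries a correct DIAG_MASK_INF produces.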

CLDawes · Oct 19 '25 21:10

See also #17175 for CONV_2D.

bghira · Nov 11 '25 19:11