llama.cpp opencl: fix for small models

Open lhez opened this issue 4 days ago • 0 comments

Currently, small models such as qwen2.5 0.5B do not work properly with the OpenCL backend. This PR fixes that issue. It also changes the subgroup size to 64 for all Adreno GPUs.

Feb 18 '25 23:02 lhez
