llama.cpp
opencl: fix for small models
Currently, small models such as Qwen2.5 0.5B do not work properly with the OpenCL backend. This PR fixes that issue. It also changes the subgroup size to 64 for all Adreno GPUs.