llama.cpp
Android OpenCL question
Hi,
I was able to build llama.cpp with CLBlast on Android. I am using the models ggml-model-q4_0.gguf and ggml-model-f32.gguf.
When running, it seems to work even though the output looks weird and doesn't match the question, but at least I get some response. There may well be an error in my default params ;)
My issue is that even though I can see OpenCL being initialized, it always runs on the CPU. How can I force it to run on the GPU / OpenCL backend?
Also, I see in the code that FP16 support is commented out for OpenCL. Any reason for that?
Thanks
Hi,
> I am using the models ggml-model-q4_0.gguf and ggml-model-f32.gguf.
Unclear, but this doesn't seem to be the focus of your question.
> My issue is that even though I can see OpenCL being initialized, it always runs on the CPU. How can I force it to run on the GPU / OpenCL backend?
Offloading to the GPU requires the `-ngl` parameter, e.g. `./main ~/model.gguf -ngl 50`. See the README.
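If you drive llama.cpp from native code instead of the CLI, the same setting is the `n_gpu_layers` field of the model params. A minimal sketch, assuming the llama.cpp C API of that era (signatures such as `llama_backend_init` have shifted across versions):

```cpp
// Minimal sketch (assumed llama.cpp C API of the CLBlast era).
// n_gpu_layers controls how many layers are offloaded to the
// OpenCL/CLBlast backend; 0 keeps everything on the CPU.
#include "llama.h"

int main() {
    llama_backend_init(false);  // numa = false; newer versions take no argument

    llama_model_params mparams = llama_model_default_params();
    mparams.n_gpu_layers = 50;  // equivalent of `-ngl 50` on the CLI

    llama_model * model = llama_load_model_from_file("model.gguf", mparams);
    if (model == nullptr) {
        return 1;
    }

    // ... create a context, tokenize, decode ...

    llama_free_model(model);
    llama_backend_free();
    return 0;
}
```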
> Also, I see in the code that FP16 support is commented out for OpenCL. Any reason for that?
I think FP16 is not supported on Android; maybe someone can confirm.
@Jeximo thanks for your response. Yes, I pass n-gpu-layers in the params to use the GPU; I used 64 in my test.
As for FP16, the extension does exist on my device: cl_khr_fp16 is in the extension list, but the code that detects the extension is commented out in the main branch.
On Android, using the debugger, I can confirm that opencl_init is called, and after uncommenting the extension check, the FP16 bool flag is true.
But the backend remains on CPU.
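For reference, this is roughly what the (commented-out) detection amounts to; a minimal standalone sketch using only the standard OpenCL API, assuming you link against the vendor's libOpenCL.so:

```cpp
// Minimal sketch: ask an OpenCL device whether it advertises cl_khr_fp16.
// Standard OpenCL API only; dev is a cl_device_id you already enumerated.
#include <CL/cl.h>
#include <string>
#include <vector>

static bool device_supports_fp16(cl_device_id dev) {
    size_t len = 0;
    clGetDeviceInfo(dev, CL_DEVICE_EXTENSIONS, 0, nullptr, &len);

    std::vector<char> buf(len);
    clGetDeviceInfo(dev, CL_DEVICE_EXTENSIONS, len, buf.data(), nullptr);

    const std::string exts(buf.begin(), buf.end());
    return exts.find("cl_khr_fp16") != std::string::npos;
}
```

Note that even when this returns true, kernels still need `#pragma OPENCL EXTENSION cl_khr_fp16 : enable` before using `half`, so a true flag alone doesn't prove the FP16 path works end to end.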
I finally found my problem: the ngl parameter was not passed correctly in the JNI code and ended up as 0, which is why the device used the CPU. Now I have a SIGBUS error, but there is progress :)
Thanks for your help.
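For anyone hitting the same thing: the failure mode is easy to reproduce in a JNI bridge. A hypothetical sketch (class, method, and parameter names are made up for illustration, not from llama.cpp) showing the one line that must not be dropped:

```cpp
// Hypothetical JNI bridge: if `ngl` is never forwarded from the Java/Kotlin
// side, n_gpu_layers stays 0 and everything silently runs on the CPU.
#include <jni.h>
#include "llama.h"

extern "C" JNIEXPORT jlong JNICALL
Java_com_example_llama_LlamaBridge_loadModel(JNIEnv * env, jobject /*thiz*/,
                                             jstring jpath, jint ngl) {
    const char * path = env->GetStringUTFChars(jpath, nullptr);

    llama_model_params mparams = llama_model_default_params();
    mparams.n_gpu_layers = ngl;  // the line that was effectively missing

    llama_model * model = llama_load_model_from_file(path, mparams);
    env->ReleaseStringUTFChars(jpath, path);
    return reinterpret_cast<jlong>(model);  // 0 on failure
}
```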
How does OpenCL acceleration compare with the CPU backend on Android? @anthonyliot Thanks.
Hi @Jimskns
So the performance is not that great with OpenCL on the Android devices I tested; all of them were using the Qualcomm OpenCL driver. Also, I made a mistake earlier: FP16 is not supported on the device I tried.
I tried 1B / 3B / 7B models on these devices, and every time the CPU backend performed better. I played with mixing CPU/GPU and with full GPU (not for the 7B), but in all my tests so far CL is slower.
I am still looking into it to see if there is a way to get an improvement on the GPU.
Good job @anthonyliot
@anthonyliot Hello! I have also been trying to use OpenCL on Android recently and ran into the same problem as you: when running inference with ggml-model-q4_0.gguf on an Android device (Qualcomm) GPU, I get confusing output. I was wondering how you finally solved this problem? Looking forward to your reply, thank you very much.
Pointing to another thread discussing this topic: #7016
This issue was closed because it has been inactive for 14 days since being marked as stale.