llama.cpp
Android OpenCL question
Hi,
I was able to build llama.cpp with CLBlast on Android. I am using the models ggml-model-q4_0.gguf and ggml-model-f32.gguf.
When running, it seems to work even though the output looks weird and doesn't match the question, but at least I get some response. There may well be an error in my default params ;)
My issue is that even though I can see OpenCL being initialized, it always runs on the CPU. How can I force it to run on the GPU / OpenCL backend?
Also, I see in the code that FP16 support is commented out for OpenCL. Any reason for that?
Thanks
Hi,
> I am using the models ggml-model-q4_0.gguf and ggml-model-f32.gguf.
Unclear, but this doesn't seem to be the focus of your question.
> My issue is that even though I can see OpenCL being initialized, it always runs on the CPU. How can I force it to run on the GPU / OpenCL backend?
Offloading to the GPU requires the `-ngl` parameter, e.g. `./main ~/model.gguf -ngl 50`. See the README.
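If you drive llama.cpp from native code instead of the CLI, the same setting is the `n_gpu_layers` field of the model params. A minimal sketch, assuming the llama.cpp C API of that era (signatures such as `llama_backend_init` have shifted across versions):

```cpp
// Minimal sketch (assumed llama.cpp C API of the CLBlast era).
// n_gpu_layers controls how many layers are offloaded to the
// OpenCL/CLBlast backend; 0 keeps everything on the CPU.
#include "llama.h"

int main() {
    llama_backend_init(false);  // numa = false; newer versions take no argument

    llama_model_params mparams = llama_model_default_params();
    mparams.n_gpu_layers = 50;  // equivalent of `-ngl 50` on the CLI

    llama_model * model = llama_load_model_from_file("model.gguf", mparams);
    if (model == nullptr) {
        return 1;
    }

    // ... create a context, tokenize, decode ...

    llama_free_model(model);
    llama_backend_free();
    return 0;
}
```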
> Also, I see in the code that FP16 support is commented out for OpenCL. Any reason for that?
I think FP16 is not supported on Android; maybe someone can confirm.
@Jeximo thanks for your response. Yes, I pass n-gpu-layers in the params to use the GPU; I used 64 in my test.
As for FP16, the extension does exist on my device: cl_khr_fp16 is in the extension list, but the code that detects the extension is commented out in the main branch.
On Android, using the debugger, I can confirm that opencl_init is called, and after uncommenting the extension check, the FP16 bool flag is true.
But the backend remains on CPU.
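For reference, this is roughly what the (commented-out) detection amounts to; a minimal standalone sketch using only the standard OpenCL API, assuming you link against the vendor's libOpenCL.so:

```cpp
// Minimal sketch: ask an OpenCL device whether it advertises cl_khr_fp16.
// Standard OpenCL API only; dev is a cl_device_id you already enumerated.
#include <CL/cl.h>
#include <string>
#include <vector>

static bool device_supports_fp16(cl_device_id dev) {
    size_t len = 0;
    clGetDeviceInfo(dev, CL_DEVICE_EXTENSIONS, 0, nullptr, &len);

    std::vector<char> buf(len);
    clGetDeviceInfo(dev, CL_DEVICE_EXTENSIONS, len, buf.data(), nullptr);

    const std::string exts(buf.begin(), buf.end());
    return exts.find("cl_khr_fp16") != std::string::npos;
}
```

Note that even when this returns true, kernels still need `#pragma OPENCL EXTENSION cl_khr_fp16 : enable` before using `half`, so a true flag alone doesn't prove the FP16 path works end to end.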
I finally found my problem: the ngl parameter was not passed correctly in the JNI code and ended up as 0, which is why the device used the CPU. Now I have a SIGBUS error, but there is progress :)
Thanks for your help.
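For anyone hitting the same thing: the failure mode is easy to reproduce in a JNI bridge. A hypothetical sketch (class, method, and parameter names are made up for illustration, not from llama.cpp) showing the one line that must not be dropped:

```cpp
// Hypothetical JNI bridge: if `ngl` is never forwarded from the Java/Kotlin
// side, n_gpu_layers stays 0 and everything silently runs on the CPU.
#include <jni.h>
#include "llama.h"

extern "C" JNIEXPORT jlong JNICALL
Java_com_example_llama_LlamaBridge_loadModel(JNIEnv * env, jobject /*thiz*/,
                                             jstring jpath, jint ngl) {
    const char * path = env->GetStringUTFChars(jpath, nullptr);

    llama_model_params mparams = llama_model_default_params();
    mparams.n_gpu_layers = ngl;  // the line that was effectively missing

    llama_model * model = llama_load_model_from_file(path, mparams);
    env->ReleaseStringUTFChars(jpath, path);
    return reinterpret_cast<jlong>(model);  // 0 on failure
}
```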
How does OpenCL acceleration compare with the CPU backend on Android? @anthonyliot Thanks.
Hi @Jimskns
So the performance is not that great with OpenCL on the Android devices I tested; all of them were using the Qualcomm OpenCL driver. Also, I made a mistake earlier: FP16 is not supported on the device I tried.
I tried 1B / 3B / 7B models on these devices, and every time the CPU backend performed better. I played with mixing CPU/GPU and with full GPU (not for the 7B), but in all my tests so far CL is slower.
I am still looking into it to see if there is a way to get an improvement on the GPU.
Good job @anthonyliot
@anthonyliot Hello! I have also been trying to use OpenCL on Android recently and ran into the same problem as you: when running inference with ggml-model-q4_0.gguf on an Android device (Qualcomm) GPU, I get confusing output. I was wondering how you finally solved this problem? Looking forward to your reply, thank you very much.
Pointing to another thread discussing this topic: #7016
This issue was closed because it has been inactive for 14 days since being marked as stale.