
Android OpenCL question

Open anthonyliot opened this issue 1 year ago • 3 comments

Hi,

I was able to build llama.cpp with CLBlast on Android. I am using the models ggml-model-q4_0.gguf and ggml-model-f32.gguf.

When running, it seems to be working: even if the output looks weird and doesn't match the question, at least I get some response. There may well be an error in my default params ;)

My issue is that even though I can see OpenCL being initialized, it always runs on the CPU. How can I force it to run on the GPU / OpenCL backend?

Also, I see in the code that FP16 support for OpenCL is commented out. Any reason?

Thanks

anthonyliot avatar Feb 21 '24 03:02 anthonyliot

Hi,

> I am using the models ggml-model-q4_0.gguf and ggml-model-f32.gguf

Unclear, but this doesn't seem to be the focus of your question.

> My issue is that even though I can see OpenCL being initialized, it always runs on the CPU. How can I force it to run on the GPU / OpenCL backend?

Offloading to the GPU requires the -ngl parameter, e.g. ./main ~/model.gguf -ngl 50 (see the README).

> Also, I see in the code that FP16 support for OpenCL is commented out. Any reason?

I think f16 is not supported on Android, maybe someone will confirm.

Jeximo avatar Feb 21 '24 05:02 Jeximo

@Jeximo thanks for your response. Yes, I do pass n-gpu-layers in the params to use the GPU; I used 64 in my test.

As for FP16, the extension does exist on my device: cl_khr_fp16 is in the extensions list, but the code that detects the extension is commented out on the main branch.

On Android, using the debugger, I can confirm that the OpenCL init is called, and after uncommenting the extension check, the FP16 bool flag is true.

But the backend remains on the CPU.

anthonyliot avatar Feb 21 '24 06:02 anthonyliot

I finally found my problem: the ngl parameter was not passed correctly in the JNI code and ended up as 0, which is why the device used the CPU. Now I have a SIGBUS error, but there is progress :)

Thanks for your help.

anthonyliot avatar Feb 21 '24 19:02 anthonyliot

@anthonyliot How does the speed of the OpenCL backend compare with the CPU backend on Android? Thanks.

Jimskns avatar Mar 04 '24 11:03 Jimskns

Hi @Jimskns

So the performance is not that great with OpenCL on the Android devices I tested; all of them use the Qualcomm OpenCL driver. Also, I made a mistake earlier: FP16 is not supported on the devices I tried.

I tried 1B / 3B / 7B models on these devices, and every time the CPU backend performed better. I also played with mixed CPU/GPU offload and full GPU offload (not for the 7B), but in all my tests so far OpenCL is slower.

I am still looking into whether there is a way to improve GPU performance.

anthonyliot avatar Mar 04 '24 18:03 anthonyliot

Good job, @anthonyliot!

Jimskns avatar Mar 05 '24 03:03 Jimskns

@anthonyliot Hello! I have also been trying to use OpenCL on Android recently, and I ran into the same problem as you: running inference with the ggml-model-q4_0.gguf model on an Android device's (Qualcomm) GPU gives confusing output. How did you finally solve this problem? Looking forward to your reply, thank you very much.

qtyandhasee avatar Apr 05 '24 12:04 qtyandhasee

Pointing to another thread discussing this topic: #7016

gustrd avatar May 11 '24 23:05 gustrd

This issue was closed because it has been inactive for 14 days since being marked as stale.

github-actions[bot] avatar Jun 26 '24 02:06 github-actions[bot]