llama.cpp
android port of llama.cpp
@ggerganov, can we expect an Android port like the whisper.cpp one?
With CMake, it's quite easy to get an Android binary:
$ mkdir build
$ cd build
$ cmake -DCMAKE_TOOLCHAIN_FILE=$NDK/build/cmake/android.toolchain.cmake \
-DANDROID_ABI=arm64-v8a -DANDROID_PLATFORM=android-30 \
-DCMAKE_C_FLAGS=-march=armv8.4a+dotprod ..
$ make
On recent flagship Android devices, run ./llama -m models/7B/ggml-model-q4_0.bin -t 4 -n 128 and you should get ~5 tokens/second.
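If you cross-compile on a host machine as above, the binary still has to get onto the device. A minimal sketch using adb, assuming the build lands the executable in the build directory and that /data/local/tmp is used as the writable, executable staging directory (paths are assumptions, not from the thread):

```shell
# Push the cross-compiled binary and a quantized model to the device,
# then run from /data/local/tmp (a directory where adb shell can exec).
adb push build/llama /data/local/tmp/
adb push models/7B/ggml-model-q4_0.bin /data/local/tmp/
adb shell "cd /data/local/tmp && chmod +x llama && \
  ./llama -m ggml-model-q4_0.bin -t 4 -n 128 -p 'The first man on the moon'"
```

Depending on the CMake generator, the binary may end up in build/bin/ instead of build/; adjust the push path accordingly.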
# ./llama -m models/7B/ggml-model-q4_0.bin -t 4 -n 128 -p "The first man on the moon"
main: seed = 1678784568
llama_model_load: loading model from 'models/7B/ggml-model-q4_0.bin' - please wait ...
llama_model_load: n_vocab = 32000
llama_model_load: n_ctx = 512
llama_model_load: n_embd = 4096
llama_model_load: n_mult = 256
llama_model_load: n_head = 32
llama_model_load: n_layer = 32
llama_model_load: n_rot = 128
llama_model_load: f16 = 2
llama_model_load: n_ff = 11008
llama_model_load: n_parts = 1
llama_model_load: ggml ctx size = 4529.34 MB
llama_model_load: memory_size = 512.00 MB, n_mem = 16384
llama_model_load: loading model part 1/1 from 'models/7B/ggml-model-q4_0.bin'
llama_model_load: .................................... done
llama_model_load: model size = 4017.27 MB / num tensors = 291
system_info: n_threads = 4 / 8 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 0 | VSX = 0 |
main: prompt: 'The first man on the moon'
main: number of tokens in prompt = 7
1 -> ''
1576 -> 'The'
937 -> ' first'
767 -> ' man'
373 -> ' on'
278 -> ' the'
18786 -> ' moon'
sampling parameters: temp = 0.800000, top_k = 40, top_p = 0.950000, repeat_last_n = 64, repeat_penalty = 1.300000
The first man on the moon Neil Armstrong dies at 82
Neil Amstrong, whose space mission made him an American hero and international icon for a generation of children who came to believe he was "the nicest man in history" (AP) [end of text]
main: mem per token = 14368644 bytes
main: load time = 3966.54 ms
main: sample time = 84.84 ms
main: predict time = 12131.24 ms / 220.57 ms per token
main: total time = 16974.94 ms
@freedomtan, I was talking about something with a simple UI, like interactive mode, where you can input the main prompt ("you are an assistant", etc.) and then start chatting.
See PR #130 on how to build and run with termux
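The Termux route builds directly on the device, so no cross-compilation or adb is needed. A rough sketch of what that looks like (package names and steps are assumptions; the PR is the authoritative reference):

```shell
# Inside the Termux app on the Android device:
pkg install clang git cmake make      # build toolchain (assumed package names)
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make                                  # builds natively with the Termux clang
# copy a quantized model onto the device, then run as in the comments above:
# ./llama -m models/7B/ggml-model-q4_0.bin -t 4 -n 128
```

Building on-device avoids the NDK toolchain file entirely, at the cost of a slower compile.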
@GeorvityLabs that is a full blown app not a port at that point.
Yes. Sort of like the one for whisper.cpp
I'll try to write up something
With cmake, it's quite easy to get android binary
Unless of course attempting to do so just gives a nice output full of errors, and every time you fix one, another appears... any suggestions?
We made a Flutter app, if it helps :)
https://github.com/Bip-Rep/sherpa
Have fun
Please share the bin :)
@NoNamedCat I have releases on my git where you can find the apk. https://github.com/Bip-Rep/sherpa/releases
On recent flagship Android devices, run ./llama -m models/7B/ggml-model-q4_0.bin -t 4 -n 128 and you should get ~5 tokens/second.
@freedomtan Before this step, how can I install llama on an Android device? Is it as simple as copying a file named llama from somewhere else onto the Android device and then running the ./llama command?