llama.cpp
LLM inference in C/C++
> > Hi [@TerryT9](https://github.com/TerryT9), I encountered the same issue and I'm not sure how to resolve it. Could you share how you solved it?
>
> Hi [@Gianthard-cyh](https://github.com/Gianthard-cyh), would...
### Name and Version

build: 5363 (c2b6fec6) with Android (11349228, +pgo, +bolt, +lto, -mlgo, based on r487747e) clang version 17.0.2 (https://android.googlesource.com/toolchain/llvm-project d9f89f4d16663d5012e5c09495f3b30ece3d2362) for x86_64-unknown-linux-gnu (debug)

### Operating systems

Linux

###...
After cross-compiling on Linux, a test run on a Snapdragon 8 Gen 4 device fails with an error.

Build on Linux:

```
cmake .. -DCMAKE_TOOLCHAIN_FILE=$ANDROID_NDK/build/cmake/android.toolchain.cmake -DANDROID_ABI=arm64-v8a -DANDROID_PLATFORM=android-28 -DCMAKE_C_FLAGS="-march=armv8.7a" -DCMAKE_CXX_FLAGS="-march=armv8.7a" -DGGML_OPENMP=OFF -DGGML_LLAMAFILE=OFF -DGGML_QNN=ON -DGGML_QNN_DEFAULT_LIB_SEARCH_PATH=/data/local/tmp
```

I used qnn_sdk_version=2.31.0.250130 and pushed the Qualcomm shared libraries to the device.

Test on the device:

```
export LD_LIBRARY_PATH=/data/local/tmp/mllm/install-android/lib:/data/local/tmp/mllm/qnn-lib
./llama-cli -m ../../models/Qwen2.5-0.5B-Instruct-F16.gguf
```

Error:

```
llama_context: n_ctx_per_seq (4096) <...
```
I solved this problem with the help of an AI assistant. The error was:

```
/data/data/com.termux/files/home/llamaqnn/ggml/src/ggml-qnn/utils.cpp:253:17: error: reference to unresolved using declaration
  253 |     return std::aligned_alloc(alignment, size);
      |                 ^
/data/data/com.termux/files/usr/include/c++/v1/cstdlib:150:1: note: using declaration annotated with 'using_if_exists' here
  150 |...
```
### Git commit

e36ad89528a0276331e3c22f153d6837c353c5cf

### Operating systems

Linux

### GGML backends

CPU

### Problem description & steps to reproduce

I followed this procedure to build and convert the model into...
### Prerequisites

- [x] I am running the latest code. Mention the version if possible as well.
- [x] I carefully followed the [README.md](https://github.com/ggml-org/llama.cpp/blob/master/README.md).
- [x] I searched using keywords...
I found something weird about Hexagon-NPU backend performance. First, I ran `test-backend-ops` on an Android device with a Snapdragon 8 Gen 3, and the result was normal:

```
MUL_MAT(type_a=f16,type_b=f32,m=16416,n=1,k=128,bs=[8,1],nr=[4,1],per=[0,2,1,3],v=0): 744 runs - 22486.85 us/run - 134.48 MFLOP/run -...
```
### Prerequisites

- [x] I am running the latest code. Mention the version if possible as well.
- [x] I carefully followed the [README.md](https://github.com/ggml-org/llama.cpp/blob/master/README.md).
- [x] I searched using keywords...
### Name and Version

Test device: Snapdragon 8 Gen 3
Test model: qwen2.5-1.5b-instruct-fp16.gguf

### Operating systems

Android

### GGML backends

QNN

### Hardware

Snapdragon 8 Gen 3

### Models

_No response_

### Problem description & steps to reproduce

Same as the title (inference works normally when the op is assigned to qnn_gpu)

###...