llama.cpp
LLM inference in C/C++
Obviously slower with the Q4_1 30B model, and memory usage becomes garbage... (Linux 5.19 x64, Ubuntu base)
Using the GGML SIMD macros, so hopefully it should work on different architectures, but it has only been tested with AVX2. Don't expect any meaningful performance improvement; the function is not very...
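For context, the macro-abstraction pattern looks roughly like the sketch below. The macro names here are illustrative assumptions, not GGML's actual macros: a small set of generic vector macros maps onto the target's intrinsics at compile time, with a scalar fallback, so the kernel itself is written once.

```c
// Illustrative sketch of SIMD abstraction via macros (hypothetical names,
// not ggml's real ones). Each architecture maps the same generic macros
// onto its own intrinsics; a scalar fallback keeps the code portable.
#include <stddef.h>

#if defined(__AVX2__)
#include <immintrin.h>
#define VEC_STEP       8                      // 8 floats per 256-bit register
#define VEC_LOAD(p)    _mm256_loadu_ps(p)
#define VEC_MUL(a, b)  _mm256_mul_ps(a, b)
#define VEC_STORE(p,v) _mm256_storeu_ps(p, v)
#else
// Scalar fallback: one float at a time, same interface.
#define VEC_STEP       1
#define VEC_LOAD(p)    (*(p))
#define VEC_MUL(a, b)  ((a) * (b))
#define VEC_STORE(p,v) (*(p) = (v))
#endif

// Element-wise multiply, written once against the generic macros.
void vec_mul_f32(size_t n, float *dst, const float *a, const float *b) {
    size_t i = 0;
    for (; i + VEC_STEP <= n; i += VEC_STEP) {
        VEC_STORE(dst + i, VEC_MUL(VEC_LOAD(a + i), VEC_LOAD(b + i)));
    }
    for (; i < n; ++i) {  // leftover tail elements
        dst[i] = a[i] * b[i];
    }
}
```

Compiled with `-mavx2` the loop uses 256-bit registers; on any other target it degrades to plain scalar code with identical results.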
This enables `-Wdouble-promotion` and syncs the `Makefile` and `CMakeLists.txt` with regard to warnings. Reasoning: the llama.cpp codebase depends on the correct use of number types, whether those are `float`, `double`...
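As a minimal sketch of what `-Wdouble-promotion` catches (compile with `gcc -Wdouble-promotion`; the function names below are just for illustration): both flagged lines silently compute in double precision even though only `float` was intended, which costs real performance on hardware without fast double support.

```c
#include <stdio.h>

float scale(float x) {
    float y = x * 0.5;      // warning: 0.5 is a double literal, so x is
                            // promoted to double, then truncated back
    printf("%f\n", y);      // warning: float is promoted to double by the
                            // default argument promotions of variadic calls
    return y;
}

int main(void) {
    // Writing the literal as 0.5f keeps the arithmetic in float and
    // silences the first warning.
    (void)scale(3.0f);
    return 0;
}
```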
I can't run any model because my CPU is from before 2013, so it doesn't have AVX2 instructions. Can you please support AVX-only CPUs?
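For background on why this is feasible: AVX (2011) already provides the 256-bit float operations these kernels mostly need, while AVX2 (2013) mainly adds 256-bit integer instructions. Below is a hedged sketch of compile-time dispatch between an AVX path and a scalar fallback; it is illustrative only, not llama.cpp's actual code.

```c
#include <stddef.h>

#if defined(__AVX__)
#include <immintrin.h>
#endif

// Dot product with a compile-time ISA split: the compiler's target flags
// (-mavx vs. nothing) select which body gets built.
float dot_f32(size_t n, const float *a, const float *b) {
#if defined(__AVX__)
    // AVX-era CPUs (e.g. Sandy Bridge, 2011) can take this path even
    // though they lack AVX2.
    __m256 acc = _mm256_setzero_ps();
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        acc = _mm256_add_ps(acc, _mm256_mul_ps(_mm256_loadu_ps(a + i),
                                               _mm256_loadu_ps(b + i)));
    }
    float tmp[8];
    _mm256_storeu_ps(tmp, acc);
    float sum = tmp[0] + tmp[1] + tmp[2] + tmp[3]
              + tmp[4] + tmp[5] + tmp[6] + tmp[7];
    for (; i < n; ++i) sum += a[i] * b[i];  // tail elements
    return sum;
#else
    // Scalar fallback for CPUs without AVX at all.
    float sum = 0.0f;
    for (size_t i = 0; i < n; ++i) sum += a[i] * b[i];
    return sum;
#endif
}
```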
BLOOM models have a more permissive license than LLaMA models and are also multilingual in nature. While there is a project [based on llama.cpp](https://github.com/NouamaneTazi/bloomz.cpp) that can perform inference of BLOOM...
### Discussed in https://github.com/ggerganov/llama.cpp/discussions/446

Originally posted by **cmp-nct** on March 24, 2023: I've been testing Alpaca 30B (`-t 24 -n 2000 --temp 0.2 -b 32 --n_parts 1 --ignore-eos --instruct`) and I've consistently...
I'm not sure whether this is an enhancement request, since it may already be supported. Is it possible to run the full models? I know they take a ton of extra...
When trying to run `./bin/main -m ./models/7B/ggml-model-q4_0.bin -n 128`, Termux throws this output: `bash: ./bin/main: permission denied`
See explanation here: https://github.com/ggerganov/llama.cpp/pull/439
I already quantized my files with this command: `./quantize ./ggml-model-f16.bin.X E:\GPThome\LLaMA\llama.cpp-master-31572d9\models\65B\ggml-model-q4_0.bin.X 2`. The first time, it reduced my file size from 15.9 GB to 4.9 GB, and when I tried to...