
LLM inference in C/C++

Results: 1654 llama.cpp issues, sorted by recently updated

Noticeably slower with the Q_1 30B model, and memory usage becomes excessive... (Linux 5.19 x64, Ubuntu base)

bug
performance

Uses the GGML SIMD macros, so it should hopefully work on different architectures, but it has only been tested with AVX2. Don't expect any meaningful performance improvement; the function is not very...
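To illustrate why code written against the GGML SIMD macros can carry over to other architectures, here is a minimal sketch of the pattern: the hot loop is written once against generic macro names, and each architecture maps them to its own intrinsics. The macro names (`VEC_F32_*`) and `vec_add_f32` are invented for this example, not GGML's actual identifiers, and only the AVX2 and scalar mappings are shown.

```c
#include <stddef.h>

#if defined(__AVX2__)
#include <immintrin.h>
#define VEC_F32_WIDTH 8
typedef __m256 vec_f32;
#define VEC_F32_LOAD(p)     _mm256_loadu_ps(p)
#define VEC_F32_STORE(p, v) _mm256_storeu_ps(p, v)
#define VEC_F32_ADD(a, b)   _mm256_add_ps(a, b)
#else
/* Scalar fallback: same macro names with width 1, so the loop below
 * compiles unchanged on architectures without AVX2. */
#define VEC_F32_WIDTH 1
typedef float vec_f32;
#define VEC_F32_LOAD(p)     (*(p))
#define VEC_F32_STORE(p, v) (*(p) = (v))
#define VEC_F32_ADD(a, b)   ((a) + (b))
#endif

/* Element-wise y[i] += x[i], written once against the macros. */
static void vec_add_f32(size_t n, float *y, const float *x) {
    size_t i = 0;
    for (; i + VEC_F32_WIDTH <= n; i += VEC_F32_WIDTH) {
        vec_f32 vy = VEC_F32_LOAD(y + i);
        vec_f32 vx = VEC_F32_LOAD(x + i);
        VEC_F32_STORE(y + i, VEC_F32_ADD(vy, vx));
    }
    /* Handle leftover elements that don't fill a full vector. */
    for (; i < n; i++) y[i] += x[i];
}
```

The same source then compiles to vectorized code where AVX2 is available and to a plain scalar loop elsewhere, which is why "only tested with AVX2" is mostly a question of verifying each macro mapping, not the loop logic.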

This enables `-Wdouble-promotion` and syncs the `Makefile` and `CMakeLists.txt` with regard to warnings. Reasoning: the llama.cpp codebase depends on the correct use of number types, whether those are `float`, `double`...
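A short sketch of the kind of code `-Wdouble-promotion` is meant to catch; the function names are made up for this example:

```c
/* Promotes: the double literal 0.5 forces the float argument up to
 * double, the multiply happens in double precision, and the result is
 * then truncated back to float. -Wdouble-promotion warns about the
 * implicit widening. */
static float halve_promoting(float x) {
    return x * 0.5;   /* warning under -Wdouble-promotion */
}

/* Stays in float: the 0.5f literal keeps the whole expression in
 * single precision, so there is no warning and no hidden
 * double-precision round trip. */
static float halve_float(float x) {
    return x * 0.5f;  /* clean under -Wdouble-promotion */
}
```

On hardware where `double` arithmetic is slower than `float` (or emulated), accidental promotions like the first variant can quietly cost performance, which is the motivation for enabling the warning everywhere.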

I can't run any model because my CPU is from before 2013, so it doesn't have AVX2 instructions. Can you please support AVX-only CPUs?

enhancement
hardware
build
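One way to support older CPUs without shipping separate binaries is runtime feature detection. A minimal sketch using the GCC/Clang builtin `__builtin_cpu_supports` on x86 (the helper names `has_avx2`, `has_avx`, and `pick_kernel` are invented for this example; llama.cpp itself selects instruction sets at build time):

```c
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
/* GCC/Clang builtin that queries CPUID at runtime. */
static int has_avx2(void) { return __builtin_cpu_supports("avx2"); }
static int has_avx(void)  { return __builtin_cpu_supports("avx");  }
#else
/* Non-x86 or other compilers: report no AVX support. */
static int has_avx2(void) { return 0; }
static int has_avx(void)  { return 0; }
#endif

/* Choose the widest kernel the running CPU can execute, so a binary
 * built with all three code paths never hits an illegal instruction
 * on a pre-2013 (AVX-only or scalar) machine. */
static const char *pick_kernel(void) {
    if (has_avx2()) return "avx2";
    if (has_avx())  return "avx";
    return "scalar";
}
```

With build-time selection instead, the equivalent fix is compiling with `-mavx` (or no `-mavx2`) so the emitted code matches the oldest CPU you target.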

BLOOM models have a more permissive license than LLaMA models and are also multilingual. While there is a project [based on llama.cpp](https://github.com/NouamaneTazi/bloomz.cpp) that can perform inference of BLOOM...

enhancement
model

### Discussed in https://github.com/ggerganov/llama.cpp/discussions/446 Originally posted by **cmp-nct** on March 24, 2023: I've been testing alpaca 30B (`-t 24 -n 2000 --temp 0.2 -b 32 --n_parts 1 --ignore-eos --instruct`) and I've consistently...

documentation
enhancement

I'm not sure if this is an enhancement request, because maybe it's already supported. Is it possible to run the full models? I know they take a ton of extra...

When trying to run `./bin/main/ -m ./models/7B/ggml-model-q4_0.bin -n 128`, termux throws this output: `bash: ./bin/main: permission denied`
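A "permission denied" from bash on a file that exists usually means the execute bit is missing (also note the stray trailing slash in the command: it should be `./bin/main`, not `./bin/main/`). A hedged fix, assuming the binary was built successfully:

```shell
# Restore the execute bit on the binary, then run it without the
# trailing slash.
chmod +x ./bin/main
./bin/main -m ./models/7B/ggml-model-q4_0.bin -n 128
```

On Termux specifically, binaries stored on shared storage (`/sdcard`) can never be executable; moving the build into Termux's home directory is the usual workaround.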

See explanation here: https://github.com/ggerganov/llama.cpp/pull/439

enhancement

I already quantized my files with this command: `./quantize ./ggml-model-f16.bin.X E:\GPThome\LLaMA\llama.cpp-master-31572d9\models\65B\ggml-model-q4_0.bin.X 2`. The first time, it reduced my file size from 15.9 GB to 4.9 GB, and when I tried to...