
LLM inference in C/C++

Results 1637 llama.cpp issues

I'm not sure if this has a place in the repository. I did a bit of prompt engineering to get a conversation going with LLaMA; this is the script I...

enhancement

Hey! Is it possible to add a way of dumping the current state into a file, so it can then be reloaded later? This would avoid the time needed to...

enhancement

What's the supported context window length for each model?

model
generation quality

I'm running on bare metal, nothing emulated:

```
littlemac@littlemac:~$ git clone https://github.com/ggerganov/llama.cpp
Cloning into 'llama.cpp'...
remote: Enumerating objects: 283, done.
remote: Counting objects: 100% (283/283), done.
remote: Compressing objects: 100%...
```

duplicate
hardware
build

When I compile with make, the following error occurs:

```
inlining failed in call to ‘always_inline’ ‘_mm256_cvtph_ps’: target specific option mismatch
   52 | _mm256_cvtph_ps (__m128i __A)
```

Error will be...

bug
performance
hardware
build
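For context, `_mm256_cvtph_ps` is an F16C intrinsic, so a "target specific option mismatch" on it typically means the compiler was not told the target CPU supports F16C. A hedged sketch of the usual workaround (the exact variable names in the project's Makefile may differ, and the right flags depend on what the build machine actually supports):

```shell
# Check whether the CPU reports the F16C extension at all (Linux):
grep -o f16c /proc/cpuinfo | head -n 1

# Enable F16C explicitly (it depends on AVX), then rebuild:
make clean
make CFLAGS="-O3 -mavx -mf16c"

# Or let the compiler enable everything the build machine supports:
make clean
make CFLAGS="-O3 -march=native"
```

If the CPU genuinely lacks F16C, forcing the flag would only trade the compile error for an illegal-instruction crash at runtime, so checking `/proc/cpuinfo` first is worthwhile.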

Includes vectorised inference code, quantisation and a counterpart to the Q4_0 multipart fix we introduced a while ago. Tested working up to 13B, though I can't confidently say anything about...

The original paper and the reference implementation [1] use RMS norm. However, llama.cpp uses ggml_norm(), which looks like Layer norm. The differences between these may not be too obvious, because...

bug
help wanted
good first issue
high priority
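For reference, the two normalizations differ only in whether the mean is subtracted before scaling; a minimal NumPy sketch of the distinction (illustrative only, not the actual ggml code, and without the learned gain/bias terms):

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # LayerNorm: subtract the mean, then divide by the standard deviation.
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def rms_norm(x, eps=1e-5):
    # RMSNorm: divide by the root mean square only; no mean subtraction.
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return x / rms

x = np.array([1.0, 2.0, 3.0, 4.0])
# The two agree only when the input already has zero mean;
# for this x they produce visibly different activations.
print(layer_norm(x))
print(rms_norm(x))
```

Because a LayerNorm-style `ggml_norm()` re-centers activations that RMSNorm-trained weights never saw re-centered, the divergence from the reference implementation can be subtle but real.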

Add disk space requirements from https://cocktailpeanut.github.io/dalai/#/?id=_7b, as suggested in #195.

I was attempting to merge alpaca-lora from https://huggingface.co/tloen/alpaca-lora-7b with the original llama-7B from https://huggingface.co/decapoda-research/llama-7b-hf; I also tried to quantize the model and run the main file in llama.cpp. The merge code is...

enhancement
help wanted
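For context, merging a LoRA adapter into base weights is, at its core, a low-rank matrix update. A hypothetical NumPy sketch of that step (the function name, shapes, and hyperparameters here are illustrative assumptions, not the actual alpaca-lora merge script):

```python
import numpy as np

def merge_lora(W, A, B, alpha, r):
    # LoRA parameterizes a weight update as (alpha / r) * B @ A,
    # where A is (r, in_features) and B is (out_features, r).
    # Merging folds that update into the base weight matrix W,
    # so inference afterwards needs no extra adapter matmuls.
    return W + (alpha / r) * (B @ A)

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 8)).astype(np.float32)   # base weight
A = rng.standard_normal((2, 8)).astype(np.float32)   # LoRA down-projection
B = rng.standard_normal((8, 2)).astype(np.float32)   # LoRA up-projection
W_merged = merge_lora(W, A, B, alpha=16, r=2)
```

Quantization then operates on `W_merged` like any other dense weight, which is why merge-then-quantize is the usual order for running an adapter through llama.cpp's pipeline.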