llama.cpp
LLM inference in C/C++
I'm not sure if this has a place in the repository. I did a bit of prompt engineering to get a conversation going with LLaMA; this is the script I...
Hey! Is it possible to add a way of dumping the current state into a file, so it can then be reloaded later? This would avoid the time needed to...
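A minimal sketch of the idea, assuming the mutable inference state can be treated as a single contiguous buffer (illustrative only; the actual llama.cpp state layout, e.g. the KV cache, is more involved and the buffer here is hypothetical):

```cpp
#include <cstdint>
#include <cstdio>
#include <vector>

// Hypothetical state buffer; in practice this would have to cover the
// KV cache and any other state needed to resume generation.
static std::vector<uint8_t> g_state;

bool dump_state(const char * path) {
    FILE * f = fopen(path, "wb");
    if (!f) return false;
    const size_t n = fwrite(g_state.data(), 1, g_state.size(), f);
    fclose(f);
    return n == g_state.size();
}

bool load_state(const char * path) {
    FILE * f = fopen(path, "rb");
    if (!f) return false;
    fseek(f, 0, SEEK_END);
    const long size = ftell(f);
    fseek(f, 0, SEEK_SET);
    g_state.resize(size);
    const size_t n = fread(g_state.data(), 1, g_state.size(), f);
    fclose(f);
    return n == g_state.size();
}
```

Reloading such a dump would skip re-evaluating the prompt, which is the time saving the request is after.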
What's the supported context window length for each model?
Precompiled files for Windows x64 via CMake.
I'm running on bare metal, nothing emulated. ``` littlemac@littlemac:~$ git clone https://github.com/ggerganov/llama.cpp Cloning into 'llama.cpp'... remote: Enumerating objects: 283, done. remote: Counting objects: 100% (283/283), done. remote: Compressing objects: 100%...
When I compile with make, the following error occurs: ``` inlining failed in call to ‘always_inline’ ‘_mm256_cvtph_ps’: target specific option mismatch 52 | _mm256_cvtph_ps (__m128i __A) ``` Error will be...
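This error means the file is being compiled without F16C support: `_mm256_cvtph_ps` is an F16C intrinsic, so GCC refuses to inline it unless that extension is enabled for the target (e.g. via `-mf16c` or `-march=native`, and only on a CPU that actually has F16C). A minimal reproduction of the requirement:

```cpp
// f16c_test.cpp -- builds with: g++ -mavx -mf16c f16c_test.cpp
// Without -mf16c, GCC fails with the same "target specific option
// mismatch" error, because the always_inline intrinsic cannot be
// emitted for a target that lacks F16C.
#include <immintrin.h>
#include <cstdio>

int main() {
    __m128i half = _mm_set1_epi16(0x3C00);  // 1.0 in IEEE half precision
    __m256  full = _mm256_cvtph_ps(half);   // requires F16C
    float out[8];
    _mm256_storeu_ps(out, full);
    printf("%f\n", out[0]);
    return 0;
}
```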
Includes vectorised inference code, quantisation and a counterpart to the Q4_0 multipart fix we introduced a while ago. Tested working up to 13B, though I can't confidently say anything about...
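For reference, a simplified sketch of Q4_0-style block quantisation as I understand it: weights are grouped into blocks of 32, and each block stores one fp32 scale plus 32 offset-binary 4-bit values packed two per byte. Exact rounding and edge-case handling in ggml may differ; this only illustrates the scheme.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

constexpr int QK = 32; // block size

struct block_q4_0 {
    float   d;          // per-block scale
    uint8_t qs[QK / 2]; // 32 quantized values, 4 bits each
};

// Quantize QK floats into one block: find the max magnitude, derive a
// scale so values map into [-7, 7], then store each value shifted by 8
// so that 0.0f quantizes to 8.
void quantize_block_q4_0(const float * x, block_q4_0 * y) {
    float amax = 0.0f;
    for (int i = 0; i < QK; ++i) {
        amax = std::max(amax, std::fabs(x[i]));
    }
    const float d  = amax / 7.0f;
    const float id = d ? 1.0f / d : 0.0f;
    y->d = d;
    for (int i = 0; i < QK; i += 2) {
        const uint8_t v0 = (uint8_t)(std::roundf(x[i + 0] * id) + 8);
        const uint8_t v1 = (uint8_t)(std::roundf(x[i + 1] * id) + 8);
        y->qs[i / 2] = v0 | (v1 << 4);
    }
}
```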
Use RMSNorm
The original paper and the reference implementation [1] use RMSNorm. However, llama.cpp uses ggml_norm(), which looks like LayerNorm? The differences between these may not be too obvious, because...
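For comparison, a minimal sketch of the two normalisations with the affine/bias terms omitted: LayerNorm subtracts the mean and divides by the standard deviation, while RMSNorm only divides by the root mean square and never centers the input. On activations whose mean is close to zero the two behave similarly, which is why the difference is easy to miss.

```cpp
#include <cmath>
#include <cstddef>

// LayerNorm-style: y_i = (x_i - mean) / sqrt(var + eps)
void layer_norm(const float * x, float * y, size_t n, float eps = 1e-5f) {
    float mean = 0.0f;
    for (size_t i = 0; i < n; ++i) mean += x[i];
    mean /= n;
    float var = 0.0f;
    for (size_t i = 0; i < n; ++i) var += (x[i] - mean) * (x[i] - mean);
    var /= n;
    const float scale = 1.0f / std::sqrt(var + eps);
    for (size_t i = 0; i < n; ++i) y[i] = (x[i] - mean) * scale;
}

// RMSNorm-style (what the LLaMA reference uses): no mean subtraction,
// y_i = x_i / sqrt(mean(x^2) + eps)
void rms_norm(const float * x, float * y, size_t n, float eps = 1e-5f) {
    float ss = 0.0f;
    for (size_t i = 0; i < n; ++i) ss += x[i] * x[i];
    const float scale = 1.0f / std::sqrt(ss / n + eps);
    for (size_t i = 0; i < n; ++i) y[i] = x[i] * scale;
}
```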
Add disk space requirements from https://cocktailpeanut.github.io/dalai/#/?id=_7b, as suggested in #195.
I was attempting to merge alpaca-lora from https://huggingface.co/tloen/alpaca-lora-7b with the original LLaMA-7B from https://huggingface.co/decapoda-research/llama-7b-hf; I also tried to quantize the model and run the main binary in llama.cpp. The merge code is...
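Arithmetically, merging a LoRA adapter into a base checkpoint amounts to W' = W + (alpha/r) * B A per adapted weight matrix. A hedged sketch with hypothetical plain-array matrices (the real merge scripts operate on PyTorch tensors per attention projection; this only illustrates the update):

```cpp
#include <vector>

// Merge a LoRA adapter into a base weight matrix:
//   W' = W + (alpha / r) * B * A
// W is (rows x cols), B is (rows x r), A is (r x cols), all row-major.
// All names here are illustrative, not llama.cpp or PEFT APIs.
void merge_lora(std::vector<float> & W,
                const std::vector<float> & B,
                const std::vector<float> & A,
                int rows, int cols, int r, float alpha) {
    const float scale = alpha / r;
    for (int i = 0; i < rows; ++i) {
        for (int j = 0; j < cols; ++j) {
            float delta = 0.0f;
            for (int k = 0; k < r; ++k) {
                delta += B[i * r + k] * A[k * cols + j];
            }
            W[i * cols + j] += scale * delta;
        }
    }
}
```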