llama.cpp
LLM inference in C/C++
# Environment and Context Hello, before jumping to the subject, here's the environment I'm working with: - Windows 10 - Llama-13b-4bit (GPTQ-quantized) model - Intel® Core™ i7-10700K [AVX | AVX2...
Hello, your [Windows binary releases](https://github.com/ggerganov/llama.cpp/releases) have probably been built with MSVC, and I think there's a better way to do it. # Expected Behavior I have an Intel® Core™ i7-10700K...
Now that we have infinite transcription mode, would it be possible to dump tokens into a file and load them back the next time you run llama.cpp to resume the conversation? Although it...
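A minimal sketch of what dumping and reloading the token history could look like, assuming the conversation is available as a vector of integer token ids (the file layout and function names here are made up for illustration, not an existing llama.cpp API):

```cpp
#include <cstdint>
#include <cstdio>
#include <vector>

// Write the token ids of the current conversation to a file so a later
// run can reload them instead of retyping the whole prompt.
static bool dump_tokens(const char * path, const std::vector<int32_t> & tokens) {
    FILE * f = std::fopen(path, "wb");
    if (!f) return false;
    const uint32_t n = (uint32_t) tokens.size();
    std::fwrite(&n, sizeof(n), 1, f);
    std::fwrite(tokens.data(), sizeof(int32_t), n, f);
    std::fclose(f);
    return true;
}

// Read the token ids back; returns an empty vector if the file is missing
// or truncated.
static std::vector<int32_t> load_tokens(const char * path) {
    std::vector<int32_t> tokens;
    FILE * f = std::fopen(path, "rb");
    if (!f) return tokens;
    uint32_t n = 0;
    if (std::fread(&n, sizeof(n), 1, f) == 1) {
        tokens.resize(n);
        if (std::fread(tokens.data(), sizeof(int32_t), n, f) != n) {
            tokens.clear();
        }
    }
    std::fclose(f);
    return tokens;
}
```

Note that reloading the ids alone would still require re-evaluating them to rebuild the KV cache, so a full resume would also need to persist the model's evaluation state.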
Add support for reading older model files so that people do not have to throw out ggml alpaca models.
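One common way to handle such backward compatibility is to branch on the file header when loading. A rough sketch; the magic values and header layout below are illustrative assumptions, not the actual ggml format definitions:

```cpp
#include <cstdint>
#include <cstdio>

// Hypothetical magic numbers: an old unversioned file format and a newer
// versioned one. The real values would come from the ggml loaders.
constexpr uint32_t MAGIC_LEGACY    = 0x67676d6c; // assumed old magic
constexpr uint32_t MAGIC_VERSIONED = 0x67676d66; // assumed new magic

// Decide how to parse the rest of the file based on its header.
// Returns the format version, 0 for legacy files, -1 on error.
static int detect_format(FILE * f) {
    uint32_t magic = 0;
    if (std::fread(&magic, sizeof(magic), 1, f) != 1) return -1;
    if (magic == MAGIC_LEGACY) {
        return 0;               // old files: no version field
    }
    if (magic == MAGIC_VERSIONED) {
        uint32_t version = 0;
        if (std::fread(&version, sizeof(version), 1, f) != 1) return -1;
        return (int) version;   // new files carry an explicit version
    }
    return -1;                  // unknown file
}
```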
I want to use a prompt from a file via the `-f` option together with Alpaca models. However, when I do so, llama.cpp first prints out the whole input. How to avoid...
Do not insert a "newline" token if the user inputs an empty line. This lets the user continue the output after being asked by the reverse prompt for more data. Otherwise...
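Roughly, the change amounts to skipping tokenization when the input line is empty, something like this sketch (the helper names and types are placeholders, not the actual variables in main.cpp):

```cpp
#include <functional>
#include <sstream>
#include <string>
#include <vector>

// Stand-in tokenizer: splits on whitespace and hashes each word to an id.
// The real code would call the model's tokenizer instead.
static std::vector<int> tokenize(const std::string & text) {
    std::vector<int> ids;
    std::istringstream ss(text);
    std::string word;
    while (ss >> word) {
        ids.push_back((int) (std::hash<std::string>{}(word) % 32000));
    }
    return ids;
}

// Only append tokens when the user actually typed something; an empty line
// then means "keep generating" rather than injecting a newline token.
static void on_user_input(const std::string & buffer, std::vector<int> & embd_inp) {
    if (!buffer.empty()) {
        const std::vector<int> line_inp = tokenize(buffer);
        embd_inp.insert(embd_inp.end(), line_inp.begin(), line_inp.end());
    }
}
```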
To avoid code duplication when implementing additional quantization formats (#456), refactor the `forward_mul_mat` and `forward_get_rows` functions to use a table of function pointers, indexed by `ggml_type`. This makes some functions...
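Sketched with made-up type and kernel names (the real table would live in ggml and cover its actual quantization formats), the function-pointer dispatch could look like:

```cpp
#include <array>

// Hypothetical per-type kernel signature; the real code would plug in its
// actual dequantization / dot-product routines.
using dequantize_row_fn = void (*)(const void * src, float * dst, int n);

enum example_type { TYPE_F32, TYPE_Q4_0, TYPE_Q4_1, TYPE_COUNT };

static void dequantize_row_f32 (const void *, float *, int) { /* ... */ }
static void dequantize_row_q4_0(const void *, float *, int) { /* ... */ }
static void dequantize_row_q4_1(const void *, float *, int) { /* ... */ }

// One table indexed by type: functions like forward_get_rows / forward_mul_mat
// look the kernel up here instead of switching on the type in every caller,
// so a new quantization format only needs a new table entry.
static const std::array<dequantize_row_fn, TYPE_COUNT> dequantize_row = {
    dequantize_row_f32,   // TYPE_F32
    dequantize_row_q4_0,  // TYPE_Q4_0
    dequantize_row_q4_1,  // TYPE_Q4_1
};

static void get_rows(example_type type, const void * src, float * dst, int n) {
    dequantize_row[type](src, dst, n);
}
```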
# Prerequisites Please answer the following questions for yourself before submitting an issue. - [ :white_check_mark: ] I am running the latest code. Development is very rapid so there are...
After the context grows beyond 2048 (or whatever is set with -c), if you close the terminal, the process may keep running in the background. Linux amd64 5.19, Ubuntu base.
Implement support for running models that use LLaMA-Adapter (https://github.com/ZrrSkywalker/LLaMA-Adapter). How to obtain the model is described here: https://github.com/ZrrSkywalker/LLaMA-Adapter#inference