llama.cpp
LLM inference in C/C++
When running various 7B models (Win10, Core i5, GCC 64-bit, 8 GB, 4 threads) with the same program (results are roughly the same across recent revisions), I found the ggml-vicuna-7b-4bit-rev1.bin and ggml-vicuna-7b-4bit.bin much...
fixes https://github.com/ggerganov/llama.cpp/issues/975
This change allows applying LoRA adapters on the fly without having to duplicate the model files. Instructions:
- Obtain the HF PEFT LoRA files `adapter_config.json` and `adapter_model.bin` of a LoRA...
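For context, here is a hedged C++ sketch of what using this from the C API might look like, assuming the `llama_apply_lora_from_file` entry point this change introduces; the exact signature may differ between revisions, and the file paths are illustrative, not canonical:

```cpp
// Hedged sketch: load an unmodified base model, then apply a ggml-converted
// LoRA adapter on the fly (no pre-merged, duplicated model file needed).
#include "llama.h"
#include <cstdio>

int main() {
    llama_context_params params = llama_context_default_params();

    // Load the base model as usual (path is illustrative).
    llama_context * ctx = llama_init_from_file("models/7B/ggml-model-q4_0.bin", params);
    if (ctx == nullptr) {
        std::fprintf(stderr, "failed to load base model\n");
        return 1;
    }

    // Apply the converted LoRA adapter; the third argument is an optional
    // higher-precision base model to read original tensors from, and the
    // last one is the thread count.
    if (llama_apply_lora_from_file(ctx, "lora/ggml-adapter-model.bin", nullptr, 4) != 0) {
        std::fprintf(stderr, "failed to apply LoRA adapter\n");
        return 1;
    }

    // ... run inference as usual ...
    llama_free(ctx);
    return 0;
}
```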
I found this model: [ggml-vicuna-13b-4bit](https://huggingface.co/eachadea/ggml-vicuna-13b-4bit) ([files](https://huggingface.co/eachadea/ggml-vicuna-13b-4bit/tree/main)), and judging by their online demo it's very impressive. I tried to run it with the latest llama.cpp version - the model loads fine, but...
Add code that checks the build host to determine the right CPU features. This is convenient when building the Windows version on a machine without AVX2.
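As a rough illustration of the idea (this is runtime detection with GCC/Clang builtins, not the PR's actual build-script change), a host can be probed for AVX2 support like this:

```cpp
// Minimal sketch: ask the host CPU whether it supports AVX2 before deciding
// which instruction-set flags a build should enable.
#include <cstdio>

int main() {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();  // populate the CPU feature cache used by the builtin below
    if (__builtin_cpu_supports("avx2")) {
        std::printf("host supports AVX2\n");
    } else {
        std::printf("host does not support AVX2 - build without AVX2 flags\n");
    }
#else
    std::printf("this sketch only covers GCC/Clang on x86\n");
#endif
    return 0;
}
```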
As rightly pointed out by @jxy [here](https://github.com/ggerganov/llama.cpp/commit/6232f2d7fd7a22d5eeb62182b2f21fcf01359754#commitcomment-108812025), my changes in #703 limiting the calculation to `int8_t` might overflow. -> Change the types to `int` instead.
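To illustrate the concern, here is a minimal C++ sketch (not the actual ggml code): even a single product of two `int8_t` values can exceed the type's range, so intermediate sums need to be widened to `int`:

```cpp
// Sketch of the overflow issue: accumulating int8_t products in an int8_t
// wraps around, while an int accumulator holds the exact result.
#include <cstdint>
#include <cstdio>

int main() {
    const int8_t a[4] = { 100, 100, 100, 100 };
    const int8_t b[4] = {  50,  50,  50,  50 };

    int8_t narrow = 0;  // too small: 100 * 50 = 5000 already exceeds [-128, 127]
    int    wide   = 0;  // wide enough for the products and their sum

    for (int i = 0; i < 4; ++i) {
        narrow = static_cast<int8_t>(narrow + a[i] * b[i]);  // truncated at every step
        wide  += a[i] * b[i];                                 // exact: 4 * 5000 = 20000
    }

    std::printf("int8_t accumulator: %d, int accumulator: %d\n", narrow, wide);
    return 0;
}
```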
Is it possible to add a param that forces the [end of text] token to be shown? Something like this (I think; I don't understand C/C++): ```cpp if (!embd.empty() && embd.back() == llama_token_eos()) { if...
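For what it's worth, here is a self-contained, purely hypothetical C++ sketch of the requested behaviour; `show_eos`, `gpt_params_sketch`, and the stand-in token value are made up for illustration and are not existing llama.cpp identifiers:

```cpp
// Hypothetical sketch: gate the "[end of text]" marker behind a made-up
// show_eos parameter, roughly what the question is asking for.
#include <cstdio>
#include <vector>

struct gpt_params_sketch {
    bool show_eos = false;  // imagined flag, e.g. set by a command-line option
};

constexpr int kTokenEos = 2;  // stand-in for the real EOS token id

int main() {
    gpt_params_sketch params;
    params.show_eos = true;

    std::vector<int> embd = {15043, 3186, kTokenEos};  // pretend generated tokens

    if (!embd.empty() && embd.back() == kTokenEos) {
        if (params.show_eos) {
            std::fprintf(stderr, " [end of text]\n");  // print the marker only when asked
        }
        // a real generation loop would stop here either way
    }
    return 0;
}
```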
It sometimes just talks to itself, for example: ###Human: Hi ###Assistant: Hello, how can I assist you? (I am running the latest release with Vicuna mode.)
Neither of these links works and the files aren't present anymore. > You have to convert it to the new format using [./convert-gpt4all-to-ggml.py](https://github.com/ggerganov/llama.cpp/blob/master/convert-gpt4all-to-ggml.py). You may also need to convert the...