llama.cpp

LLM inference in C/C++

Results: 1659 llama.cpp issues

When running various 7B models (Win10, Core i5, GCC 64-bit, 8 GB RAM, 4 threads) with the same program (behaviour is roughly the same across recent revisions), I found ggml-vicuna-7b-4bit-rev1.bin and ggml-vicuna-7b-4bit.bin much...

fixes https://github.com/ggerganov/llama.cpp/issues/975

This change allows applying LoRA adapters on the fly without having to duplicate the model files. Instructions: - Obtain the HF PEFT LoRA files `adapter_config.json` and `adapter_model.bin` of a LoRA...
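For context, a minimal sketch of the math this kind of on-the-fly adapter relies on: the base weight matrix is patched with scale * B·A at load time instead of shipping a merged copy of the model. Function and variable names below are illustrative only, not llama.cpp's actual code.

```cpp
#include <cstddef>
#include <vector>

// Illustrative only: patch a base weight matrix W (n_out x n_in, row-major)
// with the LoRA delta scale * B.A, where A is r x n_in and B is n_out x r.
// This is the idea behind applying an adapter at load time instead of
// duplicating the merged model file; it is not llama.cpp's actual code.
void apply_lora_delta(std::vector<float> &W,
                      const std::vector<float> &A,
                      const std::vector<float> &B,
                      std::size_t n_out, std::size_t n_in, std::size_t r,
                      float scale /* typically lora_alpha / r */) {
    for (std::size_t i = 0; i < n_out; ++i) {
        for (std::size_t j = 0; j < n_in; ++j) {
            float delta = 0.0f;
            for (std::size_t k = 0; k < r; ++k) {
                delta += B[i * r + k] * A[k * n_in + j];
            }
            W[i * n_in + j] += scale * delta;  // in-place patch of the base weights
        }
    }
}
```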

research 🔬

I found this model: [ggml-vicuna-13b-4bit](https://huggingface.co/eachadea/ggml-vicuna-13b-4bit), and judging by their online demo it's very impressive. I tried to run it with the latest llama.cpp version - the model loads fine, but...

model
generation quality

Add code to check the build host and determine the right CPU features. This is convenient when building the Windows version on a machine without AVX2.
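A rough sketch of what such a build-host probe could look like, assuming a GCC/Clang toolchain; the actual change may well live in the Makefile/CMake instead.

```cpp
#include <cstdio>

// Hypothetical build-host probe: compile and run this during the build and let
// the exit status decide whether to pass -mavx2. Relies on the GCC/Clang
// builtin __builtin_cpu_supports.
int main() {
    const int has_avx  = __builtin_cpu_supports("avx");
    const int has_avx2 = __builtin_cpu_supports("avx2");
    std::printf("AVX: %d, AVX2: %d\n", has_avx, has_avx2);
    return has_avx2 ? 0 : 1;  // 0 = safe to build with -mavx2
}
```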

As rightly pointed out by @jxy [here](https://github.com/ggerganov/llama.cpp/commit/6232f2d7fd7a22d5eeb62182b2f21fcf01359754#commitcomment-108812025), my changes in #703 limiting the calculation to `int8_t` might overflow, so change the types to `int` instead.
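A tiny illustration of the overflow concern (not the actual ggml kernel): two values that fit comfortably in `int8_t` produce a product that does not, so intermediate results have to be held in `int`.

```cpp
#include <cstdint>
#include <cstdio>

int main() {
    // Two values that fit comfortably in int8_t...
    int8_t a = 100, b = 100;

    // ...whose product (10000) does not. The multiply itself happens in int,
    // but storing it back into int8_t truncates (wraps to 16 on typical targets).
    int8_t narrow = static_cast<int8_t>(a * b);
    int    wide   = int(a) * int(b);  // keep intermediates in int, as in the fix

    std::printf("int8_t: %d, int: %d\n", int(narrow), wide);
    return 0;
}
```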

Is it possible to add a parameter to force showing the [end of text] token? Like this (I think; I don't understand C/C++): ```js if (!embd.empty() && embd.back() == llama_token_eos()) { if...
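A self-contained toy of the behaviour being asked for; `llama_token_eos`, the token values, and `show_eos` below are stand-ins rather than the real llama.cpp API that the quoted pseudocode is gesturing at.

```cpp
#include <cstdio>
#include <vector>

// Stand-ins for the real pieces named in the snippet above; only the control
// flow around [end of text] matters here.
using llama_token = int;
static llama_token llama_token_eos() { return 2; }  // hypothetical EOS id

int main() {
    const bool show_eos = true;  // the parameter being requested
    std::vector<llama_token> embd = {10, 42, llama_token_eos()};

    if (!embd.empty() && embd.back() == llama_token_eos()) {
        if (show_eos) {
            std::printf(" [end of text]\n");  // surface the token instead of hiding it
        }
        // generation would still stop (or drop back to interactive mode) here
    }
    return 0;
}
```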

It sometimes just talks to itself, for example: ###Human: Hi ###Assistant: Hello, how can I assist you? (I am running the latest release with vicuna mode)

Neither of these links works and the files aren't present anymore. > You have to convert it to the new format using [./convert-gpt4all-to-ggml.py](https://github.com/ggerganov/llama.cpp/blob/master/convert-gpt4all-to-ggml.py). You may also need to convert the...