llama.cpp
LLM inference in C/C++
This is the output with `-fsanitize=address`:

```
AddressSanitizer:DEADLYSIGNAL
=================================================================
==167666==ERROR: AddressSanitizer: SEGV on unknown address 0x558c0562c438 (pc 0x558a27cc9807 bp 0x000000000000 sp 0x7ffeb2f57310 T0)
==167666==The signal is caused by a READ...
```
It'd be useful if there were a way to define tokens that would cause the output to stop prematurely (e.g. for an assistant-style interaction where messages are prefixed with "Assistant:...
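A minimal sketch of one way such a stop condition could be checked, assuming the decoded output is accumulated into a `std::string` and the stop sequences (here called `antiprompts`, a hypothetical name) are supplied by the user:

```cpp
#include <string>
#include <vector>

// Returns true once the accumulated output ends with any of the
// user-supplied stop sequences (e.g. "User:"), so the generation loop
// can break before the model continues past the assistant's turn.
static bool should_stop(const std::string & output,
                        const std::vector<std::string> & antiprompts) {
    for (const std::string & stop : antiprompts) {
        if (output.size() >= stop.size() &&
            output.compare(output.size() - stop.size(), stop.size(), stop) == 0) {
            return true;
        }
    }
    return false;
}
```

The generation loop would append each newly decoded token's text to `output` and stop sampling as soon as `should_stop` returns true.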
Hi, I see that interactive mode has been merged in. I was trying to test the repository on a larger set of weights, and found that there is no output...
I was playing with the 65B model, and it took a minute to read the files. If you wrap the model loader loop with a `#pragma omp parallel for` and...
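The idea, roughly, is that if each part of the model can be read and converted independently, the loop over parts can be parallelized with OpenMP. A sketch under that assumption (the loop body and names here are illustrative, not the actual loader code):

```cpp
#include <vector>

// Illustrative only: parallelize independent per-part loading work with OpenMP.
// Build with -fopenmp; without it the pragma is ignored and the loop runs serially.
void load_parts(std::vector<int> & parts) {
    #pragma omp parallel for
    for (int i = 0; i < (int) parts.size(); ++i) {
        // Each iteration would read and convert one tensor/file part;
        // iterations must not write shared state without synchronization.
        parts[i] = i; // placeholder work
    }
}
```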
I think this is an improvement over the current behavior of outputting nothing at all when the prompt is too long. It's slightly ugly to see the truncated prompt in...
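For illustration, truncating an over-long prompt instead of bailing out could look something like this (a sketch only, assuming the prompt has already been tokenized and `n_ctx` is the context length):

```cpp
#include <cstdio>
#include <vector>

// Keep as many leading prompt tokens as fit in the context window
// and warn about the rest, rather than producing no output at all.
std::vector<int> truncate_prompt(std::vector<int> tokens, int n_ctx) {
    if ((int) tokens.size() > n_ctx) {
        fprintf(stderr, "warning: prompt truncated from %zu to %d tokens\n",
                tokens.size(), n_ctx);
        tokens.resize(n_ctx);
    }
    return tokens;
}
```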
I have found that when the prompt contains a Unicode UTF-8 emoji character like "👍" (U+1F44D), the prompt breaks up. I'm reading a sample prompt from a text file: ```bash...
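One likely cause (an assumption here, not confirmed by the truncated report) is that an emoji is a multi-byte UTF-8 sequence, so any code that treats the prompt byte-by-byte can split a character in the middle. A small check of the byte length:

```cpp
#include <cstdio>
#include <cstring>

int main() {
    // U+1F44D ("thumbs up") encodes as 4 bytes in UTF-8: F0 9F 91 8D.
    const char * emoji = "\xF0\x9F\x91\x8D";
    printf("bytes: %zu\n", strlen(emoji)); // prints 4, not 1
    return 0;
}
```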
The argument parsing for `convert-ckpt-to-ggml.py` is quite ad hoc and hard to follow. I'm thinking that something along these lines would go a long way toward making the arguments easier to use...
Not much, but it has some benefits:
- Shorter commands.
- Helps actual executable files stand out.