llama.cpp
LLM inference in C/C++
I have been experimenting with q4_1 quantisation (since [some preliminary results](https://nolanoorg.substack.com/p/int-4-llama-is-not-enough-int-3-and) suggest it should perform better), and noticed that something about the pipeline for the 13B parameter model is broken...
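For context, here is a minimal sketch of the q4_1 idea: each block stores a scale and a minimum, and the weights are packed as 4-bit indices. The block size of 32 and the struct layout are illustrative only, not ggml's exact on-disk format.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Illustrative q4_1-style block: a per-block scale d, a minimum m, and 32
// weights packed as 4-bit indices (two per byte).
struct block_q4_1 {
    float   d;       // scale: (max - min) / 15
    float   m;       // block minimum
    uint8_t qs[16];  // 32 weights, two 4-bit indices per byte
};

static block_q4_1 quantize_block_q4_1(const float *x) {
    float vmin = x[0], vmax = x[0];
    for (int i = 1; i < 32; ++i) {
        vmin = std::min(vmin, x[i]);
        vmax = std::max(vmax, x[i]);
    }
    block_q4_1 b{};
    b.m = vmin;
    b.d = (vmax - vmin) / 15.0f;
    const float id = b.d != 0.0f ? 1.0f / b.d : 0.0f;
    for (int i = 0; i < 16; ++i) {
        long q0 = std::lround((x[2*i + 0] - vmin) * id);
        long q1 = std::lround((x[2*i + 1] - vmin) * id);
        q0 = std::min(q0, 15L);
        q1 = std::min(q1, 15L);
        b.qs[i] = (uint8_t)((q0 & 0x0F) | ((q1 & 0x0F) << 4));
    }
    return b;
}

// Dequantization recovers x ≈ q * d + m for each 4-bit index q.
static void dequantize_block_q4_1(const block_q4_1 &b, float *out) {
    for (int i = 0; i < 16; ++i) {
        out[2*i + 0] = (b.qs[i] & 0x0F) * b.d + b.m;
        out[2*i + 1] = (b.qs[i] >> 4)   * b.d + b.m;
    }
}
```

Unlike q4_0, which keeps only a scale, q4_1 also stores the block minimum, which is why it is expected to lose less accuracy per block.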
See this issue: https://github.com/facebookresearch/llama/pull/73
Following up on the "Store preprocessed prompts" idea, it would be good to be able to take in a text file with a generic prompt & flags to start a chatbot...
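As a rough illustration of the idea, assuming a hypothetical flag (say, -f) pointing at a prompt file; the helper name below is made up for this sketch:

```cpp
#include <fstream>
#include <sstream>
#include <string>

// Hypothetical helper: read an entire prompt template from a text file so a
// chatbot-style session could be started from, e.g., "-f prompt.txt".
static std::string load_prompt_file(const std::string &path) {
    std::ifstream in(path);
    if (!in) {
        return "";  // caller decides how to handle a missing file
    }
    std::ostringstream ss;
    ss << in.rdbuf();
    return ss.str();
}
```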
Hey, I know someone already posted a similar issue that has since been closed, but I ran into the same thing on Windows 10, with a clone from just yesterday.
This is for issue #91. Treat this as a first draft. There are definitely some things that need to be changed, and they will be changed shortly. I have not benchmarked....
Fixes scanf unused result compile warning.
Adds a context size parameter (-c for short) that allows taking the context size from the user's input. It defaults to the same hardcoded 512.
Fixes the color codes messing up the terminal when the program exits, by printing an ANSI_COLOR_RESET. The reset is included in the SIGINT handler too. A rough sketch of these changes follows.
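A minimal sketch of the above, assuming an illustrative Params struct and flag handling; the field and function names are placeholders, not the exact ones in main.cpp:

```cpp
#include <csignal>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <string>

#define ANSI_COLOR_RESET "\x1b[0m"

// Illustrative parameter struct; field names are placeholders.
struct Params {
    int n_ctx = 512;          // context size, previously hardcoded
    std::string prompt;
};

// Reset terminal colors on Ctrl-C so the shell is not left in a colored state.
static void sigint_handler(int /*signo*/) {
    printf(ANSI_COLOR_RESET "\n");
    exit(130);
}

static bool parse_args(int argc, char **argv, Params &params) {
    for (int i = 1; i < argc; ++i) {
        if (strcmp(argv[i], "-c") == 0 && i + 1 < argc) {
            params.n_ctx = atoi(argv[++i]);   // take context size from user input
        } else if (strcmp(argv[i], "-p") == 0 && i + 1 < argc) {
            params.prompt = argv[++i];
        } else {
            return false;
        }
    }
    return true;
}

int main(int argc, char **argv) {
    Params params;
    if (!parse_args(argc, argv, params)) {
        fprintf(stderr, "usage: %s [-c n_ctx] [-p prompt]\n", argv[0]);
        return 1;
    }

    signal(SIGINT, sigint_handler);

    // Checking the return value of scanf silences the unused-result warning
    // and catches read failures.
    char buf[256];
    if (scanf("%255s", buf) != 1) {
        fprintf(stderr, "failed to read input\n");
    }

    // ... run inference with params.n_ctx ...

    // Also reset colors on normal exit.
    printf(ANSI_COLOR_RESET "\n");
    return 0;
}
```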
I'm not fully familiar with this codebase, so pardon me if I'm wrong. My first attempt to modify the code was to expand the hardcoded context window from 512 to 4096, but...
When converting the model + tokenizer, use the vocabulary size returned by the tokenizer rather than assuming 32000. There are ways that special tokens or other new tokens could be...
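The converter in the repo is a Python script, but the same idea can be sketched with sentencepiece's C++ API: ask the loaded tokenizer for its piece count instead of hardcoding 32000.

```cpp
#include <cstdio>
#include <sentencepiece_processor.h>

// Query the tokenizer for its actual vocabulary size instead of assuming
// 32000, so added/special tokens are accounted for.
int main(int argc, char **argv) {
    if (argc < 2) {
        fprintf(stderr, "usage: %s tokenizer.model\n", argv[0]);
        return 1;
    }
    sentencepiece::SentencePieceProcessor sp;
    if (!sp.Load(argv[1]).ok()) {
        fprintf(stderr, "failed to load tokenizer model\n");
        return 1;
    }
    const int n_vocab = sp.GetPieceSize();  // use this, not a hardcoded 32000
    printf("vocab size: %d\n", n_vocab);
    return 0;
}
```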