llama.cpp
LLM inference in C/C++
Based on: https://github.com/qwopqwop200/GPTQ-for-LLaMa Current status: Something is busted. The output starts out decent, but quickly degrades into gibberish. This doesn't happen with either the original GPTQ-for-LLaMa using the same weights,...
This fixes bug #292 as suggested [here](https://github.com/ggerganov/llama.cpp/issues/292#issuecomment-1476318351).
If sorted maps are not necessary, change std::map to std::unordered_map. std::unordered_map is a hash table, so it should be faster than std::map when storing many items. std::map can be...
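A minimal sketch of the swap described above, using a hypothetical token table (the key and value types are assumptions, not the actual ones in llama.cpp). Both containers share the same lookup interface, so the change is usually just the type name; the difference is that std::map is a balanced tree with O(log n) lookups and sorted iteration, while std::unordered_map is a hash table with average O(1) lookups and no ordering guarantee.

```cpp
#include <map>
#include <string>
#include <unordered_map>

// Hypothetical example: a vocab-style lookup table. If nothing iterates
// the map in sorted key order, std::unordered_map is the cheaper choice.
int lookup_ordered(const std::string & key) {
    std::map<std::string, int> vocab;           // red-black tree, keys sorted
    for (int i = 0; i < 1000; ++i) {
        vocab["tok" + std::to_string(i)] = i;
    }
    return vocab.at(key);
}

int lookup_hashed(const std::string & key) {
    std::unordered_map<std::string, int> vocab; // hash table, unordered
    for (int i = 0; i < 1000; ++i) {
        vocab["tok" + std::to_string(i)] = i;
    }
    return vocab.at(key);                       // same interface as std::map
}
```

Since only the declaration changes, code that relies on sorted iteration (e.g. printing keys in order) is the one place the swap would not be behavior-preserving.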
In interactive mode, every time the model has to respond to user input, it has an increasingly reduced token budget, eventually generating only a few words before stopping. The token...
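The shrinking-budget behavior can be sketched as follows. This is a hypothetical reduction, not the actual llama.cpp loop: the `Session`, `n_predict`, and `n_remain` names are assumptions. The point is that if the remaining budget is shared across turns instead of being reset per response, each reply starts with whatever the previous one left over.

```cpp
// Hypothetical sketch of a per-turn token budget (names are assumed).
struct Session {
    int n_predict = 128; // tokens allowed per response
    int n_remain  = 128; // budget actually consumed by the generation loop
};

// Buggy variant: the budget carries over between turns, so each
// response can be shorter than the previous one.
int generate_buggy(Session & s, int tokens_wanted) {
    int produced = 0;
    while (produced < tokens_wanted && s.n_remain > 0) {
        --s.n_remain; // never replenished between turns
        ++produced;
    }
    return produced;
}

// Fixed variant: reset the budget at the start of every response.
int generate_fixed(Session & s, int tokens_wanted) {
    s.n_remain = s.n_predict;
    return generate_buggy(s, tokens_wanted);
}
```

With the buggy variant, a session with a 128-token budget that produces 100 tokens on turn one can only produce 28 on turn two; the fixed variant grants the full budget on every turn.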
- On older versions the function will silently fail without any ill effects
- Only used when params.use_color == true (--color)
- No windows.h dependency
Some moving around of ANSI color code emissions in recent patches has left us in a situation where RESET codes were getting defensively emitted after every token, resulting in multibyte...
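One way to avoid the per-token RESET problem is to track the current color state and emit an escape sequence only when the state actually changes, so token bytes are written contiguously. This is a hedged sketch, not the actual llama.cpp code: the enum, globals, and function names here are illustrative assumptions.

```cpp
#include <string>

// Hypothetical color-state tracking (names assumed for illustration).
enum class ConsoleColor { Default, Prompt, UserInput };

static ConsoleColor g_color = ConsoleColor::Default;
static std::string  g_out;  // stands in for the terminal output stream

void set_console_color(ConsoleColor color) {
    if (color == g_color) return; // no-op when already in this state
    switch (color) {
        case ConsoleColor::Default:   g_out += "\x1b[0m";         break; // RESET
        case ConsoleColor::Prompt:    g_out += "\x1b[33m";        break; // yellow
        case ConsoleColor::UserInput: g_out += "\x1b[1m\x1b[32m"; break; // bold green
    }
    g_color = color;
}

void emit_token(const std::string & token, ConsoleColor color) {
    set_console_color(color);
    g_out += token; // token bytes stay contiguous: no RESET interleaved,
                    // so a multibyte UTF-8 character split across tokens
                    // is not broken up by escape sequences
}
```

Because consecutive tokens in the same color produce no extra escapes, a character whose UTF-8 bytes are split across two tokens still reaches the terminal as an unbroken byte sequence.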
bit of refactoring per https://github.com/ggerganov/llama.cpp/pull/252
NOTE: I am seeing different outputs when running with these changes. They seem of equal quality, but this isn't something I observed when first testing this out on alpaca.cpp. It's...