Andrei
With the C API now merged, it would be very useful to have build targets for `make` and `cmake` that produce shared library versions of `llama.cpp`. This way `llama.cpp` can...
The goal of this feature is to reduce latency for repeated calls to the chat_completion API by saving the kv_cache, keyed by the prompt tokens. The basic version of this...
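A minimal sketch of the idea of a cache keyed by prompt tokens, not the project's actual implementation. The class name `PromptStateCache`, its methods, and the longest-prefix lookup are all assumptions made for illustration; `state` stands in for whatever saved kv_cache object the model exposes.

```python
from typing import Any, Dict, Optional, Sequence, Tuple


class PromptStateCache:
    """Hypothetical in-memory cache mapping prompt tokens to saved state."""

    def __init__(self) -> None:
        # Token sequences are stored as tuples so they can be dict keys.
        self._states: Dict[Tuple[int, ...], Any] = {}

    def save(self, tokens: Sequence[int], state: Any) -> None:
        """Store the saved state under the exact prompt token sequence."""
        self._states[tuple(tokens)] = state

    def longest_prefix(
        self, tokens: Sequence[int]
    ) -> Optional[Tuple[Tuple[int, ...], Any]]:
        """Return the cached (key, state) whose key is the longest prefix
        of `tokens`, or None if no cached key is a prefix."""
        query = tuple(tokens)
        best: Optional[Tuple[Tuple[int, ...], Any]] = None
        for key, state in self._states.items():
            if len(key) <= len(query) and query[: len(key)] == key:
                if best is None or len(key) > len(best[0]):
                    best = (key, state)
        return best
```

On a hit, only the tokens after the matched prefix would need to be evaluated again; a linear scan is fine for a handful of entries, while a trie would suit larger caches.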
Support for Nat Friedman's Openplayground project via the OpenAI API server. You can currently test this with ``` docker run --rm --name openplayground -e OPENAI_API_BASE= -p 5432:5432 --volume openplayground:/web/config natorg/openplayground...
Need to collect structured information upfront; feature requests and similar can remain free-form.
Currently this comes up as an AssertionError, which leads to a lot of confusion.
Certain tokens in the vocabulary cannot be decoded to valid UTF-8. I'm actually not sure if this is because they represent partial UTF-8 code points, but in any case they cause...
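If the cause is indeed tokens that carry only part of a multi-byte UTF-8 sequence (an assumption, not something confirmed here), one way to handle it is to decode the byte stream incrementally instead of token by token. The sketch below uses only the Python standard library; `stream_text` is a hypothetical helper name, and `token_bytes` stands in for the raw bytes of each generated token.

```python
import codecs
from typing import Iterable, Iterator


def stream_text(token_bytes: Iterable[bytes]) -> Iterator[str]:
    """Yield text as soon as complete UTF-8 sequences are available."""
    # The incremental decoder buffers an incomplete trailing sequence
    # until the bytes that complete it arrive.
    decoder = codecs.getincrementaldecoder("utf-8")(errors="replace")
    for piece in token_bytes:
        text = decoder.decode(piece)
        if text:
            yield text
    # Flush: bytes that never completed are emitted as U+FFFD.
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail
```

For example, the euro sign split across two tokens, `b"\xe2\x82"` followed by `b"\xac"`, is yielded once as a single character rather than failing on the first half.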
Allow the user to alias their local models to OpenAI model names, as many tools have those hard-coded. This may cause unexpected issues with tokenization mismatches.
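A rough sketch of what such aliasing could look like, assuming a plain lookup table; the `MODEL_ALIASES` mapping, the `resolve_model` helper, and the model path are all hypothetical and not part of any existing API.

```python
from typing import Dict

# Hypothetical mapping from an OpenAI model name to a local model path.
MODEL_ALIASES: Dict[str, str] = {
    "gpt-3.5-turbo": "./models/local-model.bin",
}


def resolve_model(requested: str, aliases: Dict[str, str]) -> str:
    """Return the local model for an aliased name, else the name unchanged."""
    return aliases.get(requested, requested)
```

A client that hard-codes `gpt-3.5-turbo` would then be served by the local model. The tokenizer is still the local model's, so token counts computed by the client may not match.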