
Support Koboldcpp

Open Botoni opened this issue 1 year ago • 5 comments

Hi, would it be possible to support koboldcpp? It is faster, loads more models than LM Studio, and has better compatibility with Linux.

In fact, it connects using the LM Studio option with the koboldcpp address, but chat responses get truncated at the first or second word, even though the koboldcpp terminal shows the response being fully generated.

Botoni avatar Jan 02 '24 16:01 Botoni

If it helps the implementation, using the lmstudio config, the koboldcpp terminal gives this response when sending a prompt:

Processing Prompt [BLAS] (509 / 509 tokens)
Generating (10 / 100 tokens)
(EOS token triggered!)
ContextLimit: 519/2048, Processing:11.37s (22.3ms/T), Generation:1.19s (118.9ms/T), Total:12.56s (0.80T/s)
Output: Hello! How can I help you today?

Botoni avatar Jan 17 '24 22:01 Botoni

I couldn't connect to Koboldcpp. I thought support for such a convenient backend would be built in right away. How can I connect?

intulint avatar Apr 04 '24 00:04 intulint

Koboldcpp says its API is OpenAI compatible. But if I configure LocalAI or LM Studio endpoints to point to Koboldcpp, I get the same truncation experience as the OP. Maybe it is a configuration issue in Koboldcpp?
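For what it's worth, here is a minimal sketch of talking to Koboldcpp's OpenAI-compatible endpoint directly, outside any connector (it assumes Koboldcpp's default port 5001 and the `openai` Python package; the model name is just a placeholder):

```python
# Minimal sketch: query Koboldcpp's OpenAI-compatible chat endpoint directly.
# Assumes Koboldcpp is running on its default port (5001); adjust as needed.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:5001/v1", api_key="not-needed")

resp = client.chat.completions.create(
    model="koboldcpp",  # placeholder; Koboldcpp serves whatever model it was launched with
    messages=[{"role": "user", "content": "hello"}],
    temperature=0.7,
)
print(resp.choices[0].message.content)
```

If the full reply comes back here but is cut off in the chat UI, the truncation is happening on the client side rather than in Koboldcpp.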

One motivation I can add for Koboldcpp support, other than it being a really convenient and configurable LLM engine, is that it is the only way to get hardware acceleration for older AMD cards that are not officially supported by ROCm (I have an RX 6600).

zacanbot avatar Apr 17 '24 17:04 zacanbot

https://petstore.swagger.io/?url=https://lite.koboldai.net/kobold_api.json

Koboldcpp API reference

kenhuang avatar Apr 27 '24 11:04 kenhuang

If it's OpenAI compatible, can't the Generic OpenAI connector (the last LLM connector) work here?

timothycarambat avatar Apr 27 '24 19:04 timothycarambat

Thank you for adding the Koboldcpp connection options. However, can we re-open the issue? The original truncation issue still persists with the latest version of AnythingLLM using the new Koboldcpp connector:

[screenshot: the response truncated in the AnythingLLM chat]

While in the Koboldcpp server logs, I see that the whole message is generated:

...
Input: {"model": "koboldcpp/Meta-Llama-3-8B-Instruct-Q5_K_M", "stream": true, "messages": [{"role": "system", "content": "Given the following conversation, relevant context, and a follow up question, reply with an answer to the current question the user is asking. Return only your response to the question given the above information following the users instructions as needed."}, {"role": "user", "content": "hello"}], "temperature": 0.7}

Processing Prompt [BLAS] (61 / 61 tokens)
Generating (100 / 100 tokens)
CtxLimit: 161/8192, Process:0.01s (0.2ms/T = 4066.67T/s), Generate:2.88s (28.8ms/T = 34.76T/s), Total:2.89s (34.58T/s)
Output: Hello! How can I assist you today? What's on your mind?
...
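As a sketch, the same streamed request shown in the Input line above can be replayed outside AnythingLLM to see whether the stream itself is cut short (this assumes Koboldcpp's default port 5001 and the `openai` Python package, v1+):

```python
# Replay the streamed chat request from the log above and print each chunk
# as it arrives, to check whether truncation happens in the stream itself.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:5001/v1", api_key="not-needed")

stream = client.chat.completions.create(
    model="koboldcpp/Meta-Llama-3-8B-Instruct-Q5_K_M",
    messages=[{"role": "user", "content": "hello"}],
    temperature=0.7,
    stream=True,
)
for chunk in stream:
    # Some chunks (e.g. the final one) may carry no content delta.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()
```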

zacanbot avatar May 02 '24 22:05 zacanbot

@shatfield4

timothycarambat avatar May 02 '24 22:05 timothycarambat

@zacanbot Can you give me any more information on how to replicate this bug? I have downloaded the same Llama3 model you are using and the streaming is working fine for me and showing the entire message inside AnythingLLM. Are you running the latest version of KoboldCPP? Did you change any config settings inside KoboldCPP?

shatfield4 avatar May 02 '24 23:05 shatfield4

I just updated to the latest version (1.64) and it seems to be working correctly now! Thanks for digging into this. Appreciated 👍

zacanbot avatar May 03 '24 00:05 zacanbot

[screenshots: empty model selection window] For some reason it's not working for me again; I just downloaded a new version of AnythingLLM, and Koboldcpp is version 1.65. It doesn't let me select a model, the window is empty. Apparently something has broken again.

intulint avatar May 24 '24 08:05 intulint

Then this is likely because whatever you have put in as the base URL is not correct. Does http://localhost:5001/v1/models even return data in the browser?
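A scripted version of that check might look like this (a sketch, assuming the default Koboldcpp port 5001 and the `requests` package); the models list is presumably what the model dropdown is populated from:

```python
# Confirm the base URL serves an OpenAI-style models list.
import requests

r = requests.get("http://localhost:5001/v1/models", timeout=5)
r.raise_for_status()
print(r.json())  # expect something like {"object": "list", "data": [{"id": ...}]}
```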

cc @shatfield4

timothycarambat avatar May 24 '24 19:05 timothycarambat


Yes, the browser opens the link http://localhost:5001/v1/models, and I can also pull the values out in Python using the API. Should the base URL be "http://localhost:5001/v1"? That path is what the tooltip shows.

intulint avatar May 25 '24 04:05 intulint

> Yes, the browser opens the link http://localhost:5001/v1/models, and I can also pull the values out in Python using the API. Should the base URL be "http://localhost:5001/v1"? That path is what the tooltip shows.

Exact same issue for me. Koboldcpp has a few different API options and none of them load with AnythingLLM, but other clients, including koboldai, koboldlite, and SillyTavern, can all use it without issue.

Negatrev avatar Jun 06 '24 09:06 Negatrev

> Exact same issue for me. Koboldcpp has a few different API options and none of them load with AnythingLLM, but other clients, including koboldai, koboldlite, and SillyTavern, can all use it without issue.

I managed to work around this problem like this: instead of http://localhost:5001/v1 I entered http://127.0.0.1/v1 and everything worked.

attyru avatar Jul 11 '24 18:07 attyru