anything-llm
Incomplete response from LM Studio endpoint
I am getting incomplete responses while using the LM Studio endpoint. The response cuts off midway while streaming, sometimes after the first word or after half a sentence. I am running on Docker.
Hi @Daniel-Dan-Espinoza, which model are you using in LM Studio? This typically happens when models that are less optimized for chatting are being used. Also, are you running LM Studio locally on the same machine as your AnythingLLM docker container?
I was using the Starling model. I am running both LM Studio and the AnythingLLM Docker container on the same machine.
I tried LocalAI but have the same issue.
When interacting with LM Studio, we leave the entire inference run on the LM Studio side. We simply pass along the input and wait for LM Studio to finish producing output.
While inference is running, it's likely that the output sent to AnythingLLM is not being dropped, but rather that LM Studio stops generating output and AnythingLLM assumes the response is done.
Can you confirm whether the model is still generating a response when AnythingLLM says the response is complete? That would help determine whether the issue is with the model/config on LM Studio or with AnythingLLM.
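For reference, here is a minimal sketch of what consuming LM Studio's OpenAI-compatible streaming endpoint looks like from a client's point of view (the URL, model id, and prompt are placeholders, not AnythingLLM's actual code). The only reliable end-of-response markers are the `data: [DONE]` sentinel and the server closing the connection; if the server simply stops emitting chunks, the partial text is indistinguishable from a finished answer.

```ts
// Sketch of a client reading an OpenAI-compatible streaming response,
// which is the protocol LM Studio's local server speaks. This is an
// illustration, not AnythingLLM's actual implementation; the endpoint
// and model id below are placeholders.
async function streamChat(prompt: string): Promise<string> {
  const res = await fetch("http://localhost:1234/v1/chat/completions", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "local-model", // placeholder; LM Studio serves whatever model is loaded
      messages: [{ role: "user", content: prompt }],
      stream: true, // ask for server-sent-event chunks instead of one JSON body
    }),
  });

  const reader = res.body!.getReader();
  const decoder = new TextDecoder();
  let buffer = "";
  let text = "";

  while (true) {
    const { value, done } = await reader.read();
    if (done) break; // server closed the connection
    buffer += decoder.decode(value, { stream: true });
    const lines = buffer.split("\n");
    buffer = lines.pop() ?? ""; // keep any partial line for the next read
    for (const line of lines) {
      if (!line.startsWith("data: ")) continue; // skip blanks and keep-alives
      const payload = line.slice(6).trim();
      if (payload === "[DONE]") return text; // explicit end-of-stream sentinel
      const delta = JSON.parse(payload).choices?.[0]?.delta?.content;
      if (delta) text += delta;
    }
  }
  // If the server stops sending chunks without [DONE], we fall through to
  // here and the truncated text looks like a complete response to the caller.
  return text;
}
```

So the question above really comes down to which of those two exits the client is taking when the response cuts off.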
@timothycarambat this is the streaming bug fix for LocalAI we added. This is the fix working, but we need to learn why it's dropping the packets.
@lunamidori5 We would need to confirm that the user is running the patched version, and if so, then yes, we definitely need to see why. To be fair, I have yet to replicate this issue with LocalAI (or LM Studio, for that matter).
@timothycarambat at least I'm not the only one with this bug (I am starting to think it may be the way some routers work...).
Closing as stale
I am seeing this problem using the latest version of AnythingLLM (0.2.0?). I saw it when using LM Studio, but then it seemed to clear up on its own, or maybe it was after I reset the chat in AnythingLLM. Then I got the empty-content complaint from LM Studio, and I decided enough is enough and switched to Kobold. Now I am seeing the one-token problem using Kobold, via the LocalAI LLM setting in AnythingLLM (chat model selection, which I can't seem to copy from the form, sigh: koboldcpp/dolphin-2.2.1-mistral-7b.Q5_K_S). Resetting the chat history doesn't help.
I'm fine with disabling the streaming mode, for now. I don't see any way to do that, either in AnythingLLM or Kobold.
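If you just want to rule streaming in or out, you can hit the same OpenAI-compatible endpoint by hand with `stream: false`; the whole reply then comes back as a single JSON body. This is a manual check, not a setting AnythingLLM exposes; the port and model id below are assumptions (LM Studio defaults to 1234, KoboldCpp's OpenAI-compatible API is usually on 5001).

```ts
// Manual check of the non-streaming path against a local
// OpenAI-compatible server. Adjust the port for your setup:
// LM Studio defaults to 1234; KoboldCpp is usually 5001.
async function nonStreamingChat(prompt: string): Promise<string> {
  const res = await fetch("http://localhost:1234/v1/chat/completions", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "local-model", // placeholder model id
      messages: [{ role: "user", content: prompt }],
      stream: false, // return one complete JSON body, no SSE chunks
    }),
  });
  const data = await res.json();
  // finish_reason says why generation ended: "stop", "length", etc.
  console.log("finish_reason:", data.choices[0].finish_reason);
  return data.choices[0].message.content;
}
```

If the full answer comes back this way but gets truncated over streaming, the problem is in the stream handling rather than the model.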
Looks like this might be a relevant issue: https://github.com/LostRuins/koboldcpp/issues/669 So it may be a bug on the Kobold side.
@timothycarambat could you add a "no streaming" checkbox to the LLM screen?
Is it because there is an issue with streaming or because certain models do not support it?
With Kobold, I was seeing the whole stream of tokens being generated, so clearly the model supports streaming and Kobold supports streaming, but on the AnythingLLM side the response was already considered done after the first token came in. So it seems like a timeout issue: not waiting long enough for the next token? It seems like there should be a special signal that the stream is finished, because otherwise how would anyone know?
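There is such a signal in the OpenAI-style streaming protocol these local servers emulate: the final content chunk carries a `finish_reason` ("stop", "length", ...), and the stream is closed with a literal `data: [DONE]` line. Whether a given Kobold build emits those markers correctly is exactly what the linked issue is about. A rough sketch of the distinction, assuming the standard chat-completions chunk format:

```ts
// Classify one OpenAI-style SSE line as either a token delta or an
// end-of-stream signal. Field names follow the standard chat-completions
// streaming format; this is an illustration, not AnythingLLM's code.
type SseEvent =
  | { kind: "delta"; text: string }
  | { kind: "end"; reason: string };

function parseSseLine(line: string): SseEvent | null {
  if (!line.startsWith("data: ")) return null; // ignore blanks and keep-alives
  const payload = line.slice(6).trim();
  if (payload === "[DONE]") return { kind: "end", reason: "done-sentinel" };
  const choice = JSON.parse(payload).choices?.[0];
  if (choice?.finish_reason) {
    return { kind: "end", reason: choice.finish_reason }; // "stop", "length", ...
  }
  return { kind: "delta", text: choice?.delta?.content ?? "" };
}
```

A client that ends the message on anything other than one of those two signals (for example, on a pause between tokens) would show exactly the "done after the first token" behaviour described above.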
LocalAI and Google Gemini still have that streaming bug from before...