
Incomplete response from LM Studio endpoint

Open unseensholar opened this issue 1 year ago • 7 comments

I am getting incomplete responses while using the LM Studio endpoint. The response cuts off midway while streaming, sometimes after the first word or after half a sentence. I am running in Docker.

unseensholar avatar Dec 21 '23 07:12 unseensholar

Hi @Daniel-Dan-Espinoza, which model are you using in LM Studio? This typically happens when models that are less optimized for chatting are being used. Also, are you running LM Studio locally on the same machine as your AnythingLLM docker container?

shatfield4 avatar Dec 21 '23 18:12 shatfield4

I was using the Starling model. I am running both LM Studio and the AnythingLLM Docker container on the same machine.

unseensholar avatar Dec 22 '23 17:12 unseensholar
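A side note on the Docker question above: when AnythingLLM runs in a container and LM Studio runs on the host, the container cannot reach the host's `localhost`, so the base URL usually has to point at `host.docker.internal`. A quick illustrative check in TypeScript (assuming Node 18+ for the built-in `fetch`, and LM Studio's local server on its default port, 1234):

```typescript
// Quick connectivity check, run from inside the AnythingLLM container.
// Assumes LM Studio's local server is on its default port (1234); adjust if changed.
// `host.docker.internal` resolves to the host on Docker Desktop; on Linux you may
// need to start the container with `--add-host=host.docker.internal:host-gateway`.
const BASE_URL = "http://host.docker.internal:1234/v1";

async function checkLmStudio(): Promise<void> {
  try {
    const res = await fetch(`${BASE_URL}/models`);
    if (!res.ok) {
      console.error(`LM Studio is reachable but returned HTTP ${res.status}`);
      return;
    }
    const body = await res.json();
    console.log("LM Studio is reachable; loaded models:", JSON.stringify(body, null, 2));
  } catch (err) {
    console.error("Could not reach LM Studio from inside the container:", err);
  }
}

checkLmStudio();
```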

I tried LocalAI but have the same issue.

unseensholar avatar Dec 22 '23 17:12 unseensholar

When interacting with LM Studio, we leave the entire inference run on the LM Studio side. We simply pass along the input and wait for LM Studio to finish producing output.

While inference is running, it's likely that the output sent to AnythingLLM is not being dropped, but rather that LM Studio stops generating output and AnythingLLM assumes the response is done.

Can you confirm whether the model is still generating a response when AnythingLLM says the response is complete? That would help determine whether the issue is with the model/config on the LM Studio side or with AnythingLLM.

timothycarambat avatar Dec 26 '23 18:12 timothycarambat
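For context on the hand-off described above: AnythingLLM consumes LM Studio's OpenAI-compatible streaming endpoint, and the client only knows a response is finished when the server signals it. A minimal sketch (TypeScript, not AnythingLLM's actual code; the port and model name are placeholders) of how such a stream is read, and why a backend that simply stops emitting data can look identical to a completed response unless the terminator is checked:

```typescript
// Minimal SSE consumer for an OpenAI-compatible /v1/chat/completions stream.
// Not AnythingLLM's real implementation -- just enough to show the failure mode:
// if the server stops sending data and closes the connection early, the loop
// below simply ends, and the partial text looks exactly like a "complete" answer
// unless we also saw the [DONE] sentinel (or a finish_reason on the last chunk).
const BASE_URL = "http://localhost:1234/v1"; // LM Studio's default local server

async function streamChat(prompt: string): Promise<string> {
  const res = await fetch(`${BASE_URL}/chat/completions`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "local-model", // placeholder; LM Studio serves whatever model is loaded
      messages: [{ role: "user", content: prompt }],
      stream: true,
    }),
  });

  let text = "";
  let sawDone = false;
  let buffer = "";
  const decoder = new TextDecoder();

  for await (const chunk of res.body as unknown as AsyncIterable<Uint8Array>) {
    buffer += decoder.decode(chunk, { stream: true });
    const lines = buffer.split("\n");
    buffer = lines.pop() ?? ""; // keep any partial line for the next chunk

    for (const line of lines) {
      if (!line.startsWith("data: ")) continue;
      const payload = line.slice("data: ".length).trim();
      if (payload === "[DONE]") { sawDone = true; continue; }
      const parsed = JSON.parse(payload);
      text += parsed.choices?.[0]?.delta?.content ?? "";
    }
  }

  if (!sawDone) {
    // The connection closed without the terminator: the backend stopped
    // generating (or the stream was dropped), not a genuine completion.
    console.warn("Stream ended without [DONE]; response is likely truncated.");
  }
  return text;
}

streamChat("Why is the sky blue?").then((t) => console.log(t));
```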

@timothycarambat this is the streaming bug fix for LocalAI we added. This is the fix working, but we need to learn why it's dropping the packets.

lunamidori5 avatar Dec 29 '23 10:12 lunamidori5

@lunamidori5 We would need to confirm that the user is running the patched version, and if so, then yes, we certainly need to see why. To be fair, I have yet to replicate this issue with LocalAI (or LM Studio, for that matter).

timothycarambat avatar Dec 29 '23 17:12 timothycarambat

@timothycarambat At least I'm not the only one with this bug (I am starting to think it may be the way some routers work...).

lunamidori5 avatar Dec 29 '23 17:12 lunamidori5

Closing as stale

timothycarambat avatar Jan 16 '24 22:01 timothycarambat

I am seeing this problem with the latest version of AnythingLLM (0.2.0?). I saw it when using LM Studio, but then it seemed to clear up on its own, or maybe it was after I reset the chat in AnythingLLM. Then I got the empty-content complaint from LM Studio, decided enough was enough, and switched to Kobold. Now I am seeing the one-token problem with Kobold, via the LocalAI LLM setting in AnythingLLM (chat model selection, which I can't seem to copy from the form, sigh: koboldcpp/dolphin-2.2.1-mistral-7b.Q5_K_S). Resetting the chat history doesn't help.

I'd be fine with disabling streaming mode for now, but I don't see any way to do that in either AnythingLLM or Kobold.

Looks like this might be a relevant issue: https://github.com/LostRuins/koboldcpp/issues/669. So it may be a bug on the Kobold side.

dlaliberte avatar Feb 11 '24 18:02 dlaliberte
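A note on the "disable streaming" workaround mentioned above: AnythingLLM does not currently expose such a toggle, but against an OpenAI-compatible backend (LM Studio, or KoboldCpp's OpenAI-compatible API) the difference is just the `stream` flag on the request. A minimal illustrative sketch in TypeScript, with the base URL as a placeholder:

```typescript
// Non-streaming request against an OpenAI-compatible endpoint. With stream: false
// the server returns a single JSON body after generation finishes, so a mid-stream
// drop shows up as an error instead of a silently truncated answer.
const KOBOLD_URL = "http://localhost:5001/v1"; // placeholder; adjust to your backend

async function completeOnce(prompt: string): Promise<string> {
  const res = await fetch(`${KOBOLD_URL}/chat/completions`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "koboldcpp/dolphin-2.2.1-mistral-7b.Q5_K_S", // model named in the comment above
      messages: [{ role: "user", content: prompt }],
      stream: false,
    }),
  });
  if (!res.ok) throw new Error(`Backend returned HTTP ${res.status}`);
  const data = await res.json();
  return data.choices?.[0]?.message?.content ?? "";
}
```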

@timothycarambat Could you add a "no streaming" checkbox to the LLM screen?

lunamidori5 avatar Feb 12 '24 06:02 lunamidori5

Is it because there is an issue with streaming or because certain models do not support it?

timothycarambat avatar Feb 12 '24 17:02 timothycarambat

> Is it because there is an issue with streaming or because certain models do not support it?

With Kobold, I was seeing the whole stream of tokens being generated, so clearly both the model and Kobold support streaming, but on the AnythingLLM side the response was already considered done after the first token came in. So it seems like a timeout issue, with AnythingLLM not waiting long enough for the next token? It seems there should be a special signal that the stream is finished, because otherwise how would anyone know?

dlaliberte avatar Feb 13 '24 16:02 dlaliberte
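On the "special signal" question: OpenAI-style streams do carry one, since each chunk includes a `finish_reason` and the stream ends with a `data: [DONE]` line, so a client can tell "the server said it was done" apart from "no more data arrived". Below is a small hypothetical helper (TypeScript) showing one way to treat a long silence as an error rather than as the end of the response; the function name and timeout value are made up for illustration:

```typescript
// Hypothetical helper (not part of AnythingLLM): wraps an async iterator of
// stream chunks and fails loudly if no chunk arrives within `idleMs`, so
// "the backend went quiet" surfaces as an error instead of being mistaken
// for the natural end of the response.
async function* withIdleTimeout<T>(
  source: AsyncIterable<T>,
  idleMs: number,
): AsyncGenerator<T> {
  const it = source[Symbol.asyncIterator]();
  while (true) {
    let timer: ReturnType<typeof setTimeout> | undefined;
    const timeout = new Promise<never>((_, reject) => {
      timer = setTimeout(() => reject(new Error(`No data received for ${idleMs} ms`)), idleMs);
    });
    try {
      const result = await Promise.race([it.next(), timeout]);
      if (result.done) return; // the server closed the stream normally
      yield result.value;
    } finally {
      clearTimeout(timer); // cancel the pending timeout for this chunk
    }
  }
}

// Usage sketch: iterate the backend's chunk stream through the wrapper, e.g.
//   for await (const chunk of withIdleTimeout(chunkStream, 30_000)) { ... }
```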

> Is it because there is an issue with streaming or because certain models do not support it?

LocalAI and Google Gem still have that streaming bug from before...

lunamidori5 avatar Feb 14 '24 13:02 lunamidori5