When rejecting the LLM's response before it's finished, the `llama.cpp` server keeps generating tokens
Before submitting your bug report
- [X] I believe this is a bug. I'll try to join the Continue Discord for questions
- [X] I'm not able to find an open issue that reports the same bug
- [X] I've seen the troubleshooting guide on the Continue Docs
Relevant environment info
- OS: macOS
- Continue: 0.8.1
Description
Rejecting the LLM's response while it's streaming doesn't actually stop generation: the LLM keeps producing the rest of the response in the background, draining the battery through high GPU usage.
To reproduce
- Ask a question from the LLM served by `llama.cpp`.
- When it starts streaming, reject the answer via `CMD-BACKSPACE` or your custom shortcut.
- The `llama.cpp` server keeps generating the rest of the response even though you already rejected it and can't see the rest of the response.
Log output
No response
Same for me with LM Studio
@D0ctorWh0 @ibehnam Working on solving this today. It aligned with a few larger changes we needed to make, which is why it's not being fixed immediately. Hope to update very soon!
@D0ctorWh0 @ibehnam I forgot to update when we fixed this, but it was fixed 1-2 weeks ago. Let me know if you see any further problems, happy to re-open this if needed!