When rejecting the LLM's response before it's finished, the `llama.cpp` server keeps generating tokens
Before submitting your bug report
- [X] I believe this is a bug. I'll try to join the Continue Discord for questions
- [X] I'm not able to find an open issue that reports the same bug
- [X] I've seen the troubleshooting guide on the Continue Docs
Relevant environment info
- OS: macOS
- Continue: 0.8.1
Description
Rejecting the LLM's response while it's streaming doesn't actually stop generation: the LLM keeps producing the rest of the response in the background, draining the battery through high GPU usage.
To reproduce
- Ask a question from the LLM served by `llama.cpp`.
- When it starts streaming, reject the answer via `CMD-BACKSPACE` or your custom shortcut.
- The `llama.cpp` server keeps generating the rest of the response even though you already rejected it and can't see the rest of the response.
Log output
No response
Same for me with LM Studio
@D0ctorWh0 @ibehnam Working on solving this today. It aligned with a few larger changes we needed to make, which is why it's not being fixed immediately. Hope to update very soon!
@D0ctorWh0 @ibehnam I forgot to update when we fixed this, but it was fixed 1-2 weeks ago. Let me know if you see any further problems, happy to re-open this if needed!