
When the LLM's response is rejected before it finishes, the `llama.cpp` server keeps generating tokens

Open ibehnam opened this issue 1 year ago • 2 comments


Relevant environment info

- OS: macOS
- Continue: 0.8.1

Description

Rejecting the LLM's response while it is streaming does not stop generation: the llama.cpp server keeps producing the rest of the response, draining the battery through sustained GPU usage.

To reproduce

  1. Ask a question of the LLM served by llama.cpp.
  2. When the answer starts streaming, reject it via CMD-BACKSPACE or your custom shortcut.
  3. Observe that the llama.cpp server keeps generating the rest of the response, even though you have already rejected it and can no longer see the output.
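For illustration, here is a minimal, runnable sketch of the behavior being asked for: a streaming server that stops generating as soon as the client tears down the connection. The toy HTTP server below is only a stand-in for llama.cpp's streaming endpoint — all names are hypothetical, and none of this is llama.cpp or Continue code.

```python
import http.server
import socketserver
import threading
import time
import urllib.request

# Tokens the toy server has "generated" so far.
tokens_sent = []

class StreamingHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        try:
            for i in range(1000):  # pretend each iteration is one token
                self.wfile.write(f"token{i}\n".encode())
                self.wfile.flush()
                tokens_sent.append(i)
                time.sleep(0.005)
        except OSError:
            # Client disconnected (the "reject"): stop generating here
            # instead of burning GPU/CPU on output nobody will see.
            pass

    def log_message(self, *args):
        pass  # silence per-request logging

server = socketserver.TCPServer(("127.0.0.1", 0), StreamingHandler)
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

resp = urllib.request.urlopen(f"http://127.0.0.1:{port}/")
for _ in range(5):   # read a few streamed tokens...
    resp.readline()
resp.close()         # ...then "reject": close the connection mid-stream

time.sleep(0.3)      # give the server time to hit the dead socket
count_after_reject = len(tokens_sent)
time.sleep(0.3)
stopped = len(tokens_sent) == count_after_reject
server.shutdown()
print(stopped, count_after_reject)
```

The client-side implication is that rejecting a response should actually abort the HTTP request (so the server sees the broken socket), rather than merely hiding the remaining tokens from the UI.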

Log output

No response

ibehnam avatar Jan 19 '24 20:01 ibehnam

Same for me with LM Studio

D0ctorWh0 avatar Jan 25 '24 20:01 D0ctorWh0

@D0ctorWh0 @ibehnam Working on solving this today - it overlaps with a few larger changes we needed to make, which is why it isn't being fixed immediately. Hope to update very soon!

sestinj avatar Jan 25 '24 22:01 sestinj

@D0ctorWh0 @ibehnam I forgot to update this thread when we fixed it, but the fix has now been live for 1-2 weeks. Let me know if you see any further problems - happy to re-open this if needed!

sestinj avatar Feb 20 '24 00:02 sestinj