cortex.cpp

bug: closing http stream does not stop inference

Open • fuzetsu opened this issue 1 year ago • 1 comment

Cortex version

Whatever Jan v0.5.7 uses

Describe the Bug

I am using Jan and starting a local OpenAI-compatible server from the GUI. Hopefully it's OK to report this here even though I'm using cortex.cpp through Jan; the issue could be specific to Jan's integration.

When I call this API from my own application and implement "stop generating" by passing an AbortController signal to my fetch request, the server keeps generating until the completion finishes, even though the stream was closed long before.

Steps to Reproduce

  1. Start the HTTP server.
  2. Initiate a streaming chat completion request.
  3. Close the HTTP connection.
  4. Inference continues (judging by sustained high GPU usage).
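The scenario in these steps can be reproduced against a stand-in server. The sketch below is TypeScript for Node 18+ (global `fetch` and `AbortController`). The endpoint path, the `hypothetical-model` name, and the fake token generator are assumptions for illustration, not cortex.cpp or Jan code. It shows both halves: the client aborting its `fetch` mid-stream, and the server-side behavior this issue expects, namely stopping generation once the response's connection closes before `res.end()` is called.

```typescript
import http from "node:http";
import type { AddressInfo } from "node:net";

interface Stats {
  generated: number; // tokens the fake server produced
  stoppedEarly: boolean; // true if generation halted because the client left
}

// Hypothetical stand-in for an OpenAI-compatible streaming endpoint (NOT cortex.cpp code):
// it "generates" up to 200 tokens, one SSE event every 10 ms.
function startFakeServer(stats: Stats): Promise<http.Server> {
  const server = http.createServer((_req, res) => {
    res.writeHead(200, { "Content-Type": "text/event-stream" });
    let clientGone = false;
    // 'close' also fires after a normal res.end(); writableEnded tells the two cases apart.
    res.on("close", () => {
      if (!res.writableEnded) clientGone = true;
    });
    // Guard against write errors on a dead socket: treat them as a disconnect.
    res.on("error", () => {
      clientGone = true;
    });
    const step = (): void => {
      if (clientGone) {
        stats.stoppedEarly = true; // the behavior this issue asks for
        return;
      }
      if (stats.generated >= 200) {
        res.end("data: [DONE]\n\n");
        return;
      }
      stats.generated++;
      res.write(`data: {"token":${stats.generated}}\n\n`);
      setTimeout(step, 10);
    };
    step();
  });
  return new Promise((resolve) => server.listen(0, "127.0.0.1", () => resolve(server)));
}

// Client side: read `maxReads` chunks, then "stop generating" via AbortController.
async function streamThenAbort(url: string, maxReads: number): Promise<number> {
  const controller = new AbortController();
  const res = await fetch(url, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model: "hypothetical-model", stream: true, messages: [] }),
    signal: controller.signal,
  });
  const reader = res.body!.getReader();
  let reads = 0;
  while (reads < maxReads) {
    const { done } = await reader.read();
    if (done) break;
    reads++;
  }
  controller.abort(); // tears down the underlying HTTP connection
  return reads;
}

async function runDemo(): Promise<Stats & { received: number }> {
  const stats: Stats = { generated: 0, stoppedEarly: false };
  const server = await startFakeServer(stats);
  const { port } = server.address() as AddressInfo;
  const received = await streamThenAbort(`http://127.0.0.1:${port}/v1/chat/completions`, 5);
  await new Promise((r) => setTimeout(r, 200)); // give the server time to notice the disconnect
  server.close();
  return { ...stats, received };
}

runDemo().then((r) => console.log(r));
```

With the stand-in server, `stoppedEarly` becomes `true` and `generated` stays far below 200. The bug reported here corresponds to a server that never checks for the disconnect and keeps generating until the completion is done. Against the real server, the equivalent observation is whether GPU usage drops after `controller.abort()`.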

Screenshots / Logs

No response

What is your OS?

  • [X] MacOS
  • [ ] Windows
  • [ ] Linux

What engine are you running?

  • [X] cortex.llamacpp (default)
  • [ ] cortex.tensorrt-llm (Nvidia GPUs)
  • [ ] cortex.onnx (NPUs, DirectML)

fuzetsu • Oct 25 '24 01:10

This issue is fixed, right? cc @vansangpfiev

louis-jan • Apr 11 '25 17:04