cortex.cpp
bug: closing HTTP stream does not stop inference
Cortex version
Whatever Jan v0.5.7 uses
Describe the Bug
I am using Jan and running a local OpenAI-compatible server through its GUI. I hope it's OK to report this here even though I'm using cortex.cpp through Jan; the issue could also be specific to Jan's integration.
When I call this API from my own application and implement "stop generating" functionality by triggering an AbortController on my fetch request, the server keeps churning until generation finishes on its own, even though the stream ended long ago.
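For reference, a minimal sketch of the client-side flow described above. The server URL, port, and model id are assumptions (Jan's local server defaults may differ); the live calls are left commented out:

```typescript
// Sketch of "stop generating" via AbortController on a streaming
// chat-completion request. URL/port and model id are assumptions.
async function streamChatCompletion(signal: AbortSignal): Promise<void> {
  const res = await fetch("http://localhost:1337/v1/chat/completions", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "my-local-model", // hypothetical model id
      stream: true,
      messages: [{ role: "user", content: "Write a very long story." }],
    }),
    signal, // aborting this signal closes the HTTP stream mid-response
  });
  // Drain the SSE stream until the connection ends.
  const reader = res.body!.getReader();
  for (;;) {
    const { done } = await reader.read();
    if (done) break;
  }
}

// "Stop generating" simply aborts the in-flight request:
const controller = new AbortController();
// streamChatCompletion(controller.signal).catch(() => {}); // start streaming
// controller.abort(); // tears down the connection client-side;
//                     // the bug: server-side inference keeps running
```

Aborting the signal closes the HTTP connection from the client; the expectation is that the server notices the dropped stream and cancels inference, which is what does not happen here.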
Steps to Reproduce
- start the HTTP server
- initiate a streaming chat completion request
- close the HTTP connection
- inference continues (judging by continued high GPU usage)
Screenshots / Logs
No response
What is your OS?
- [X] MacOS
- [ ] Windows
- [ ] Linux
What engine are you running?
- [X] cortex.llamacpp (default)
- [ ] cortex.tensorrt-llm (Nvidia GPUs)
- [ ] cortex.onnx (NPUs, DirectML)
This issue is fixed, right? cc @vansangpfiev