cortex.cpp
bug: closing HTTP stream does not stop inference
Cortex version
Whatever Jan v0.5.7 uses
Describe the Bug
I am using Jan and running a local OpenAI-compatible server through its GUI. I hope it's OK to report this here even though I'm using cortex.cpp through Jan; the issue could also be specific to Jan's integration.
When I call this API from my own application and implement "stop generating" functionality by triggering an AbortController on my fetch request, the server keeps churning until generation finishes on its own, even though the stream ended long ago.
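For reference, a minimal sketch of the client-side flow described above. The server URL, port, and model id are assumptions (Jan's local server defaults may differ); the live calls are left commented out:

```typescript
// Sketch of "stop generating" via AbortController on a streaming
// chat-completion request. URL/port and model id are assumptions.
async function streamChatCompletion(signal: AbortSignal): Promise<void> {
  const res = await fetch("http://localhost:1337/v1/chat/completions", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "my-local-model", // hypothetical model id
      stream: true,
      messages: [{ role: "user", content: "Write a very long story." }],
    }),
    signal, // aborting this signal closes the HTTP stream mid-response
  });
  // Drain the SSE stream until the connection ends.
  const reader = res.body!.getReader();
  for (;;) {
    const { done } = await reader.read();
    if (done) break;
  }
}

// "Stop generating" simply aborts the in-flight request:
const controller = new AbortController();
// streamChatCompletion(controller.signal).catch(() => {}); // start streaming
// controller.abort(); // tears down the connection client-side;
//                     // the bug: server-side inference keeps running
```

Aborting the signal closes the HTTP connection from the client; the expectation is that the server notices the dropped stream and cancels inference, which is what does not happen here.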
Steps to Reproduce
- start the HTTP server
- initiate a streaming chat completion request
- close the HTTP connection
- inference continues (judging by continued high GPU usage)
Screenshots / Logs
No response
What is your OS?
- [X] MacOS
- [ ] Windows
- [ ] Linux
What engine are you running?
- [X] cortex.llamacpp (default)
- [ ] cortex.tensorrt-llm (Nvidia GPUs)
- [ ] cortex.onnx (NPUs, DirectML)
This issue is fixed, right? cc @vansangpfiev