OpenLLM
How to stop a generation stream?
Is there any way to stop a generation stream on the model side when it's no longer needed? For example, when the client disconnects or presses stop.
You can pass a `stop` argument with the request to specify the token at which generation should stop.
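For example, a minimal sketch using the OpenAI-compatible endpoint that OpenLLM exposes (the base URL, port, and model name below are assumptions, not part of the original answer):

```python
from openai import OpenAI

# Assumed: an OpenLLM server running locally with an OpenAI-compatible API.
client = OpenAI(base_url="http://localhost:3000/v1", api_key="na")

resp = client.chat.completions.create(
    model="mistralai/Mistral-7B-Instruct-v0.1",  # placeholder model name
    messages=[{"role": "user", "content": "Write a short poem."}],
    stop=["\n\n"],  # generation halts as soon as this sequence is produced
)
print(resp.choices[0].message.content)
```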
I'm talking about a situation where the stream is already generating and the client disconnects or presses the stop button.
I'm not sure I fully understand, but if the client disconnects, the request will be cancelled with the vLLM backend.
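To illustrate from the client side, here is a minimal sketch under the same assumptions as above: closing the stream drops the HTTP connection, which is what lets the server-side backend cancel the in-flight request.

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:3000/v1", api_key="na")

stream = client.chat.completions.create(
    model="mistralai/Mistral-7B-Instruct-v0.1",  # placeholder model name
    messages=[{"role": "user", "content": "Tell me a very long story."}],
    stream=True,
)

for i, chunk in enumerate(stream):
    print(chunk.choices[0].delta.content or "", end="", flush=True)
    if i >= 20:         # simulate the user pressing "stop" mid-stream
        stream.close()  # closes the connection; the server can then abort
        break
```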