text-generation-inference
Proper way for client to stop generate_stream
Feature request
Since /generate_stream provides token streaming using Server-Sent Events (SSE), the client currently has no way to tell the server to stop the streaming.
Motivation
Sometimes, when requesting a very large output, it is clear from the start that we need to stop. The client can close the connection (see the sketch below), but from the current implementation I do not know whether this is the best way to handle it.
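For reference, a minimal sketch of what "the client closes the connection" looks like today, assuming a TGI instance on localhost:8080 and using the `requests` library; the URL, payload, and cutoff are placeholders:

```python
import json
import requests

# Hypothetical local deployment; adjust URL and payload to your setup.
URL = "http://localhost:8080/generate_stream"
payload = {"inputs": "Write a very long story.", "parameters": {"max_new_tokens": 2000}}
MAX_TOKENS_I_ACTUALLY_WANT = 50  # arbitrary early-stop condition for illustration

with requests.post(URL, json=payload, stream=True) as resp:
    resp.raise_for_status()
    received = 0
    for line in resp.iter_lines():
        # SSE events arrive as lines prefixed with "data:".
        if not line or not line.startswith(b"data:"):
            continue
        event = json.loads(line[len(b"data:"):])
        print(event["token"]["text"], end="", flush=True)
        received += 1
        if received >= MAX_TOKENS_I_ACTUALLY_WANT:
            # Bail out early: leaving the `with` block closes the HTTP
            # connection, which is currently the only way to signal the
            # server that we are no longer interested.
            break
```

The open question is whether the server reliably notices the dropped connection and cancels the in-flight generation, or keeps producing tokens into a closed socket.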
Your contribution
Maybe add a unique ID, returned first by /generate_stream, which could be used to call a new endpoint (something like /stop_streaming). The server could then send a final message with a stop criterion of 'manual' or something like that. A rough client-side flow is sketched below.
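Purely as an illustration of the proposal, here is how the client side might look. The /stop_streaming endpoint, the stream_id field, and the "manual" finish reason are all hypothetical names from this suggestion, not part of the existing API:

```python
import json
import requests

BASE = "http://localhost:8080"  # hypothetical deployment

# The first SSE event would carry a stream ID (hypothetical field).
resp = requests.post(f"{BASE}/generate_stream",
                     json={"inputs": "Write a very long story."},
                     stream=True)
events = (line for line in resp.iter_lines() if line.startswith(b"data:"))
first = json.loads(next(events)[len(b"data:"):])
stream_id = first["stream_id"]  # hypothetical field proposed above

# At any point, the client asks the server to stop that particular stream
# (hypothetical endpoint proposed above).
requests.post(f"{BASE}/stop_streaming", json={"stream_id": stream_id})

# The server would then emit one last event whose finish reason is
# something like "manual" before closing the stream.
for raw in events:
    event = json.loads(raw[len(b"data:"):])
    details = event.get("details") or {}
    if details.get("finish_reason") == "manual":
        print("Generation stopped at the client's request.")
        break
```

This would let the server free the slot deterministically instead of relying on detecting a dropped connection.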