python tritonclient stream_infer should send end signal to callback
This may be a newbie question, but I think we should call self._callback(result=None, error=None)
after the iteration over the responses finishes,
so the callback knows the stream of responses is complete and no more responses will be sent. Am I right?
The relevant Python Triton client code (the file is __init__.py; the underscores were stripped by markdown rendering):
https://github.com/triton-inference-server/client/blob/d8bc5b8d4a9297e8e6b470fe96ef82a4470234a2/src/python/library/tritonclient/grpc/__init__.py#L2083
def _process_response(self, responses):
    """Worker thread function to iterate through the response stream and
    executes the provided callbacks.

    Parameters
    ----------
    responses : iterator
        The iterator to the response from the server for the
        requests in the stream.
    """
    try:
        for response in responses:
            if self._verbose:
                print(response)
            result = error = None
            if response.error_message != "":
                error = InferenceServerException(msg=response.error_message)
            else:
                result = InferResult(response.infer_response)
            self._callback(result=result, error=error)
        ############ add here ##############
        self._callback(result=None, error=None)  # new code
    except grpc.RpcError as rpc_error:
        error = get_error_grpc(rpc_error)
        self._callback(result=None, error=error)
Oh... Triton gRPC uses a bidirectional stream, not one-request/streamed-response, so there is no need to add an end signal there.
response_sender.send(flags=pb_utils.TRITONSERVER_RESPONSE_COMPLETE_FINAL)
So how can I get an end signal in decoupled mode (one request, multiple responses)?
For the one-request/multiple-responses scenario, it looks like I need to send a None
to indicate that sending is done:
client = grpcclient.InferenceServerClient(url, verbose=False)
client.start_stream(callback=partial(callback, user_data))
client.async_stream_infer(
    model_name=model_name,
    inputs=triton_inputs,
    request_id=rand_int_string(),
    outputs=triton_outputs,
)
############ add here ##############
client._stream._enqueue_request(None)
Then, on the receive side, the added line

############ add here ##############
self._callback(result=None, error=None)  # new code

would deliver a None to the callback, indicating that receiving is complete.
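To illustrate, here is a minimal sketch of how a caller could consume such a None/None sentinel. The callback and drain_responses names are mine, not part of the Triton client API; this assumes the proposed end signal is actually delivered:

```python
import queue

response_queue = queue.Queue()

def callback(user_data, result, error):
    # Under the proposed change, (result=None, error=None)
    # marks end-of-stream, so forward a sentinel.
    if result is None and error is None:
        response_queue.put(None)
    elif error is not None:
        response_queue.put(error)
    else:
        response_queue.put(result)

def drain_responses():
    """Collect results until the end-of-stream sentinel arrives."""
    results = []
    while True:
        item = response_queue.get()
        if item is None:  # sentinel: stream complete
            break
        results.append(item)
    return results
```

With this shape, the consumer does not need to know the response count in advance; it simply blocks until the sentinel shows up.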
Great question! Is there a specific reason you want an end signal? You can always keep a count of expected responses vs. received responses on the caller side, as done in the decoupled examples here: https://github.com/triton-inference-server/python_backend/tree/r22.05/examples/decoupled. I don't believe end signals are generally used, except for sequence batching and the like, and there it is the user sending end signals to denote that the sequence is done.
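A minimal sketch of that counting approach (the class and attribute names are illustrative, not from the Triton client API), assuming the caller knows the expected response count up front:

```python
import queue

class CountingCallback:
    """Mark a stream complete once the number of responses the
    caller expects has arrived; no server-side end signal needed."""

    def __init__(self, expected_count):
        self.expected_count = expected_count
        self.received = 0
        self.results = queue.Queue()
        self.done = False

    def __call__(self, result, error):
        # Store either the error or the result for the consumer.
        self.results.put(error if error is not None else result)
        self.received += 1
        if self.received >= self.expected_count:
            self.done = True
```

An instance of this class would be passed as the callback to start_stream; the limitation discussed below is precisely that expected_count is not always known.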
CC: @Tabrizian. I'm assuming the code sending the requests would know the amount of expected responses, but perhaps there is a case they wouldn't.
For example, with streaming text-to-speech (using decoupled mode), the client won't know the number of expected responses in advance.
@Jackiexiao This sounds like a reasonable request to me. I think we should deliver a flag to the client indicating whether this is the last response or not. Right now, it looks like the client must know beforehand the number of responses that the model is going to send. @tanmayv25 / @jbkyang-nvi What do you think?
We can enable this as an optional feature to the clients.
Wonderful. I have filed a ticket to track this feature request.
any update?
Not yet. It's in our queue.
any update?
Yes, this was added here. It should be available from 23.06 onwards.
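For later readers: with this feature, the client can opt in to receiving an empty final response and check a triton_final_response parameter in the callback. A hedged sketch, assuming the tritonclient gRPC API from roughly that release (the helper name is mine, and the commented call shape should be checked against your client version):

```python
def is_final_response(result):
    """Return True if this callback result carries the
    triton_final_response parameter that marks the last
    response of a decoupled request."""
    if result is None:
        return False
    params = result.get_response().parameters
    flag = params.get("triton_final_response")
    return bool(flag and flag.bool_param)

# The streaming call would opt in roughly like this
# (assumption -- verify against your tritonclient version):
# client.async_stream_infer(
#     model_name=model_name,
#     inputs=triton_inputs,
#     enable_empty_final_response=True,
# )
```

The callback can then stop waiting as soon as is_final_response(result) is true, which removes the need to know the response count up front.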
Thank you for the reminder. Closing this issue.