python tritonclient stream_infer should send end signal to callback
This may be a newbie question, but I think we should call self._callback(result=None, error=None)
after the iteration over the responses finishes,
so the callback knows the stream of responses is complete and no more responses will be sent. Am I right?
The relevant Python Triton client code (the file is __init__.py; the underscores were stripped by markdown rendering):
https://github.com/triton-inference-server/client/blob/d8bc5b8d4a9297e8e6b470fe96ef82a4470234a2/src/python/library/tritonclient/grpc/__init__.py#L2083
def _process_response(self, responses):
    """Worker thread function to iterate through the response stream and
    executes the provided callbacks.

    Parameters
    ----------
    responses : iterator
        The iterator to the response from the server for the
        requests in the stream.
    """
    try:
        for response in responses:
            if self._verbose:
                print(response)
            result = error = None
            if response.error_message != "":
                error = InferenceServerException(msg=response.error_message)
            else:
                result = InferResult(response.infer_response)
            self._callback(result=result, error=error)
        ############ add here ##############
        self._callback(result=None, error=None)  # new code
    except grpc.RpcError as rpc_error:
        error = get_error_grpc(rpc_error)
        self._callback(result=None, error=error)
Oh... Triton gRPC uses a bidirectional stream, not one-request/streamed-response, so there is no need to add an end signal there.
response_sender.send(flags=pb_utils.TRITONSERVER_RESPONSE_COMPLETE_FINAL)
So how can I get an end signal in decoupled mode (one request, multiple responses)?
For the one-request/multiple-responses scenario, it looks like I need to send a None
to indicate that sending is done:
client = grpcclient.InferenceServerClient(url, verbose=False)
client.start_stream(callback=partial(callback, user_data))
client.async_stream_infer(
    model_name=model_name,
    inputs=triton_inputs,
    request_id=rand_int_string(),
    outputs=triton_outputs,
)
############ add here ##############
client._stream._enqueue_request(None)
Then, on the receive side, the added line

############ add here ##############
self._callback(result=None, error=None)  # new code

would deliver a None to the callback, indicating that receiving is complete.
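To illustrate, here is a minimal sketch of how a caller could consume such a None/None sentinel. The callback and drain_responses names are mine, not part of the Triton client API; this assumes the proposed end signal is actually delivered:

```python
import queue

response_queue = queue.Queue()

def callback(user_data, result, error):
    # Under the proposed change, (result=None, error=None)
    # marks end-of-stream, so forward a sentinel.
    if result is None and error is None:
        response_queue.put(None)
    elif error is not None:
        response_queue.put(error)
    else:
        response_queue.put(result)

def drain_responses():
    """Collect results until the end-of-stream sentinel arrives."""
    results = []
    while True:
        item = response_queue.get()
        if item is None:  # sentinel: stream complete
            break
        results.append(item)
    return results
```

With this shape, the consumer does not need to know the response count in advance; it simply blocks until the sentinel shows up.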
Great question! Is there a specific reason you want an end signal? You can always keep a count of expected responses vs. received responses on the caller side, as done in the decoupled examples here: https://github.com/triton-inference-server/python_backend/tree/r22.05/examples/decoupled. I don't believe end signals are generally used, except for sequence batching and the like, and there it is the user sending end signals to denote that the sequence is done.
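A minimal sketch of that counting approach (the class and attribute names are illustrative, not from the Triton client API), assuming the caller knows the expected response count up front:

```python
import queue

class CountingCallback:
    """Mark a stream complete once the number of responses the
    caller expects has arrived; no server-side end signal needed."""

    def __init__(self, expected_count):
        self.expected_count = expected_count
        self.received = 0
        self.results = queue.Queue()
        self.done = False

    def __call__(self, result, error):
        # Store either the error or the result for the consumer.
        self.results.put(error if error is not None else result)
        self.received += 1
        if self.received >= self.expected_count:
            self.done = True
```

An instance of this class would be passed as the callback to start_stream; the limitation discussed below is precisely that expected_count is not always known.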
CC: @Tabrizian. I'm assuming the code sending the requests would know the amount of expected responses, but perhaps there is a case they wouldn't.
For example, with streaming text-to-speech (using decoupled mode), the client won't know the number of expected responses in advance.
@Jackiexiao This sounds like a reasonable request to me. I think we should deliver a flag to the client indicating whether this is the last response or not. Right now, it looks like the client must know beforehand the number of responses that the model is going to send. @tanmayv25 / @jbkyang-nvi What do you think?
We can enable this as an optional feature to the clients.
Wonderful. I have filed a ticket to track this feature request.
any update?
Not yet. It's in our queue.
any update?
Yes, this was added here. It should be available from 23.06 onwards.
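For later readers: with this feature, the client can opt in to receiving an empty final response and check a triton_final_response parameter in the callback. A hedged sketch, assuming the tritonclient gRPC API from roughly that release (the helper name is mine, and the commented call shape should be checked against your client version):

```python
def is_final_response(result):
    """Return True if this callback result carries the
    triton_final_response parameter that marks the last
    response of a decoupled request."""
    if result is None:
        return False
    params = result.get_response().parameters
    flag = params.get("triton_final_response")
    return bool(flag and flag.bool_param)

# The streaming call would opt in roughly like this
# (assumption -- verify against your tritonclient version):
# client.async_stream_infer(
#     model_name=model_name,
#     inputs=triton_inputs,
#     enable_empty_final_response=True,
# )
```

The callback can then stop waiting as soon as is_final_response(result) is true, which removes the need to know the response count up front.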
Thank you for the reminder. Closing this issue.