fastertransformer_backend
How to terminate a gRPC streaming request immediately during Triton server inference with the FasterTransformer backend?
In a production environment like ChatGPT, terminating a conversation early in response to a user-client command can be a major requirement. Can a gRPC streaming request be terminated immediately while Triton server is running inference with the FasterTransformer backend? Could you please give some advice?
from functools import partial
import tritonclient.grpc as grpcclient

with grpcclient.InferenceServerClient(self.model_url) as client:
    client.start_stream(callback=partial(stream_callback, result_queue))
    client.async_stream_infer(self.model_name, request_data)
Also, does `async_stream_infer` require the inputs to be packaged in some particular format?
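For reference, here is a minimal sketch of the client-side pattern implied by the snippet above: the stream callback pushes results onto a queue, and the consumer loop watches a stop signal so it can break out early and tear down the stream (e.g. by calling `client.stop_stream()`). The stream itself is simulated here with a plain queue rather than a real Triton server, and `on_stop` stands in for whatever teardown call your client version supports; note that stopping the client stream does not by itself guarantee the server stops computing.

```python
import queue
import threading

def stream_callback(result_queue, result, error):
    # Mirrors the tritonclient streaming callback signature:
    # push each streamed result (or error) onto a queue so the
    # main thread can consume it.
    result_queue.put(error if error is not None else result)

def consume_stream(result_queue, stop_event, on_stop):
    # Drain streamed results until either the user requests
    # termination (stop_event) or the stream ends. End-of-stream
    # is signaled here with a None sentinel; a real client would
    # detect the final response flag instead.
    received = []
    while True:
        if stop_event.is_set():
            on_stop()  # e.g. client.stop_stream() in the real client
            break
        item = result_queue.get()
        if item is None:  # end-of-stream sentinel
            break
        received.append(item)
    return received
```

With this structure, the user-facing "stop" command only needs to set `stop_event`; the consumer loop then exits on its next iteration and invokes the teardown callback, discarding any tokens still in flight.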