
Increase chunk size while streaming

Open avianion opened this issue 1 month ago • 1 comment

Is it possible to increase the number of tokens sent per chunk during streaming, and if so, how?

This could also be done via triton-inference-server. A rough client-side sketch of what I mean is shown below.
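As a minimal sketch of the desired behavior (assuming the server streams one token per response and that re-chunking happens on the client; `rechunk` and `token_stream` are hypothetical names, not TensorRT-LLM or Triton APIs):

```python
# Hypothetical client-side buffering: group single-token stream responses into
# larger chunks of `chunk_size` tokens before forwarding them downstream.
# `token_stream` stands in for whatever per-token iterable the server returns.
from typing import Iterable, Iterator, List


def rechunk(token_stream: Iterable[str], chunk_size: int = 8) -> Iterator[str]:
    """Yield concatenated chunks of `chunk_size` tokens from a per-token stream."""
    buffer: List[str] = []
    for token in token_stream:
        buffer.append(token)
        if len(buffer) >= chunk_size:
            yield "".join(buffer)
            buffer.clear()
    if buffer:  # flush any remaining tokens at end of stream
        yield "".join(buffer)


if __name__ == "__main__":
    # Fake per-token stream standing in for a real streaming response.
    fake_stream = iter(["Hel", "lo", ",", " wor", "ld", "!"])
    for chunk in rechunk(fake_stream, chunk_size=2):
        print(repr(chunk))
```

Ideally the server itself would emit larger chunks so the client does not have to buffer, which is what the question is really asking about.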

avianion, May 17 '24 13:05