End-to-End-LLM

Feature Request - Triton Server Deployment - Hands-on latency and throughput comparison across two models

Open aswkumar99 opened this issue 1 year ago • 0 comments

An important aspect of deployment is that the model needs to be served to a wide range of users. Understanding throughput and latency, and comparing an additionally optimised deployment against the vanilla deployment, would help give a clearer picture of the deployment requirements and trade-offs.
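Such a hands-on comparison could be as simple as sending identical requests to both deployments and recording per-request latency and overall throughput. The sketch below is one possible illustration using the Triton Python HTTP client; the server URL, model names, input name, shape, and dtype are placeholder assumptions and would need to match the actual deployed models (Triton's `perf_analyzer` tool is the more thorough option for this kind of measurement).

```python
# Minimal sketch: compare request latency and throughput for two Triton
# deployments of the same model (e.g. a vanilla engine vs. an optimised one).
# Model names, URL, input name/shape/dtype below are illustrative placeholders.
import time
import numpy as np
import tritonclient.http as httpclient

URL = "localhost:8000"                          # Triton HTTP endpoint (assumed)
MODELS = ["gpt_vanilla", "gpt_optimised"]       # hypothetical model names
N_REQUESTS = 100

client = httpclient.InferenceServerClient(url=URL)

def run_benchmark(model_name: str) -> None:
    # Dummy input; replace with the model's real input name, shape and dtype.
    data = np.random.rand(1, 128).astype(np.float32)
    inp = httpclient.InferInput("input_ids", list(data.shape), "FP32")
    inp.set_data_from_numpy(data)

    latencies = []
    start = time.perf_counter()
    for _ in range(N_REQUESTS):
        t0 = time.perf_counter()
        client.infer(model_name, inputs=[inp])
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start

    lat_ms = np.array(latencies) * 1000.0
    print(f"{model_name}: p50={np.percentile(lat_ms, 50):.1f} ms  "
          f"p95={np.percentile(lat_ms, 95):.1f} ms  "
          f"throughput={N_REQUESTS / elapsed:.1f} req/s")

for model in MODELS:
    run_benchmark(model)
```

Running the same request pattern against both model endpoints and comparing the p50/p95 latencies and requests-per-second would make the benefit (or cost) of the optimised deployment concrete.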

aswkumar99 · Feb 19 '24 15:02