End-to-End-LLM
Feature Request - Triton Server Deployment - Hands-on latency and throughput comparison across two models
An important aspect of deployment is that the model needs to be served to a wide range of users. Measuring throughput and latency, and comparing an additionally optimised deployment against the vanilla deployment, would give a better picture of the deployment requirements and trade-offs.
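As a starting point for the proposed hands-on comparison, the sketch below shows how per-request latencies from two deployments could be summarized into the usual serving metrics (throughput, p50/p95/p99 latency). The model names and latency numbers are hypothetical placeholders; in practice the measurements would come from a Triton client loop or from `perf_analyzer` output.

```python
import statistics


def summarize(latencies_ms):
    """Summarize per-request latencies (in ms) into throughput and percentiles."""
    latencies = sorted(latencies_ms)
    n = len(latencies)
    total_s = sum(latencies) / 1000.0  # sequential wall time, in seconds

    def pct(p):
        # Nearest-rank percentile over the sorted latencies.
        idx = min(n - 1, int(round(p / 100.0 * (n - 1))))
        return latencies[idx]

    return {
        "requests": n,
        "throughput_rps": n / total_s,
        "p50_ms": pct(50),
        "p95_ms": pct(95),
        "p99_ms": pct(99),
        "mean_ms": statistics.mean(latencies),
    }


# Hypothetical measurements: a vanilla deployment vs. an optimised
# (e.g. TensorRT-converted) variant of the same model served from Triton.
vanilla = [42.0, 45.1, 43.8, 50.2, 41.7, 44.9, 48.3, 46.0]
optimized = [12.3, 11.8, 13.1, 12.9, 11.5, 14.0, 12.2, 12.6]

for name, lat in [("vanilla", vanilla), ("optimized", optimized)]:
    s = summarize(lat)
    print(f"{name}: {s['throughput_rps']:.1f} req/s, "
          f"p50={s['p50_ms']:.1f} ms, p95={s['p95_ms']:.1f} ms")
```

This computes throughput from sequential wall time, which matches a single-client benchmark; a concurrent-client comparison would divide by the overall elapsed time instead.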