End-to-End-LLM
Feature Request - Triton Server Deployment - Hands-on latency and throughput comparison across two models
An important aspect of deployment is that the model needs to be served to a wide range of users. Measuring throughput and latency, and comparing an additionally optimised deployment against the vanilla deployment, would give a better picture of the deployment requirements and trade-offs.
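As a starting point for the proposed hands-on comparison, the sketch below shows how per-request latencies from two deployments could be summarized into the usual serving metrics (throughput, p50/p95/p99 latency). The model names and latency numbers are hypothetical placeholders; in practice the measurements would come from a Triton client loop or from `perf_analyzer` output.

```python
import statistics


def summarize(latencies_ms):
    """Summarize per-request latencies (in ms) into throughput and percentiles."""
    latencies = sorted(latencies_ms)
    n = len(latencies)
    total_s = sum(latencies) / 1000.0  # sequential wall time, in seconds

    def pct(p):
        # Nearest-rank percentile over the sorted latencies.
        idx = min(n - 1, int(round(p / 100.0 * (n - 1))))
        return latencies[idx]

    return {
        "requests": n,
        "throughput_rps": n / total_s,
        "p50_ms": pct(50),
        "p95_ms": pct(95),
        "p99_ms": pct(99),
        "mean_ms": statistics.mean(latencies),
    }


# Hypothetical measurements: a vanilla deployment vs. an optimised
# (e.g. TensorRT-converted) variant of the same model served from Triton.
vanilla = [42.0, 45.1, 43.8, 50.2, 41.7, 44.9, 48.3, 46.0]
optimized = [12.3, 11.8, 13.1, 12.9, 11.5, 14.0, 12.2, 12.6]

for name, lat in [("vanilla", vanilla), ("optimized", optimized)]:
    s = summarize(lat)
    print(f"{name}: {s['throughput_rps']:.1f} req/s, "
          f"p50={s['p50_ms']:.1f} ms, p95={s['p95_ms']:.1f} ms")
```

This computes throughput from sequential wall time, which matches a single-client benchmark; a concurrent-client comparison would divide by the overall elapsed time instead.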