truss-examples
Adding TRT-LLM + Triton truss
Overview
This PR adds support for Triton + TRT-LLM engines. Users can specify a Hugging Face repository that holds the pre-built engines and tokenizers. We use the C++ TRT runtime and the Triton Inference Server to provide high-performance model serving with streaming enabled.
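As a rough illustration of pointing the truss at a Hugging Face repository, here is a minimal sketch. The names `EngineSource`, `engine_repository`, `tokenizer_repository`, and `download_artifacts` are hypothetical and are not taken from this PR; the only real library call is `huggingface_hub.snapshot_download`, and the actual truss may fetch its artifacts differently.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class EngineSource:
    """Hypothetical settings; field names are illustrative, not the truss's real config keys."""

    engine_repository: str     # Hugging Face repo ID holding the pre-built TRT-LLM engine
    tokenizer_repository: str  # Hugging Face repo ID holding the tokenizer files

    def __post_init__(self):
        for repo_id in (self.engine_repository, self.tokenizer_repository):
            # Hub repo IDs look like "name" or "namespace/name".
            parts = repo_id.split("/")
            if not 1 <= len(parts) <= 2 or not all(parts):
                raise ValueError(f"not a valid Hugging Face repo ID: {repo_id!r}")


def download_artifacts(source: EngineSource, dest: Path) -> dict:
    """Download both repositories under dest and return their local paths."""
    # Imported lazily so the settings can be validated without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return {
        "engine": snapshot_download(source.engine_repository, local_dir=dest / "engine"),
        "tokenizer": snapshot_download(source.tokenizer_repository, local_dir=dest / "tokenizer"),
    }
```

Validating the repo IDs up front means a malformed ID fails at load time rather than partway through a download. Calling `download_artifacts(EngineSource("my-org/my-engine", "my-org/my-tokenizer"), Path("/tmp/model"))` (placeholder repo IDs) would require network access and the `huggingface_hub` package.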