truss-examples

Adding TRT-LLM + Triton truss

Open • aspctu opened this issue 2 years ago • 0 comments

Overview

This PR adds a truss that serves TRT-LLM engines with Triton. Users specify a Hugging Face repository that holds the pre-built engines and tokenizers. The truss uses the C++ TRT runtime and the Triton Inference Server to provide high-performance model serving with streaming enabled.
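To make the "define a Hugging Face repository" idea concrete, here is a minimal sketch of how such settings could be represented and checked before anything is downloaded. The `EngineSource` class, its field names, and the `local_cache_dir` helper are hypothetical and chosen for illustration; they are not the truss's actual config keys or code, and the sketch assumes namespaced repo ids of the form `owner/name`.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class EngineSource:
    """Hypothetical settings object; the field names are illustrative only."""

    engine_repository: str     # Hugging Face repo id with pre-built TRT-LLM engines
    tokenizer_repository: str  # Hugging Face repo id with the tokenizer files
    streaming: bool = True     # the PR description says streaming is enabled

    def validate(self) -> None:
        """Reject repo ids that are not of the assumed 'owner/name' form."""
        for name in ("engine_repository", "tokenizer_repository"):
            value = getattr(self, name)
            owner, sep, repo = value.partition("/")
            if not (owner and sep and repo) or "/" in repo:
                raise ValueError(
                    f"{name} must look like 'owner/name', got {value!r}"
                )


def local_cache_dir(root: Path, repo_id: str) -> Path:
    """Map a repo id to a local directory under root (assumed layout)."""
    return root / repo_id.replace("/", "--")


if __name__ == "__main__":
    # Placeholder repo ids, not real repositories.
    source = EngineSource(
        engine_repository="example-org/example-engines",
        tokenizer_repository="example-org/example-tokenizer",
    )
    source.validate()
    print(local_cache_dir(Path("engines"), source.engine_repository))
```

Validating the ids up front means a typo surfaces as a configuration error rather than as a failed download when the server starts.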

aspctu • Oct 31 '23 06:10