TensorRT-LLM Backend with multiple LLMs (not to be confused with multi-model)
I followed this guide: https://github.com/triton-inference-server/tensorrtllm_backend?tab=readme-ov-file#quick-start
I got this working, but it only serves one model. It also breaks my understanding of the one-model-equals-one-folder structure of the Triton model repository: an inflight-batching deployment of a single model already consists of 4-5 model folders (preprocessing, tensorrt_llm, postprocessing, ensemble, etc.). How would I add a second model, which brings another 4-5 model folders of its own, and how would those folders be named? Something like the layout sketched below?
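For concreteness, here is the kind of layout I imagine for two models, with a per-model name prefix on each folder. The prefixes and the second model are my own guesses, not something from the guide, and I assume each ensemble's config.pbtxt would then need its `ensemble_scheduling` steps updated to reference the prefixed `model_name`s:

```
model_repository/
├── llama_preprocessing/
├── llama_tensorrt_llm/        # engines for model A
├── llama_postprocessing/
├── llama_ensemble/
├── mistral_preprocessing/
├── mistral_tensorrt_llm/      # engines for model B
├── mistral_postprocessing/
└── mistral_ensemble/
```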
Is inflight batching only one option? (Every example for Triton + the TRT-LLM backend uses inflight batching, though!) Could what I want be accomplished simply by copying the engine files (rank0.engine, rank1.engine, etc.) into the model folders?
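In other words, something like this sketch of what I have in mind; the paths are made up, and I'm assuming the `gpt_model_path` parameter from the quick-start config.pbtxt is what points each tensorrt_llm model at its own engine directory:

```sh
# Copy each model's engines into its own tensorrt_llm version folder
# (source paths are hypothetical)
cp /engines/llama/rank*.engine   model_repository/llama_tensorrt_llm/1/
cp /engines/mistral/rank*.engine model_repository/mistral_tensorrt_llm/1/

# Then, in each tensorrt_llm config.pbtxt, point at that model's engines, e.g.:
# parameters: {
#   key: "gpt_model_path"
#   value: { string_value: "/model_repository/llama_tensorrt_llm/1" }
# }
```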
I've been stuck on this for a week now. I'd be grateful for any hints, help, or comments!