haoran.lin comments

haoran.lin

How to serve multiple TensorRT-LLM models in the same process / server?

> Ideally yes. The TRT-LLM Triton backend does not check if there is an overlap, so it will let you deploy multiple models on a single GPU, but you'll need...