Iman Tabrizian
@AWallyAllah By any chance are you using `PyTorch` in your Python model? Could you share the code for model A?
Since the TRT-LLM backend has its own batching and queueing logic and immediately places the request into its own queues, priority will most likely have no effect there. I'll transfer this...