Iman Tabrizian
@AWallyAllah By any chance are you using `PyTorch` in your Python model? Could you share the code for model A?
Since the TRT-LLM backend has its own batching and queueing logic and immediately places the request into its own queues, priority will most likely have no effect there. I'll transfer this...