Ryan McCormick

Showing 160 comments by Ryan McCormick

Hi @Will-Chou-5722, I think your observations look correct. The TensorRT backend is unique in that it uses one thread for multiple model instances on the same GPU, whereas most...
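
For context, a rough sketch (not an official benchmark) of how one might check whether additional TensorRT instances on the same GPU actually add concurrency is to measure throughput at increasing client concurrency with tritonclient. The model name, input name/shape, and server URL below are placeholders for your setup:

```python
import time

import numpy as np
import tritonclient.http as httpclient

URL = "localhost:8000"                             # placeholder server address
MODEL = "my_trt_model"                             # placeholder TensorRT model name
INPUT_NAME, SHAPE = "INPUT0", (1, 3, 224, 224)     # placeholder input name/shape


def throughput(concurrency, n_requests=200):
    # `concurrency` controls how many requests the HTTP client keeps in flight.
    client = httpclient.InferenceServerClient(url=URL, concurrency=concurrency)
    data = np.random.rand(*SHAPE).astype(np.float32)
    inp = httpclient.InferInput(INPUT_NAME, list(SHAPE), "FP32")
    inp.set_data_from_numpy(data)

    start = time.time()
    pending = [client.async_infer(MODEL, inputs=[inp]) for _ in range(n_requests)]
    for req in pending:
        req.get_result()               # block until that request completes
    elapsed = time.time() - start

    client.close()
    return n_requests / elapsed


if __name__ == "__main__":
    for c in (1, 2, 4, 8):
        print(f"concurrency={c}: {throughput(c):.1f} infer/sec")
```

If throughput plateaus almost immediately no matter how many instances are in the instance_group, that would be consistent with the single execution thread behavior described above; perf_analyzer can give you the same picture with less code.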

Hi @AshwinAmbal, can you reproduce this issue with the 24.06 release? I think this change from @oandreeva-nv may help with the issue you're observing: https://github.com/triton-inference-server/server/pull/7325.

Hi @AshwinAmbal, thanks for adding a ticket for tracking! CC @yinggeh @harryskim @statiraju for viz

Hi @SunnyGhj, thanks for filing an issue. Do you have any experiments or data showing this as a bottleneck? And have you tried modifying the code to see if it...

Hi @asamadiya, while we don't have an official example at this time, I see there are some open-source projects that aimed to do this. One such example is here, which...

Hi @JindrichD, thanks for sharing such a detailed issue. Can you try to reproduce this on the 24.07 release? There were recently some changes to how responses are written for...

Hi @mbahri, do you have a minimal model, client, and steps to reproduce that you could share to help expedite debugging? If it is a generic Python backend shm issue, then...
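
To help frame what we're after, a minimal repro for a Python backend shared memory issue usually boils down to a model.py along the lines of the sketch below (input/output names and dtypes are just placeholders), plus the matching config.pbtxt and the client script that triggers the problem:

```python
# model.py -- minimal Python backend model that echoes its input.
import numpy as np
import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    def execute(self, requests):
        responses = []
        for request in requests:
            # Inputs arrive via the shared memory region managed by the backend.
            in0 = pb_utils.get_input_tensor_by_name(request, "INPUT0")
            out0 = pb_utils.Tensor("OUTPUT0", in0.as_numpy().astype(np.float32))
            responses.append(pb_utils.InferenceResponse(output_tensors=[out0]))
        return responses
```

Something that small, together with the exact server flags and request sizes you use, is usually enough for us to try to reproduce on our side.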

Hi @LinGeLin, can you please provide more details on the use case, as well as an example model and client to reproduce the current lack of support and show the bottlenecks?
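
As a rough illustration of what would help (not a prescription), a short tritonclient script like the one below, with your real model config swapped in, is usually enough for us to reproduce and profile; the model name, input/output names, shape, and URL here are placeholders:

```python
import numpy as np
import tritonclient.http as httpclient

URL = "localhost:8000"          # placeholder server address
MODEL = "example_model"         # placeholder model name

client = httpclient.InferenceServerClient(url=URL)

# One request with random data; shape and dtype are assumptions.
data = np.random.rand(1, 16).astype(np.float32)
inp = httpclient.InferInput("INPUT0", list(data.shape), "FP32")
inp.set_data_from_numpy(data)

result = client.infer(MODEL, inputs=[inp])
print(result.as_numpy("OUTPUT0"))
```

Even if the real model can't be shared, a stand-in with the same input/output shapes and the same request pattern usually shows the bottleneck just as well.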