Ryan McCormick

Results: 158 comments by Ryan McCormick

CC @matthewkotila @nv-hwoo if you have any thoughts on the variance, or on improvements to the provided Perf Analyzer (PA) arguments.

Hi @chriscarollo, have you used the `tritonserver --model-control-mode=explicit ...` (or `poll`) feature to dynamically load/unload models before? I believe there may be a known inconsistency where models loaded at...
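
For context, explicit mode means the server loads nothing until a client asks it to. A minimal sketch of driving that from Python, assuming `pip install tritonclient[http]`, a server started with `tritonserver --model-repository=/models --model-control-mode=explicit`, and a placeholder model name `my_model`:

```python
# Sketch: load/unload a model against a server running in explicit
# model-control mode. "my_model" is a placeholder, not a name from
# the original thread.
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

client.load_model("my_model")             # explicitly request a load
print(client.is_model_ready("my_model"))  # True once it is servable

client.unload_model("my_model")           # explicitly release it
```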

Hi @chriscarollo, this is a known issue and has a proposed resolution in this PR: https://github.com/triton-inference-server/core/pull/321. Please chime in on the discussion with your use case, impact, etc.

Hi @geraldstanje, thanks for sharing! We have [GitHub Discussions](https://github.com/triton-inference-server/server/discussions) open as an official channel for community discussion that the team also engages with. Do you have any...

Hi @WilliamOnVoyage, I believe both the vLLM and TensorRT-LLM backends handle tokenization internally, with no user code changes required; they are configurable through their respective config files or based on the model being...
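
To illustrate what "tokenization handled internally" looks like from the client side, here's a sketch against Triton's HTTP generate endpoint, assuming a vLLM-backend model named `vllm_model` (a placeholder): the client sends and receives plain text, never token IDs.

```python
# Sketch: raw text in, detokenized text out; the backend tokenizes
# server-side. "vllm_model" and the sampling parameters here are
# illustrative assumptions.
import requests

resp = requests.post(
    "http://localhost:8000/v2/models/vllm_model/generate",
    json={
        "text_input": "What is Triton Inference Server?",
        "parameters": {"stream": False, "temperature": 0.7},
    },
)
resp.raise_for_status()
print(resp.json()["text_output"])
```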

Hi @NguyenThanhHa288, can you share more details on how you're installing and what code you're writing that is generating this error? Please provide the minimal steps to reproduce this error.

Hi, this looks like use of OpenAI Triton, and not NVIDIA Triton Inference Server. Please raise an issue here: https://github.com/triton-lang/triton

Hi @xiazi-yu, I believe it is not possible to force which GPUs are selected when scheduling between models within an ensemble, when multiple GPU choices (multiple model instances) are available...
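
What you can do is pin each composing model's instances to particular GPUs via `instance_group` in its `config.pbtxt`. A minimal sketch with illustrative GPU indices (this constrains where the instances live, but not which instance the ensemble scheduler picks per request):

```
# Sketch: restrict this model's instances to GPU 0.
instance_group [
  {
    count: 2
    kind: KIND_GPU
    gpus: [ 0 ]
  }
]
```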