Ryan McCormick

Results: 158 comments by Ryan McCormick

CC @matthewkotila @nv-hwoo if you have any thoughts on the variance, or on improvements to the provided Perf Analyzer (PA) arguments.

Hi @chriscarollo, have you used the `tritonserver --model-control-mode=explicit ...` (or `poll`) feature to dynamically load/unload models before? I believe there may be a known inconsistency where models loaded at...
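
For context, explicit mode means the server loads nothing until a client asks it to. A minimal sketch of driving that from Python, assuming `pip install tritonclient[http]`, a server started with `tritonserver --model-repository=/models --model-control-mode=explicit`, and a placeholder model name `my_model`:

```python
# Sketch: load/unload a model against a server running in explicit
# model-control mode. "my_model" is a placeholder, not a name from
# the original thread.
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

client.load_model("my_model")             # explicitly request a load
print(client.is_model_ready("my_model"))  # True once it is servable

client.unload_model("my_model")           # explicitly release it
```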

Hi @chriscarollo, this is a known issue and has a proposed resolution in this PR: https://github.com/triton-inference-server/core/pull/321. Please chime in on the discussion with your use case, impact, etc.

Hi @geraldstanje, thanks for sharing! We have [GitHub Discussions](https://github.com/triton-inference-server/server/discussions) open as an official channel for community discussion that the team also engages with. Do you have any...

Hi @WilliamOnVoyage, I believe both the vLLM and TensorRT-LLM backends handle tokenization internally, with no user code changes required; they are configurable through their respective config files or based on the model being...
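
To illustrate what "tokenization handled internally" looks like from the client side, here's a sketch against Triton's HTTP generate endpoint, assuming a vLLM-backend model named `vllm_model` (a placeholder): the client sends and receives plain text, never token IDs.

```python
# Sketch: raw text in, detokenized text out; the backend tokenizes
# server-side. "vllm_model" and the sampling parameters here are
# illustrative assumptions.
import requests

resp = requests.post(
    "http://localhost:8000/v2/models/vllm_model/generate",
    json={
        "text_input": "What is Triton Inference Server?",
        "parameters": {"stream": False, "temperature": 0.7},
    },
)
resp.raise_for_status()
print(resp.json()["text_output"])
```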

Hi @NguyenThanhHa288, can you share more details on how you're installing and what code you're writing that is generating this error? Please provide the minimal steps to reproduce this error.

Hi, this looks like use of OpenAI Triton, and not NVIDIA Triton Inference Server. Please raise an issue here: https://github.com/triton-lang/triton

Hi @xiazi-yu, I believe it is not possible to force which GPUs are selected when scheduling between models within an ensemble, when multiple GPU choices (multiple model instances) are available...
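
What you can do is pin each composing model's instances to particular GPUs via `instance_group` in its `config.pbtxt`. A minimal sketch with illustrative GPU indices (this constrains where the instances live, but not which instance the ensemble scheduler picks per request):

```
# Sketch: restrict this model's instances to GPU 0.
instance_group [
  {
    count: 2
    kind: KIND_GPU
    gpus: [ 0 ]
  }
]
```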