Ryan McCormick

159 comments by Ryan McCormick

Hi @rossbucky, [Multi-node inference](https://github.com/triton-inference-server/fastertransformer_backend#multi-node-inference) is specific to the FasterTransformer backend for now, so please ask any multi-node or FasterTransformer-specific questions there instead: https://github.com/triton-inference-server/fastertransformer_backend/issues Model parallelism (multiple copies of models...

Hi @issamemari, I understand the ask here is more generic, but in the meantime, in case your ask is specific to GPU device management only, you could make use...
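For context, a minimal sketch of pinning a model to specific GPUs via the `instance_group` field in Triton's model configuration; the count and device IDs below are placeholders, not a recommendation for any particular setup:

```
# config.pbtxt (placeholder values)
instance_group [
  {
    # Run one instance of this model, visible to GPU 0 only.
    count: 1
    kind: KIND_GPU
    gpus: [ 0 ]
  }
]
```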

Ah, I see. Per-model limits for multiple models in the same tritonserver process wouldn't work well with this approach; it would likely require more configuration and coordination on your end...
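For completeness, one hedged sketch of what that extra configuration and coordination could look like: a separate tritonserver process per model, each restricted to its own device via `CUDA_VISIBLE_DEVICES`, so per-model limits fall out of process-level isolation rather than per-model config. The repository paths, port numbers, and two-model layout here are hypothetical.

```bash
# Hypothetical layout: one tritonserver process per model, each pinned to one GPU.
CUDA_VISIBLE_DEVICES=0 tritonserver --model-repository=/models/model_a \
    --http-port=8000 --grpc-port=8001 --metrics-port=8002 &
CUDA_VISIBLE_DEVICES=1 tritonserver --model-repository=/models/model_b \
    --http-port=8010 --grpc-port=8011 --metrics-port=8012 &
```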

FYI: don't merge this until the testing PR is also ready to merge and the pipelines look good.

Closing this in favor of moving forward with the original proposed changes: https://github.com/triton-inference-server/core/pull/321#issuecomment-2263784011

Hi @ClifHouck, thanks for your patience on this and sorry for the long turnaround time. Upon further reflection, I think application of the GPU labels to the inference request metrics...

CC @chriscarollo from [your issue](https://github.com/triton-inference-server/server/issues/7479) as this PR pertains to your question.

Hi @JonasGoebel, thanks for the contribution! Did you submit a signed CLA per https://github.com/triton-inference-server/server/blob/main/CONTRIBUTING.md#contributor-license-agreement-cla?

Hi @kebe7jun, thanks for submitting the PR. Can you elaborate on the specific models or features you're interested in that require this version upgrade? CC @oandreeva-nv @tanmayv25

> Started internal CI: **14897042**
> ...
> tritonclient.utils.InferenceServerException: [StatusCode.INVALID_ARGUMENT] load failed for model 'vllm_opt': version 1 is at UNAVAILABLE state: Internal: AttributeError: module 'pynvml' has no attribute 'nvmlDeviceGetP2PStatus'

Looks like...
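As context for that traceback, a hedged diagnostic sketch, not taken from the CI logs: the official NVIDIA bindings are distributed as `nvidia-ml-py`, and an older, unofficial `pynvml` distribution installed in the same environment can shadow them and lack newer symbols such as `nvmlDeviceGetP2PStatus`. A check like the following can show which distribution is actually being imported:

```python
# Hypothetical environment check: confirm where the 'pynvml' module comes from
# and whether the symbol the failing backend expects is present.
import importlib.metadata
import pynvml

print("pynvml module file:", pynvml.__file__)
print("has nvmlDeviceGetP2PStatus:", hasattr(pynvml, "nvmlDeviceGetP2PStatus"))

# The official bindings ship as 'nvidia-ml-py'; a stale 'pynvml' package
# may be the one providing the module instead.
for dist in ("nvidia-ml-py", "pynvml"):
    try:
        print(dist, importlib.metadata.version(dist))
    except importlib.metadata.PackageNotFoundError:
        print(dist, "not installed")
```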