Ryan McCormick

159 comments by Ryan McCormick

Hi @rossbucky, [Multi-node inference](https://github.com/triton-inference-server/fastertransformer_backend#multi-node-inference) is specific to the FasterTransformer backend for now, so please ask any multi-node or FasterTransformer-specific questions there instead: https://github.com/triton-inference-server/fastertransformer_backend/issues Model parallelism (multiple copies of models...

Hi @issamemari, I understand the ask here is more generic, but in the meantime, in case your ask is specific to GPU device management only, you could make use...
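For context, a minimal sketch of pinning a model to specific GPUs via the `instance_group` field in Triton's model configuration; the count and device IDs below are placeholders, not a recommendation for any particular setup:

```
# config.pbtxt (placeholder values)
instance_group [
  {
    # Run one instance of this model, visible to GPU 0 only.
    count: 1
    kind: KIND_GPU
    gpus: [ 0 ]
  }
]
```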

Ah, I see. Per-model limits for multiple models in the same tritonserver process wouldn't work well with this approach; it would likely require more configuration and coordination on your end...
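For completeness, one hedged sketch of what that extra configuration and coordination could look like: a separate tritonserver process per model, each restricted to its own device via `CUDA_VISIBLE_DEVICES`, so per-model limits fall out of process-level isolation rather than per-model config. The repository paths, port numbers, and two-model layout here are hypothetical.

```bash
# Hypothetical layout: one tritonserver process per model, each pinned to one GPU.
CUDA_VISIBLE_DEVICES=0 tritonserver --model-repository=/models/model_a \
    --http-port=8000 --grpc-port=8001 --metrics-port=8002 &
CUDA_VISIBLE_DEVICES=1 tritonserver --model-repository=/models/model_b \
    --http-port=8010 --grpc-port=8011 --metrics-port=8012 &
```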

FYI: don't merge this until the testing PR is also ready to merge and the pipelines look good.

Closing this in favor of moving forward with the original proposed changes: https://github.com/triton-inference-server/core/pull/321#issuecomment-2263784011

Hi @ClifHouck, thanks for your patience on this and sorry for the long turnaround time. Upon further reflection, I think application of the GPU labels to the inference request metrics...

CC @chriscarollo from [your issue](https://github.com/triton-inference-server/server/issues/7479) as this PR pertains to your question.

Hi @JonasGoebel, thanks for the contribution! Did you submit a signed CLA per https://github.com/triton-inference-server/server/blob/main/CONTRIBUTING.md#contributor-license-agreement-cla?

Hi @kebe7jun, thanks for submitting the PR. Can you elaborate on the specific models or features you're interested in that require this version upgrade? CC @oandreeva-nv @tanmayv25

> Started internal CI: **14897042**
> ...
> tritonclient.utils.InferenceServerException: [StatusCode.INVALID_ARGUMENT] load failed for model 'vllm_opt': version 1 is at UNAVAILABLE state: Internal: AttributeError: module 'pynvml' has no attribute 'nvmlDeviceGetP2PStatus'

Looks like...
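As context for that traceback, a hedged diagnostic sketch, not taken from the CI logs: the official NVIDIA bindings are distributed as `nvidia-ml-py`, and an older, unofficial `pynvml` distribution installed in the same environment can shadow them and lack newer symbols such as `nvmlDeviceGetP2PStatus`. A check like the following can show which distribution is actually being imported:

```python
# Hypothetical environment check: confirm where the 'pynvml' module comes from
# and whether the symbol the failing backend expects is present.
import importlib.metadata
import pynvml

print("pynvml module file:", pynvml.__file__)
print("has nvmlDeviceGetP2PStatus:", hasattr(pynvml, "nvmlDeviceGetP2PStatus"))

# The official bindings ship as 'nvidia-ml-py'; a stale 'pynvml' package
# may be the one providing the module instead.
for dist in ("nvidia-ml-py", "pynvml"):
    try:
        print(dist, importlib.metadata.version(dist))
    except importlib.metadata.PackageNotFoundError:
        print(dist, "not installed")
```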