Ryan McCormick comments

Results: 163 comments of Ryan McCormick

[Metrics] Triton server should expose software version / build number as a prometheus metric

Hi @Kellel, If you're just looking for the version of Triton that is currently running, is the server metadata endpoint sufficient? ```bash $ curl -s localhost:8000/v2 | jq { "name":...
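For anyone who wants the same check from a script rather than `curl`, here is a minimal sketch. It assumes the server metadata endpoint is reachable at `http://localhost:8000/v2` as in the comment above and returns JSON with `name`, `version`, and `extensions` fields. The helper names `parse_server_metadata` and `fetch_server_metadata` are hypothetical and not part of any Triton client library.

```python
import json
from urllib.request import urlopen


def parse_server_metadata(body: str) -> dict:
    """Pick the commonly used fields out of a server metadata JSON body."""
    meta = json.loads(body)
    return {
        "name": meta.get("name"),
        "version": meta.get("version"),
        # Assumed to be a list of extension names; default to empty if absent.
        "extensions": meta.get("extensions", []),
    }


def fetch_server_metadata(base_url: str = "http://localhost:8000",
                          timeout: float = 5.0) -> dict:
    """GET <base_url>/v2 and return the parsed metadata (needs a running server)."""
    with urlopen(f"{base_url}/v2", timeout=timeout) as resp:
        return parse_server_metadata(resp.read().decode("utf-8"))


# Example usage against a running server (not executed here):
#   info = fetch_server_metadata()
#   print(info["name"], info["version"])
```

The parsing step is kept separate from the HTTP call so it can be exercised without a live server.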

Update vertex_ai_server.cc

Hi @yupbank, Thanks for the contribution! Have you filled out the CLA: https://github.com/triton-inference-server/server/blob/main/CONTRIBUTING.md#contributor-license-agreement-cla?

I am not able to specify the request rate along with the concurrency range

Hi @KhaledButainy, thanks for raising this issue. @matthewkotila could you comment on the valid combinations of parameters for PA here?

rbac required to run triton in k8s

Hi @okyspace, thanks for filing this. Just to clarify, this is a request for a recommended RBAC configuration when using Triton in Kubernetes to use as an example/reference to go...

[RFE] HandleGenerate equivalent for sagemaker_server.cc

Hi @billcai, thanks for raising this request! CC @nskool

Input data/shape validation

@GuanLuo @tanmayv25 @Tabrizian I believe there are some relaxed checks on the hot path for performance purposes. Do you have any comments on problem areas or risks?

Input data/shape validation

Hi @HennerM, thanks for raising this. Do you mind adding the following: 1. Sharing a quick pytorch script to generate the `model.pt` for your example 2. Updating your example client...

Unable to use triton client with shared memory in C++ (Jetpack 6 device)

Hi @ganeshmojow, Thanks for filing an issue. @nv-kmcgill53 could you help take a look here?

On server/deploy/oci -> running "helm install example ." to deploy the Inference Server and pod doesn't get to running due to Liveness probe failed & Readiness probe failed

Hi @aviv12825, I see the errors returned involve "connection refused". Have you confirmed from the pod logs that the server itself started up successfully to expose these endpoints?
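As a rough way to tell "server not listening" apart from "server up but not ready", the sketch below probes the HTTP health endpoints directly. It assumes the standard `/v2/health/live` and `/v2/health/ready` paths on port 8000, and that the port has been made reachable from where the script runs (for example via a port-forward). The `probe` helper is hypothetical and only for illustration.

```python
from urllib.error import HTTPError
from urllib.request import urlopen


def probe(url: str, timeout: float = 2.0) -> str:
    """Return 'ok', 'http <code>', or 'unreachable: <reason>' for a health URL."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return "ok" if resp.status == 200 else f"http {resp.status}"
    except HTTPError as err:
        # The server answered, but with an error status (e.g. not ready yet).
        return f"http {err.code}"
    except OSError as err:
        # Covers URLError (connection refused, DNS failure) and socket timeouts.
        return f"unreachable: {err}"


# Example usage (assumes the server's HTTP port is reachable on localhost:8000):
#   for path in ("/v2/health/live", "/v2/health/ready"):
#       print(path, probe("http://localhost:8000" + path))
```

An `unreachable` result here corresponds to the "connection refused" case, which points at the server process or port exposure rather than at model readiness.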

Casting NumPy string array to np_utils.Tensor disproportionately increases latency

Hi @LLautenbacher, thanks for raising this issue with such detail. @Tabrizian @krishung5 may be able to chime in here. Is it possible that this commented line is causing an extra copy?...