Metrics "nv_inference_request_failure" value is always 0 even after getting 5xx at the client side
System Info
- CPU Architecture: x86_64
- GPU: A100-80GB
- CUDA version: 11
- TensorRT-LLM version: 0.9.0
- Triton server version: 2.46.0
- Model: Llama3-7b
Who can help?
No response
Information
- [ ] The official example scripts
- [X] My own modified scripts
Tasks
- [ ] An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)
Reproduction
Deploy a Llama3-7b model on Triton server 2.46.0 with the TensorRT-LLM backend, then send it a malformed request (see the curl call under additional notes).
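For reference, the deployment amounts to launching Triton against the built engines, roughly as below; the model repository path and ports are illustrative placeholders, not my exact values:

```shell
# Minimal launch sketch; the repository path and ports are placeholders.
tritonserver --model-repository=/models/llama3_7b_repo \
             --http-port=8000 \
             --metrics-port=8002
```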
Expected behavior
The expectation is that the nv_inference_request_failure metric increments whenever the server returns a 5xx error to the client, so a nonzero failure count should show up in the scraped metrics.
actual behavior
Currently, this value is never updated: it stays at zero even after the server returns 5xx responses.
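This is how I read the counter, via Triton's Prometheus metrics endpoint (host and port here are illustrative); the reported value is 0 no matter how many 5xx responses clients have received:

```shell
# Scrape the Triton metrics endpoint and filter the failure counter;
# per the behavior above, it stays at 0 even after failed requests.
curl -s http://localhost:8002/metrics | grep nv_inference_request_failure
```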
additional notes
Sending a POST with no request body returns a parse error:

```shell
curl --location --request POST 'http://sampletritonmodel-triton.genai-a100-mh-prod.fkcloud.in/v2/models/ensemble/generate' \
  --header 'Content-Type: application/json'
```

```
{"error":"failed to parse the request JSON buffer: The document is empty. at 0"}
```
Even after this error is returned, the failure metric count does not increase.
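For comparison, a well-formed request against the same endpoint succeeds; the body below is a sketch following the usual tensorrtllm_backend ensemble schema, with illustrative field values:

```shell
# Same endpoint with a valid JSON body; this request succeeds, while the
# empty-body request above fails without moving the failure counter.
curl --location --request POST 'http://sampletritonmodel-triton.genai-a100-mh-prod.fkcloud.in/v2/models/ensemble/generate' \
  --header 'Content-Type: application/json' \
  --data '{"text_input": "What is machine learning?", "max_tokens": 64, "bad_words": "", "stop_words": ""}'
```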