tensorrtllm_backend

Metrics "nv_inference_request_failure" value is always 0 even after getting 5xx at the client side

Open ajagetia2001 opened this issue 1 year ago • 2 comments

System Info

  • CPU architecture: x86_64
  • GPU: A100-80GB
  • CUDA version: 11
  • TensorRT-LLM version: 0.9.0
  • Triton server version: 2.46.0
  • Model: Llama3-7b

Who can help?

No response

Information

  • [ ] The official example scripts
  • [X] My own modified scripts

Tasks

  • [ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • [ ] My own task or dataset (give details below)

Reproduction

Deploy a Llama3-7b model on Triton server 2.46.0.

Expected behavior

The nv_inference_request_failure metric should report a nonzero failure count when the client receives 5xx responses.

actual behavior

Currently, this value is not updated. It stays at zero even after the server returns 5xx responses.

additional notes

curl --location --request POST 'http://sampletritonmodel-triton.genai-a100-mh-prod.fkcloud.in/v2/models/ensemble/generate' \
--header 'Content-Type: application/json'

Response:

{"error":"failed to parse the request JSON buffer: The document is empty. at 0"}

Even after getting this error, the failure metric count does not increase.
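To make the observation easier to reproduce, here is a minimal sketch that reads nv_inference_request_failure from the Prometheus-format metrics text before and after sending a failing request, then prints the per-label difference. The URL is an assumption (Triton's default metrics endpoint on port 8002; adjust it for your deployment), and the helper names `read_counter`, `fetch_metrics`, and `counter_delta` are hypothetical, not part of Triton or tensorrtllm_backend.

```python
import re
import urllib.request

# Assumption: Triton's default metrics endpoint; change host/port as needed.
METRICS_URL = "http://localhost:8002/metrics"

# Matches one Prometheus text-format sample line: name{labels} value
_SAMPLE_RE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)'
)


def read_counter(metrics_text, metric_name):
    """Return {label_string: value} for every sample of metric_name."""
    samples = {}
    for line in metrics_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = _SAMPLE_RE.match(line)
        if match and match.group("name") == metric_name:
            samples[match.group("labels") or ""] = float(match.group("value"))
    return samples


def fetch_metrics(url=METRICS_URL, timeout=5.0):
    """Download the raw metrics text from the server."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def counter_delta(before, after):
    """Per-label increase between two snapshots (missing labels count as 0)."""
    return {labels: after[labels] - before.get(labels, 0.0) for labels in after}
```

One way to use it: call `read_counter(fetch_metrics(), "nv_inference_request_failure")` once, send the failing request, call it again, and pass both snapshots to `counter_delta`. If every delta is 0 while the client saw an error, that matches the behavior reported here.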

ajagetia2001 Aug 22 '24 10:08