Metrics "nv_inference_request_failure" value is always 0 even after getting 5xx at the client side
System Info
- CPU Architecture: x86_64
- GPU: A100-80GB
- CUDA version: 11
- TensorRT-LLM version: 0.9.0
- Triton server version: 2.46.0
- Model: Llama3-7b
Who can help?
No response
Information
- [ ] The official example scripts
- [X] My own modified scripts
Tasks
- [ ] An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)
Reproduction
Deploy a Llama3-7b model on Triton server 2.46.0 with the TensorRT-LLM backend, then send it a malformed request (see the curl call under additional notes).
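For reference, the deployment amounts to launching Triton against the built engines, roughly as below; the model repository path and ports are illustrative placeholders, not my exact values:

```shell
# Minimal launch sketch; the repository path and ports are placeholders.
tritonserver --model-repository=/models/llama3_7b_repo \
             --http-port=8000 \
             --metrics-port=8002
```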
Expected behavior
The expectation is that the nv_inference_request_failure metric increments whenever the server returns a 5xx error to the client, so a nonzero failure count should show up in the scraped metrics.
actual behavior
Currently, this value is never updated: it stays at zero even after the server returns 5xx responses.
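This is how I read the counter, via Triton's Prometheus metrics endpoint (host and port here are illustrative); the reported value is 0 no matter how many 5xx responses clients have received:

```shell
# Scrape the Triton metrics endpoint and filter the failure counter;
# per the behavior above, it stays at 0 even after failed requests.
curl -s http://localhost:8002/metrics | grep nv_inference_request_failure
```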
additional notes
Sending a POST with no request body returns a parse error:

```shell
curl --location --request POST 'http://sampletritonmodel-triton.genai-a100-mh-prod.fkcloud.in/v2/models/ensemble/generate' \
  --header 'Content-Type: application/json'
```

```
{"error":"failed to parse the request JSON buffer: The document is empty. at 0"}
```
Even after this error is returned, the failure metric count does not increase.
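For comparison, a well-formed request against the same endpoint succeeds; the body below is a sketch following the usual tensorrtllm_backend ensemble schema, with illustrative field values:

```shell
# Same endpoint with a valid JSON body; this request succeeds, while the
# empty-body request above fails without moving the failure counter.
curl --location --request POST 'http://sampletritonmodel-triton.genai-a100-mh-prod.fkcloud.in/v2/models/ensemble/generate' \
  --header 'Content-Type: application/json' \
  --data '{"text_input": "What is machine learning?", "max_tokens": 64, "bad_words": "", "stop_words": ""}'
```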