Don't know how to get the average latency from metrics
Describe the bug
I need to get the average latency for each rerank request, but ovms_request_time_us_sum is always 0. I want to clarify which metric I can use, or how to calculate the average.
At first I considered ovms_request_time_us_sum / ovms_request_time_us_count, but found that ovms_request_time_us_sum is always 0.
The non-zero metrics are listed below. The largest one is ovms_graph_processing_time_us_sum; is that the total latency, including both the rerank and tokenizer models? I'm confused about how to calculate the average rerank latency.
```
ovms_inference_time_us_count{name="BAAI/bge-reranker-base_rerank_model",version="1"} 20
ovms_inference_time_us_sum{name="BAAI/bge-reranker-base_rerank_model",version="1"} 2794429
ovms_inference_time_us_count{name="BAAI/bge-reranker-base_tokenizer_model",version="1"} 40
ovms_inference_time_us_sum{name="BAAI/bge-reranker-base_tokenizer_model",version="1"} 72140
ovms_wait_for_infer_req_time_us_count{name="BAAI/bge-reranker-base_rerank_model",version="1"} 20
ovms_wait_for_infer_req_time_us_sum{name="BAAI/bge-reranker-base_rerank_model",version="1"} 32
ovms_wait_for_infer_req_time_us_count{name="BAAI/bge-reranker-base_tokenizer_model",version="1"} 40
ovms_wait_for_infer_req_time_us_sum{name="BAAI/bge-reranker-base_tokenizer_model",version="1"} 32
ovms_graph_processing_time_us_count{method="Unary",name="BAAI/bge-reranker-base"} 20
ovms_graph_processing_time_us_sum{method="Unary",name="BAAI/bge-reranker-base"} 2890231
```
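For reference, an average can be derived from any of the `_sum`/`_count` pairs above by simple division. Below is a minimal sketch (a hypothetical helper, not part of OVMS) that scrapes the metrics endpoint and prints the average for every such pair; the URL uses the same host_ip:port placeholder as the reproduction step.

```python
# Minimal sketch: scrape the Prometheus text output from OVMS and compute
# average latency as _sum / _count for each pair of counters.
# The endpoint is a placeholder -- replace host_ip and port with your
# deployment's actual values.
import urllib.request

METRICS_URL = "http://host_ip:port/metrics"  # placeholder endpoint

def scrape(url):
    """Parse Prometheus text format into {metric_name_with_labels: value}."""
    values = {}
    with urllib.request.urlopen(url) as resp:
        for line in resp.read().decode().splitlines():
            if line.startswith("#") or not line.strip():
                continue
            key, _, value = line.rpartition(" ")
            values[key] = float(value)
    return values

metrics = scrape(METRICS_URL)
for key, total in metrics.items():
    if "_sum{" in key:
        count = metrics.get(key.replace("_sum{", "_count{"))
        if count:  # skip missing or zero counts
            print(f"{key}  avg = {total / count:.1f} us")
```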
To Reproduce
1. Deploy OVMS with BAAI/bge-reranker-base and enable metrics with the --metrics_enable parameter.
2. Fetch the metrics with curl http://host_ip:port/metrics.
Hello @gavinlichn
Please take a look at this documentation page describing metrics for graphs: https://docs.openvino.ai/2025/model-server/ovms_docs_metrics.html#metrics-implementation-for-mediapipe-graphs
I think you are looking for ovms_graph_processing_time_us:
> Tracks duration of successfully started mediapipe graphs in us. It can represent pipeline processing time for unary calls or the session length for streamed requests.
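To illustrate with the numbers you posted: the average end-to-end latency for a unary rerank request would be the sum divided by the count, i.e. 2890231 / 20 ≈ 144512 us, or roughly 144.5 ms per request. As a rough sanity check, the per-request node averages add up consistently: rerank inference at 2794429 / 20 ≈ 139721 us plus tokenizer at 72140 / 20 ≈ 3607 us (two tokenizer calls per request) gives about 143328 us, which suggests the graph metric covers both nodes plus a small amount of graph overhead.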
Please let me know if you find this helpful.
@gavinlichn did you resolve the issue?
Closing due to no activity - assuming resolved @gavinlichn