Feature request: Performance metrics per model-version
Feature Request
Describe the problem the feature is intended to solve
We have multiple A/B tests running in which the same model_name serves different versions, and those versions can have different performance outcomes.
For instance, two versions of the same model, given the same inputs, can differ in number of layers or in architecture, making one slower and heavier, especially for on-CPU processing.
We need a way to monitor performance metrics such as p95 and average latency at model_name.version granularity; currently, only model_name-level metrics are visible, e.g.:
```
:tensorflow:serving:request_latency_bucket{model_name="tf_model_name",API="Predict",entrypoint="GRPC",le="2.52873e+08"} 16237
```
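For illustration, the requested per-version breakdown might be exported along these lines (the model_version label is hypothetical, not an existing metric):

```
:tensorflow:serving:request_latency_bucket{model_name="tf_model_name",model_version="3",API="Predict",entrypoint="GRPC",le="2.52873e+08"} 16237
```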
Describe the solution
The solution is to add one more set of performance counters inside the servable, tracking p95 and average latency at the more granular model_name plus version level, as sketched below.
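A minimal sketch of what such a counter could look like, assuming the same tensorflow::monitoring::Sampler machinery that backs the existing model_name-level latency metric; the metric name, the "model_version" label, and the RecordRequestLatencyByVersion helper are all illustrative, not existing APIs:

```cpp
// Sketch only, not the actual implementation: a per-version latency sampler
// built on the tensorflow::monitoring helpers TF Serving already uses.
// Metric name, "model_version" label, and the helper below are illustrative.
#include <cstdint>
#include <string>

#include "tensorflow/core/lib/monitoring/sampler.h"

namespace {

auto* request_latency_by_version = tensorflow::monitoring::Sampler<4>::New(
    {":tensorflow:serving:request_latency_by_version",
     "Request latency in microseconds, broken down by model version.",
     "model_name", "model_version", "API", "entrypoint"},
    // Same exponential bucketing style as the existing latency sampler.
    tensorflow::monitoring::Buckets::Exponential(10, 1.8, 33));

}  // namespace

// Hypothetical helper mirroring the existing model_name-level recording path.
void RecordRequestLatencyByVersion(const std::string& model_name,
                                   int64_t version, const std::string& api,
                                   const std::string& entrypoint,
                                   int64_t latency_usec) {
  request_latency_by_version
      ->GetCell(model_name, std::to_string(version), api, entrypoint)
      ->Add(latency_usec);
}
```

Since the histogram is exported per label set, p95 and average latency per version fall out of the existing bucket/sum/count series on the monitoring side, with no extra aggregation logic needed in Serving itself.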
Describe alternatives you've considered
The only way we can see right now is to execute a call from the client and measure latency that way (sketched below). However, that includes round-trip network latency and the feature-engineering steps specific to a given model version, which makes it operationally challenging at scale and a maintenance headache, while still not giving us pure server-side metrics per model version.
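For reference, a minimal sketch of that client-side workaround, assuming the generated TF Serving gRPC stubs; the endpoint, model name, and pinned version are placeholders:

```cpp
// Client-side timing of a Predict RPC. This measures round-trip latency
// (network + serialization + server work), not pure server-side time,
// which is exactly why it is a poor substitute for a server-side metric.
#include <chrono>
#include <iostream>

#include "grpcpp/grpcpp.h"
#include "tensorflow_serving/apis/predict.pb.h"
#include "tensorflow_serving/apis/prediction_service.grpc.pb.h"

int main() {
  auto channel = grpc::CreateChannel("localhost:8500",
                                     grpc::InsecureChannelCredentials());
  auto stub = tensorflow::serving::PredictionService::NewStub(channel);

  tensorflow::serving::PredictRequest request;
  request.mutable_model_spec()->set_name("tf_model_name");
  request.mutable_model_spec()->mutable_version()->set_value(3);  // pin version
  // ... fill request inputs with version-specific features here ...

  tensorflow::serving::PredictResponse response;
  grpc::ClientContext context;

  const auto start = std::chrono::steady_clock::now();
  const grpc::Status status = stub->Predict(&context, request, &response);
  const auto elapsed = std::chrono::duration_cast<std::chrono::microseconds>(
      std::chrono::steady_clock::now() - start);

  std::cout << "round-trip latency (usec): " << elapsed.count() << std::endl;
  return status.ok() ? 0 : 1;
}
```

Note that the feature-engineering step elided above differs per model version, which is what makes maintaining such probes at scale painful.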
System information
- OS Platform and Distribution: CentOS 7; later OEL 8
- TensorFlow Serving installed from (source or binary): source
- TensorFlow Serving version: 2.6
@godot73 Any opinion on this? Thanks!
@google Is this not feasible, or has just nobody asked for it before?
@vitalyli,
A similar feature request, #1959, is in progress. Please close this issue and follow that thread for updates. Thank you.
This issue has been marked stale because it has had no activity for the past 7 days. It will be closed if no further activity occurs. Thank you.
This issue was closed due to inactivity after being marked stale for the past 7 days.