Feature request: Performance metrics per model-version
Feature Request
Describe the problem the feature is intended to solve
We have multiple A/B tests running in which the same model_name serves different versions, and those versions can have different performance outcomes.
For instance, two versions of the same model, given the same inputs, can differ in number of layers or in architecture, making one slower and heavier, especially for on-CPU processing.
We need a way to monitor performance metrics such as p95 and average latency at model_name.version granularity; currently, only model_name-level metrics are visible, e.g.:
```
:tensorflow:serving:request_latency_bucket{model_name="tf_model_name",API="Predict",entrypoint="GRPC",le="2.52873e+08"} 16237
```
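For illustration, the requested per-version breakdown might be exported along these lines (the model_version label is hypothetical, not an existing metric):

```
:tensorflow:serving:request_latency_bucket{model_name="tf_model_name",model_version="3",API="Predict",entrypoint="GRPC",le="2.52873e+08"} 16237
```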
Describe the solution
The solution is to add one more set of performance counters inside the servable, tracking p95 and average latency at the more granular model_name plus version level, as sketched below.
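A minimal sketch of what such a counter could look like, assuming the same tensorflow::monitoring::Sampler machinery that backs the existing model_name-level latency metric; the metric name, the "model_version" label, and the RecordRequestLatencyByVersion helper are all illustrative, not existing APIs:

```cpp
// Sketch only, not the actual implementation: a per-version latency sampler
// built on the tensorflow::monitoring helpers TF Serving already uses.
// Metric name, "model_version" label, and the helper below are illustrative.
#include <cstdint>
#include <string>

#include "tensorflow/core/lib/monitoring/sampler.h"

namespace {

auto* request_latency_by_version = tensorflow::monitoring::Sampler<4>::New(
    {":tensorflow:serving:request_latency_by_version",
     "Request latency in microseconds, broken down by model version.",
     "model_name", "model_version", "API", "entrypoint"},
    // Same exponential bucketing style as the existing latency sampler.
    tensorflow::monitoring::Buckets::Exponential(10, 1.8, 33));

}  // namespace

// Hypothetical helper mirroring the existing model_name-level recording path.
void RecordRequestLatencyByVersion(const std::string& model_name,
                                   int64_t version, const std::string& api,
                                   const std::string& entrypoint,
                                   int64_t latency_usec) {
  request_latency_by_version
      ->GetCell(model_name, std::to_string(version), api, entrypoint)
      ->Add(latency_usec);
}
```

Since the histogram is exported per label set, p95 and average latency per version fall out of the existing bucket/sum/count series on the monitoring side, with no extra aggregation logic needed in Serving itself.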
Describe alternatives you've considered
The only way we can see right now is to execute a call from the client and measure latency that way (sketched below). However, that includes round-trip network latency and the feature-engineering steps specific to a given model version, which makes it operationally challenging at scale and a maintenance headache, while still not giving us pure server-side metrics per model version.
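For reference, a minimal sketch of that client-side workaround, assuming the generated TF Serving gRPC stubs; the endpoint, model name, and pinned version are placeholders:

```cpp
// Client-side timing of a Predict RPC. This measures round-trip latency
// (network + serialization + server work), not pure server-side time,
// which is exactly why it is a poor substitute for a server-side metric.
#include <chrono>
#include <iostream>

#include "grpcpp/grpcpp.h"
#include "tensorflow_serving/apis/predict.pb.h"
#include "tensorflow_serving/apis/prediction_service.grpc.pb.h"

int main() {
  auto channel = grpc::CreateChannel("localhost:8500",
                                     grpc::InsecureChannelCredentials());
  auto stub = tensorflow::serving::PredictionService::NewStub(channel);

  tensorflow::serving::PredictRequest request;
  request.mutable_model_spec()->set_name("tf_model_name");
  request.mutable_model_spec()->mutable_version()->set_value(3);  // pin version
  // ... fill request inputs with version-specific features here ...

  tensorflow::serving::PredictResponse response;
  grpc::ClientContext context;

  const auto start = std::chrono::steady_clock::now();
  const grpc::Status status = stub->Predict(&context, request, &response);
  const auto elapsed = std::chrono::duration_cast<std::chrono::microseconds>(
      std::chrono::steady_clock::now() - start);

  std::cout << "round-trip latency (usec): " << elapsed.count() << std::endl;
  return status.ok() ? 0 : 1;
}
```

Note that the feature-engineering step elided above differs per model version, which is what makes maintaining such probes at scale painful.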
System information
- OS Platform and Distribution: CentOS 7; later OEL 8
- TensorFlow Serving installed from (source or binary): source
- TensorFlow Serving version: 2.6
@godot73 Any opinion on this? Thanks!
@google Is this not feasible, or has just nobody asked for it before?
@vitalyli,
A similar feature request, #1959, is in progress. Please close this issue and follow that thread for updates. Thank you.
This issue has been marked stale because it has had no activity for the past 7 days. It will be closed if no further activity occurs. Thank you.
This issue was closed due to inactivity after being marked stale for the past 7 days.