Prometheus metrics are missing process_start_time_seconds
Feature Request
Add the process_start_time_seconds metric to the Prometheus metrics.
Describe the problem the feature is intended to solve
Consuming cumulative metrics requires the process_start_time_seconds metric. Downstream tools use it as the start time of cumulative series, e.g.
- https://github.com/open-telemetry/opentelemetry-collector-contrib/commit/4359f4031f7a033e6545bbf11e27ad682545c8d0
- https://github.com/GoogleCloudPlatform/k8s-stackdriver/blob/8db3ffc2861997ad7ebebd70186f039c4b9a028a/prometheus-to-sd/translator/translator.go#L201
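To illustrate why the metric matters, here is a minimal Python sketch of how a consumer might derive the (start, end) interval for cumulative points from a scrape. It is not taken from either project linked above. The function names and the `requests_total` sample in the usage note are hypothetical, and the parser only handles an unlabeled `process_start_time_seconds` sample line.

```python
import re
from typing import Optional, Tuple

# Matches an unlabeled sample line such as "process_start_time_seconds 1.7e+09".
# Comment lines ("# HELP ...", "# TYPE ...") start with "#" and are not matched.
_START_TIME_RE = re.compile(r"^process_start_time_seconds\s+(\S+)", re.MULTILINE)


def find_start_time(exposition: str) -> Optional[float]:
    """Return the process start time (Unix seconds) from scraped text, or None."""
    match = _START_TIME_RE.search(exposition)
    return float(match.group(1)) if match else None


def cumulative_interval(
    exposition: str, scrape_time: float
) -> Optional[Tuple[float, float]]:
    """Return (start, end) for cumulative points, or None if the start is unknown."""
    start = find_start_time(exposition)
    if start is None:
        # Without the start time the consumer cannot tell when counters began.
        return None
    return (start, scrape_time)
```

With a scrape that contains only `requests_total 5`, `cumulative_interval` returns `None`, which is the situation this issue describes.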
Describe the solution
Add the process_start_time_seconds metric to the Prometheus metrics.
Describe alternatives you've considered
Writing a custom Prometheus exporter that scrapes the metrics and injects the process_start_time_seconds metric.
This seems hacky, and we can't guarantee that the start time of the exporter matches the start time of TensorFlow Serving (e.g. if a container crashes and is restarted).
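For reference, the workaround could look roughly like the sketch below. It is a minimal illustration, not an existing exporter. `inject_start_time` and `EXPORTER_START_TIME` are hypothetical names, and the scraping and re-serving of the metrics page is left out.

```python
import time

# Captured when the exporter itself starts. This is the exporter's start time,
# NOT TensorFlow Serving's, which is the weakness of this workaround.
EXPORTER_START_TIME = time.time()


def inject_start_time(scraped: str, start_time: float = EXPORTER_START_TIME) -> str:
    """Append a process_start_time_seconds gauge unless one is already present."""
    for line in scraped.splitlines():
        if line.startswith("process_start_time_seconds"):
            return scraped  # The upstream already exposes the metric.
    # Make sure the scraped text ends with a newline before appending.
    body = scraped if scraped.endswith("\n") or not scraped else scraped + "\n"
    return (
        body
        + "# TYPE process_start_time_seconds gauge\n"
        + f"process_start_time_seconds {start_time}\n"
    )
```

If the TensorFlow Serving container restarts while the exporter keeps running, the injected value stays at the exporter's original start time, so counter resets would be attributed to the wrong interval.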
@jsok,
Are you still looking for a resolution? We are planning to prioritise issues based on community interest. Please let us know if this issue still persists with the latest TF Serving 2.12.1 release so that we can work on fixing it. Thank you for your contributions.
I believe this would affect a large number of users consuming Prometheus metrics?
@yzy0004,
Could you take a look at this feature request to add the process_start_time_seconds metric to the Prometheus metrics? Thank you!
@netfs may know better about the feature development POC.