Prometheus metrics are missing process_start_time_seconds

Open jsok opened this issue 3 years ago • 4 comments

Feature Request

Add the process_start_time_seconds metric to the Prometheus metrics.

Describe the problem the feature is intended to solve

The process_start_time_seconds metric is required in order to make use of cumulative metrics.

e.g.

  • https://github.com/open-telemetry/opentelemetry-collector-contrib/commit/4359f4031f7a033e6545bbf11e27ad682545c8d0
  • https://github.com/GoogleCloudPlatform/k8s-stackdriver/blob/8db3ffc2861997ad7ebebd70186f039c4b9a028a/prometheus-to-sd/translator/translator.go#L201
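As a rough illustration of why the start time matters for cumulative metrics, here is a minimal sketch. It is not the logic of either linked project; `Scrape` and `counter_delta` are hypothetical names used only for this example. It assumes a consumer that turns a cumulative counter into per-interval deltas and uses process_start_time_seconds to notice that the process restarted between two scrapes.

```python
from dataclasses import dataclass


@dataclass
class Scrape:
    """One scrape of a target (hypothetical structure for illustration)."""
    start_time: float  # value of process_start_time_seconds
    counter: float     # value of some cumulative counter


def counter_delta(prev: Scrape, curr: Scrape) -> float:
    """Return the counter increase between two scrapes.

    If the start time changed, the process restarted and the counter
    began again from zero, so the whole current value is new.
    """
    if curr.start_time != prev.start_time:
        return curr.counter
    return curr.counter - prev.counter


# Same process: a plain difference is correct.
same_process = counter_delta(Scrape(100.0, 50.0), Scrape(100.0, 70.0))

# Restarted process: the counter went 50 -> 0 -> 60 between scrapes.
# Without the start time, a consumer would wrongly compute 10.
after_restart = counter_delta(Scrape(100.0, 50.0), Scrape(200.0, 60.0))
```

Without the start time, a restart where the counter has already climbed past its previous value is indistinguishable from normal growth.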

Describe the solution

Add the process_start_time_seconds metric to the Prometheus metrics.

Describe alternatives you've considered

Writing a custom Prometheus exporter that scrapes the existing metrics and injects the process_start_time_seconds metric. This seems hacky, and we can't actually guarantee that the start time of the exporter is the same as the start time of TensorFlow Serving (e.g. if a container crashes and is restarted).
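For reference, the workaround could look roughly like the sketch below. Everything here is an assumption for illustration: `inject_start_time` and `PROXY_START_TIME` are hypothetical names, and the function only post-processes Prometheus text-format output that has already been scraped from the server. Note that it reports the proxy's own start time, which is exactly the weakness described above.

```python
import time

# Captured once when this hypothetical proxy starts.
# This is NOT the real start time of the model server process.
PROXY_START_TIME = time.time()


def inject_start_time(exposition_text: str,
                      start_time: float = PROXY_START_TIME) -> str:
    """Append process_start_time_seconds to text-format metrics if absent."""
    for line in exposition_text.splitlines():
        # A sample line starts with the metric name; comment lines start with '#'.
        if line.startswith("process_start_time_seconds"):
            return exposition_text

    body = exposition_text
    if body and not body.endswith("\n"):
        body += "\n"

    return body + (
        "# HELP process_start_time_seconds "
        "Start time of the process since unix epoch in seconds.\n"
        "# TYPE process_start_time_seconds gauge\n"
        f"process_start_time_seconds {start_time}\n"
    )
```

If the proxy and the server run in separate containers, either one can restart independently, so the injected value can drift from the server's real start time.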

jsok avatar Sep 03 '21 02:09 jsok

@jsok,

Are you still looking for a resolution? We are planning to prioritise issues based on community interest. Please let us know if this issue still persists with the latest TF Serving 2.12.1 release so that we can work on fixing it. Thank you for your contributions.

singhniraj08 avatar Jun 07 '23 06:06 singhniraj08

I believe this would affect a large number of users consuming Prometheus metrics?

jsok avatar Jun 07 '23 06:06 jsok

@yzy0004,

Could we look into this feature request to add the process_start_time_seconds metric to the Prometheus metrics? Thank you!

singhniraj08 avatar Jun 07 '23 06:06 singhniraj08

@netfs may know better about the feature development POC.

yzy0004 avatar Jun 09 '23 00:06 yzy0004