Ability to include `process_start_time_seconds` in metrics
What would you like to be added:
Currently, `process_start_time_seconds` is included in the telemetry metrics, but not in the standard metrics. I'd like to be able to opt into this metric being included in the regular metrics endpoint.
Why is this needed:
The prometheus-to-sd component expects the `process_start_time_seconds` metric to be present. Without it, its logs print the following warning every 60 seconds:
`Metric process_start_time_seconds invalid or not defined for component kube-state-metrics. Using 1970-01-01 00:00:01 +0000 UTC instead. Cumulative metrics might be inaccurate.`
Describe the solution you'd like
I'd expect this to be opt-in, probably via a CLI flag. Maybe `--include-process-start-time` or `--include-process-start-time-seconds`?
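A minimal sketch of what such an opt-in could look like, assuming a hypothetical `--include-process-start-time-seconds` flag and a standalone `main` (the flag name, port, and wiring are illustrative, not actual kube-state-metrics code):

```go
package main

import (
	"flag"
	"net/http"
	"time"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

func main() {
	// Hypothetical flag; kube-state-metrics does not currently expose it.
	includeStartTime := flag.Bool("include-process-start-time-seconds", false,
		"Expose process_start_time_seconds on the main metrics endpoint")
	flag.Parse()

	// Registry backing the main (Kubernetes object) metrics endpoint.
	mainRegistry := prometheus.NewRegistry()

	if *includeStartTime {
		startTime := prometheus.NewGauge(prometheus.GaugeOpts{
			Name: "process_start_time_seconds",
			Help: "Start time of the process since unix epoch in seconds.",
		})
		// Approximation: record when main() started instead of reading /proc,
		// which is what client_golang's process collector does for the real metric.
		startTime.Set(float64(time.Now().Unix()))
		mainRegistry.MustRegister(startTime)
	}

	http.ListenAndServe(":8080", promhttp.HandlerFor(mainRegistry, promhttp.HandlerOpts{}))
}
```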
Additional context
I tried adding `process_start_time_seconds` to both `--metric-allowlist` and `--metric-opt-in-list`, and neither caused it to be included.
I see that there is an example in the repository which uses KSM 1.4. Does this example work? Was the `process_start_time_seconds` metric removed at one point?
@fpetkovski I'm not quite sure I understand your question. Could you clarify?
To be clear, `process_start_time_seconds` is available, but only in the telemetry metrics, which are exposed on a different port. So when prometheus-to-sd pulls the regular metrics from the standard port, `process_start_time_seconds` isn't included.
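For context, the split roughly works like the following sketch: two separate registries served on two separate ports (8080 for object metrics, 8081 for telemetry by default), with the process and Go collectors registered only on the telemetry one. This is a simplified illustration, not the actual kube-state-metrics source:

```go
package main

import (
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/collectors"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

func main() {
	// Main registry: metrics about Kubernetes objects (kube_pod_*, kube_deployment_*, ...).
	mainRegistry := prometheus.NewRegistry()
	// ... object collectors would be registered here ...

	// Telemetry registry: metrics about kube-state-metrics itself,
	// including process_start_time_seconds from the process collector.
	telemetryRegistry := prometheus.NewRegistry()
	telemetryRegistry.MustRegister(
		collectors.NewProcessCollector(collectors.ProcessCollectorOpts{}),
		collectors.NewGoCollector(),
	)

	// Two servers: a scraper pointed only at :8080 never sees
	// process_start_time_seconds, which lives on :8081.
	go http.ListenAndServe(":8081", promhttp.HandlerFor(telemetryRegistry, promhttp.HandlerOpts{}))
	http.ListenAndServe(":8080", promhttp.HandlerFor(mainRegistry, promhttp.HandlerOpts{}))
}
```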
Ah, looking back at my comment, I see I hadn't linked the actual example: https://github.com/GoogleCloudPlatform/k8s-stackdriver/blob/master/prometheus-to-sd/kubernetes/prometheus-to-sd-kube-state-metrics.yaml
This leads me to believe that either it is somehow possible to combine the two metric endpoints, or KSM 1.4 used to expose this metric on the default port and it was later moved to the telemetry one.
Oh, yes, that example is what we based our implementation on. It works, apart from the fact that it logs the "cumulative metrics might be inaccurate" warning (included in the issue description) every 60 seconds. It's just a warning, not a fatal error, but it sounds like something that should be addressed to ensure the metrics are accurate.
I see. I believe they use `process_start_time_seconds` to detect restarts for counters (aka cumulative metrics), something that is already built into Prometheus. IMO adding this metric makes sense, but I'd also like to get an opinion from @mrueg and/or @dgrisonnet to make sure there isn't something obvious that we're missing.
From a kube-state-metrics point of view, I don't think that makes sense, since we want to keep the distinction between metrics about kube-state-metrics itself and metrics about the Kubernetes APIs. They are two different kinds of information, so I don't think it makes sense to merge them together. That being said, I can see two solutions:
- It seems possible to specify multiple endpoints in the configuration of prometheus-to-sd (https://github.com/GoogleCloudPlatform/k8s-stackdriver/blob/master/prometheus-to-sd/kubernetes/prometheus-to-sd-kube-state-metrics.yaml#L10-L12), so I am wondering if adding both endpoints would solve your problem. I think it should, and if not, it might be something to fix in prometheus-to-sd.
- Add an option in KSM to expose all the metrics on only one server, disabled by default, but I am very reluctant to do this because I don't think it really makes sense.
> It seems possible to specify multiple endpoints in the configuration of prometheus-to-sd (https://github.com/GoogleCloudPlatform/k8s-stackdriver/blob/master/prometheus-to-sd/kubernetes/prometheus-to-sd-kube-state-metrics.yaml#L10-L12)
I think this is just the container port for the KSM pod. The source seems to be defined below [1], and it seems that multiple sources can be set [2]. So that solution could potentially work.
[1] https://github.com/GoogleCloudPlatform/k8s-stackdriver/blob/f65ad56360bd0edce5b1bd063f9eb645715796cf/prometheus-to-sd/kubernetes/prometheus-to-sd-kube-state-metrics.yaml#L34
[2] https://github.com/GoogleCloudPlatform/k8s-stackdriver/blob/master/prometheus-to-sd/main.go#L97
Yeah, you are correct, that is definitely the pod spec; not sure why I thought it was the configuration of prometheus-to-sd.
Looking at the project a bit more, it seems that it checks the `process_start_time_seconds` metric per source, so specifying two sources will not solve the issue: https://github.com/GoogleCloudPlatform/k8s-stackdriver/blob/master/prometheus-to-sd/main.go#L182-L187
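A rough sketch of that per-source behavior, based only on the description and warning message above (illustrative, not the actual prometheus-to-sd code; `dto` is the Prometheus client model package):

```go
package main

import (
	"fmt"
	"time"

	dto "github.com/prometheus/client_model/go"
)

// startTimeForSource mirrors the check described above: each source is
// inspected on its own for process_start_time_seconds, so adding the
// telemetry endpoint as a second source would not silence the warning
// for the main endpoint.
func startTimeForSource(families map[string]*dto.MetricFamily) time.Time {
	if mf, ok := families["process_start_time_seconds"]; ok && len(mf.Metric) > 0 {
		return time.Unix(int64(mf.Metric[0].GetGauge().GetValue()), 0)
	}
	// Missing metric: fall back to the epoch, which is what produces the
	// "Cumulative metrics might be inaccurate" warning.
	return time.Unix(1, 0)
}

func main() {
	// A source without the metric gets the 1970-01-01 00:00:01 UTC fallback.
	fmt.Println(startTimeForSource(map[string]*dto.MetricFamily{}).UTC())
}
```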
With that said, in my opinion the ideal solution would be for prometheus-to-sd to handle the case of one application having multiple metrics endpoints, since that is a practice spread all over the ecosystem, but I can understand that it would require a big effort. So I would be fine with introducing a flag to expose metrics on only one server instead of two.
While I can see how combining all the metrics for both kube-state-metrics and Kubernetes APIs into a single server would solve the issue, I'm concerned about how that might impact users' Stackdriver costs. The telemetry metrics endpoint produces a response about 12KiB in size (at least on one of our development Kubernetes clusters). Google charges for every byte of custom metrics ingested, and, as noted in kube-state-metrics's README, exporting more metrics than necessary can result in pretty high costs.
For that reason, it may be best if just the `process_start_time_seconds` metric were added to the Kubernetes APIs metrics endpoint (only if the optional flag was provided).
Thoughts?
The telemetry endpoint should produce far fewer metrics than the other endpoint, and ultimately you most likely also want to gather these metrics; otherwise you'll be blind if anything goes wrong with kube-state-metrics.
> as noted in kube-state-metrics's README, exporting more metrics than necessary can result in pretty high costs.
This comment refers to the endpoint that exposes metrics about the Kubernetes APIs. The telemetry endpoint only exposes a few metrics, so cost-wise it shouldn't have a huge impact. If you want to reduce the cost of kube-state-metrics' metric ingestion, you should instead opt out of some resources/metrics via the CLI flags. This will have a far bigger impact on cost savings than opting out of the telemetry metrics, which are essential if anything goes wrong with kube-state-metrics.
> For that reason, it may be best if just the `process_start_time_seconds` metric were added to the Kubernetes APIs metrics endpoint (only if the optional flag was provided).
My problem with adding just `process_start_time_seconds` is that then the CLI flag use case would be specific to prometheus-to-sd, whereas if we were to have a flag to merge both servers together, it would be generic and benefit more users than just the prometheus-to-sd community.
> The telemetry endpoint only exposes a few metrics, so cost-wise it shouldn't have a huge impact.
That's what I initially thought, too, but Stackdriver is pricier than you might think. Assuming the telemetry endpoint produces a 12KiB response, and assuming prometheus-to-sd's default export interval of 1 minute, that's ~16.875MiB of data exported per day. At Stackdriver's current rate of $0.2580/MiB, that ends up adding about $130 to someone's monthly bill. Probably negligible for most mid-to-large projects, but potentially impactful for smaller ones.
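For transparency, here is the arithmetic behind that estimate; the 12KiB response size, the $0.2580/MiB rate, and a 30-day month are the stated assumptions:

```go
package main

import "fmt"

func main() {
	const (
		responseKiB   = 12.0    // observed telemetry response size
		scrapesPerDay = 24 * 60 // prometheus-to-sd default export interval of 1 minute
		pricePerMiB   = 0.2580  // assumed Stackdriver custom-metrics ingestion rate (USD)
		daysPerMonth  = 30
	)
	mibPerDay := responseKiB * scrapesPerDay / 1024
	// Prints roughly: ~16.875 MiB/day, ~$130.61/month
	fmt.Printf("~%.3f MiB/day, ~$%.2f/month\n", mibPerDay, mibPerDay*daysPerMonth*pricePerMiB)
}
```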
In fact, at our company we have a couple of k8s clusters where we've trimmed the metrics exposed by kube-state-metrics down so heavily that, if all of the telemetry metrics were exported, they'd cost more than the regular metrics.
> if we were to have a flag to merge both servers together, it would be generic and benefit more users than just the prometheus-to-sd community.
That's completely understandable, though, so I can see why that might still be the better solution, even after considering the potential impact on cost. Just wanted to make sure that impact was considered, since it's bigger than you might expect and not necessarily something that can be written off in all cases.
There are some additional steps that users can take to reduce costs, too, like decreasing prometheus-to-sd's export frequency and/or inserting a container between the kube-state-metrics and prometheus-to-sd containers that applies extra metric filtering (we've employed both techniques in the clusters where we use kube-state-metrics).
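As an illustration of that second technique, a hypothetical filtering sidecar could look something like the sketch below; the upstream URL, listen port, and denylist entry are made up for the example and are not from any existing project:

```go
package main

import (
	"bufio"
	"fmt"
	"net/http"
	"strings"
)

// metricName extracts the metric family name from a line of the Prometheus
// text exposition format, for both "# HELP"/"# TYPE" lines and sample lines.
func metricName(line string) string {
	if strings.HasPrefix(line, "# HELP ") || strings.HasPrefix(line, "# TYPE ") {
		if fields := strings.Fields(line); len(fields) >= 3 {
			return fields[2]
		}
		return ""
	}
	if end := strings.IndexAny(line, "{ "); end != -1 {
		return line[:end]
	}
	return line
}

func main() {
	// Hypothetical denylist; in practice this would come from configuration.
	denied := map[string]bool{"kube_pod_container_status_last_terminated_reason": true}

	http.HandleFunc("/metrics", func(w http.ResponseWriter, r *http.Request) {
		// Assumes kube-state-metrics is reachable in the same pod on :8080.
		resp, err := http.Get("http://localhost:8080/metrics")
		if err != nil {
			http.Error(w, err.Error(), http.StatusBadGateway)
			return
		}
		defer resp.Body.Close()

		scanner := bufio.NewScanner(resp.Body)
		scanner.Buffer(make([]byte, 0, 64*1024), 10*1024*1024) // allow long label sets
		for scanner.Scan() {
			line := scanner.Text()
			if !denied[metricName(line)] {
				fmt.Fprintln(w, line)
			}
		}
	})
	http.ListenAndServe(":9090", nil)
}
```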
Are you sure that the price is per MB scraped instead of MB stored? Prometheus metrics use far fewer resources on disk (~1.6 bytes per sample) compared to the text exposition format.
If the charging is done based on the metrics response size, it would mean that having shorter or longer metric descriptions would affect the pricing, which sounds a bit strange.
I'm not positive, but their documentation states that that price is for "data ingested." GKE metrics get the benefit of being priced on the samples ingested, but not custom metrics. It also goes on to say,
> In Monitoring, ingestion refers to the process of writing time series to Monitoring. Each time series includes some number of data points; those data points are the basis for ingestion charges.
Also, from experience, a small change to the number of metrics exposed by kube-state-metrics can have a noticeable impact on Stackdriver's monthly cost.
What if we add a flag to export the telemetry metrics on the same port, but also allow opting out of those metrics using the `--metric-denylist` flag? That should satisfy both your and @dgrisonnet's concerns.
Theoretically, that would be a potential solution, but in practice it is more complex to do that with telemetry metrics because we don't control all of the collectors involved. For instance, with today's implementation we couldn't use just that to disable all the Go and process metrics.
One way to do it is to decorate the gatherer and filter out the metric families it returns. Here's a POC which can be used as a starting point: https://github.com/kubernetes/kube-state-metrics/pull/1666/files
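The general shape of that approach is a `prometheus.Gatherer` wrapper like the sketch below. This is based on the comment above rather than the POC's actual code, and the denylist entry is only an example:

```go
package main

import (
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/collectors"
	"github.com/prometheus/client_golang/prometheus/promhttp"
	dto "github.com/prometheus/client_model/go"
)

// filteredGatherer decorates another prometheus.Gatherer and drops
// denylisted metric families before they are exposed.
type filteredGatherer struct {
	inner  prometheus.Gatherer
	denied map[string]bool
}

func (g filteredGatherer) Gather() ([]*dto.MetricFamily, error) {
	families, err := g.inner.Gather()
	if err != nil {
		return nil, err
	}
	kept := families[:0]
	for _, mf := range families {
		if !g.denied[mf.GetName()] {
			kept = append(kept, mf)
		}
	}
	return kept, nil
}

func main() {
	// Telemetry-style registry with the Go and process collectors.
	reg := prometheus.NewRegistry()
	reg.MustRegister(
		collectors.NewGoCollector(),
		collectors.NewProcessCollector(collectors.ProcessCollectorOpts{}),
	)

	// Example denylist; in kube-state-metrics this would be driven by the
	// existing allowlist/denylist flags.
	gatherer := filteredGatherer{inner: reg, denied: map[string]bool{"go_threads": true}}
	http.ListenAndServe(":8081", promhttp.HandlerFor(gatherer, promhttp.HandlerOpts{}))
}
```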
That sounds reasonable to me, considering that there are only a few metric families on the telemetry endpoint, so the performance impact shouldn't be significant.
@WesCossick would you be interested in contributing this feature? You can use that PR as an example of how telemetry metrics can be filtered with existing flags.
I wouldn't necessarily have the bandwidth to work on that myself, at least not for the foreseeable future, but I'll see if one of our team's other engineers can tackle it. We don't use Go at our company, though, so the unfamiliarity there may pose an issue.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:
- Mark this issue or PR as fresh with `/remove-lifecycle stale`
- Mark this issue or PR as rotten with `/lifecycle rotten`
- Close this issue or PR with `/close`
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale
/assign
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:
- Mark this issue or PR as fresh with `/remove-lifecycle stale`
- Mark this issue or PR as rotten with `/lifecycle rotten`
- Close this issue or PR with `/close`
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:
- Mark this issue or PR as fresh with `/remove-lifecycle rotten`
- Close this issue or PR with `/close`
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues according to the following rules:
- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:
- Reopen this issue with `/reopen`
- Mark this issue as fresh with `/remove-lifecycle rotten`
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.
/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".
In response to this:
> /close not-planned
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.