Ability to include `process_start_time_seconds` in metrics
What would you like to be added:
Currently, `process_start_time_seconds` is included in the telemetry metrics, but not in the standard metrics. I'd like to be able to opt into this metric being included in the regular metrics endpoint.
Why is this needed:
The prometheus-to-sd component expects the `process_start_time_seconds` metric to be present. Without it, its logs print the following warning every 60 seconds:
`Metric process_start_time_seconds invalid or not defined for component kube-state-metrics. Using 1970-01-01 00:00:01 +0000 UTC instead. Cumulative metrics might be inaccurate.`
Describe the solution you'd like
I'd expect this to be opt-in, probably via a CLI flag. Maybe `--include-process-start-time` or `--include-process-start-time-seconds`?
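A minimal sketch of what such an opt-in could look like, assuming a hypothetical `--include-process-start-time-seconds` flag and a standalone `main` (the flag name, port, and wiring are illustrative, not actual kube-state-metrics code):

```go
package main

import (
	"flag"
	"net/http"
	"time"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

func main() {
	// Hypothetical flag; kube-state-metrics does not currently expose it.
	includeStartTime := flag.Bool("include-process-start-time-seconds", false,
		"Expose process_start_time_seconds on the main metrics endpoint")
	flag.Parse()

	// Registry backing the main (Kubernetes object) metrics endpoint.
	mainRegistry := prometheus.NewRegistry()

	if *includeStartTime {
		startTime := prometheus.NewGauge(prometheus.GaugeOpts{
			Name: "process_start_time_seconds",
			Help: "Start time of the process since unix epoch in seconds.",
		})
		// Approximation: record when main() started instead of reading /proc,
		// which is what client_golang's process collector does for the real metric.
		startTime.Set(float64(time.Now().Unix()))
		mainRegistry.MustRegister(startTime)
	}

	http.ListenAndServe(":8080", promhttp.HandlerFor(mainRegistry, promhttp.HandlerOpts{}))
}
```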
Additional context
I tried adding `process_start_time_seconds` to both `--metric-allowlist` and `--metric-opt-in-list`, and neither caused it to be included.
I see that there is an example in the repository which uses KSM 1.4. Does this example work? Was the `process_start_time_seconds` metric removed at one point?
@fpetkovski I'm not quite sure I understand your question. Could you clarify?
To be clear, `process_start_time_seconds` is available, but only in the telemetry metrics, which are exposed on a different port. So when prometheus-to-sd pulls the regular metrics from the standard port, `process_start_time_seconds` isn't included.
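For context, the split roughly works like the following sketch: two separate registries served on two separate ports (8080 for object metrics, 8081 for telemetry by default), with the process and Go collectors registered only on the telemetry one. This is a simplified illustration, not the actual kube-state-metrics source:

```go
package main

import (
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/collectors"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

func main() {
	// Main registry: metrics about Kubernetes objects (kube_pod_*, kube_deployment_*, ...).
	mainRegistry := prometheus.NewRegistry()
	// ... object collectors would be registered here ...

	// Telemetry registry: metrics about kube-state-metrics itself,
	// including process_start_time_seconds from the process collector.
	telemetryRegistry := prometheus.NewRegistry()
	telemetryRegistry.MustRegister(
		collectors.NewProcessCollector(collectors.ProcessCollectorOpts{}),
		collectors.NewGoCollector(),
	)

	// Two servers: a scraper pointed only at :8080 never sees
	// process_start_time_seconds, which lives on :8081.
	go http.ListenAndServe(":8081", promhttp.HandlerFor(telemetryRegistry, promhttp.HandlerOpts{}))
	http.ListenAndServe(":8080", promhttp.HandlerFor(mainRegistry, promhttp.HandlerOpts{}))
}
```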
Ah, looking back at my comment, I see I hadn't linked the actual example: https://github.com/GoogleCloudPlatform/k8s-stackdriver/blob/master/prometheus-to-sd/kubernetes/prometheus-to-sd-kube-state-metrics.yaml
This leads me to believe that either it is somehow possible to combine the two metric endpoints, or KSM 1.4 used to expose this metric on the default port and it was later moved to the telemetry one.
Oh, yes, that example is what we based our implementation on. It works, apart from the fact that it logs the "cumulative metrics might be inaccurate" warning (included in the issue description) every 60 seconds. It's just a warning, not a fatal error, but it sounds like something that should be addressed to ensure the metrics are accurate.
I see. I believe they use `process_start_time_seconds` to detect restarts for counters (aka cumulative metrics), something that is already built into Prometheus. IMO adding this metric makes sense, but I'd also like to get an opinion from @mrueg and/or @dgrisonnet to make sure there isn't something obvious that we're missing.
From a kube-state-metrics point of view, I don't think that makes sense, since we want to keep the distinction between metrics about kube-state-metrics itself and metrics about the Kubernetes APIs. They are two different kinds of information, so I don't think it makes sense to merge them together. That being said, I can see two solutions:
- It seems possible to specify multiple endpoints in the configuration of prometheus-to-sd (https://github.com/GoogleCloudPlatform/k8s-stackdriver/blob/master/prometheus-to-sd/kubernetes/prometheus-to-sd-kube-state-metrics.yaml#L10-L12), so I am wondering if adding both endpoints would solve your problem. I think it should, and if not, it might be something to fix in prometheus-to-sd.
- Add an option in KSM to expose all the metrics on only one server, disabled by default, but I am very reluctant to do this because I don't think it really makes sense.
> It seems possible to specify multiple endpoints in the configuration of prometheus-to-sd (https://github.com/GoogleCloudPlatform/k8s-stackdriver/blob/master/prometheus-to-sd/kubernetes/prometheus-to-sd-kube-state-metrics.yaml#L10-L12)
I think this is just the container port for the KSM pod. The source seems to be defined below [1], and it seems that multiple sources can be set [2]. So that solution could potentially work.
[1] https://github.com/GoogleCloudPlatform/k8s-stackdriver/blob/f65ad56360bd0edce5b1bd063f9eb645715796cf/prometheus-to-sd/kubernetes/prometheus-to-sd-kube-state-metrics.yaml#L34
[2] https://github.com/GoogleCloudPlatform/k8s-stackdriver/blob/master/prometheus-to-sd/main.go#L97
Yeah, you are correct, that is definitely the pod spec; not sure why I thought it was the configuration of prometheus-to-sd.
Looking at the project a bit more, it seems that it checks the `process_start_time_seconds` metric per source, so specifying two sources will not solve the issue: https://github.com/GoogleCloudPlatform/k8s-stackdriver/blob/master/prometheus-to-sd/main.go#L182-L187
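A rough sketch of that per-source behavior, based only on the description and warning message above (illustrative, not the actual prometheus-to-sd code; `dto` is the Prometheus client model package):

```go
package main

import (
	"fmt"
	"time"

	dto "github.com/prometheus/client_model/go"
)

// startTimeForSource mirrors the check described above: each source is
// inspected on its own for process_start_time_seconds, so adding the
// telemetry endpoint as a second source would not silence the warning
// for the main endpoint.
func startTimeForSource(families map[string]*dto.MetricFamily) time.Time {
	if mf, ok := families["process_start_time_seconds"]; ok && len(mf.Metric) > 0 {
		return time.Unix(int64(mf.Metric[0].GetGauge().GetValue()), 0)
	}
	// Missing metric: fall back to the epoch, which is what produces the
	// "Cumulative metrics might be inaccurate" warning.
	return time.Unix(1, 0)
}

func main() {
	// A source without the metric gets the 1970-01-01 00:00:01 UTC fallback.
	fmt.Println(startTimeForSource(map[string]*dto.MetricFamily{}).UTC())
}
```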
With that said, in my opinion the ideal solution would be for prometheus-to-sd to handle the case of one application having multiple metrics endpoints, since that is a practice spread all over the ecosystem, but I can understand that it would require a big effort. So I would be fine with introducing a flag to expose metrics on only one server instead of two.
While I can see how combining all the metrics for both kube-state-metrics and Kubernetes APIs into a single server would solve the issue, I'm concerned about how that might impact users' Stackdriver costs. The telemetry metrics endpoint produces a response about 12KiB in size (at least on one of our development Kubernetes clusters). Google charges for every byte of custom metrics ingested, and, as noted in kube-state-metrics's README, exporting more metrics than necessary can result in pretty high costs.
For that reason, it may be best if just the `process_start_time_seconds` metric were added to the Kubernetes APIs metrics endpoint (only if the optional flag was provided).
Thoughts?
The telemetry endpoint should produce far fewer metrics than the other endpoint, and ultimately you most likely also want to gather these metrics; otherwise you'll be blind if anything goes wrong with kube-state-metrics.
> as noted in kube-state-metrics's README, exporting more metrics than necessary can result in pretty high costs.
This comment refers to the endpoint that exposes metrics about the Kubernetes APIs. The telemetry endpoint only exposes a few metrics, so cost-wise it shouldn't have a huge impact. If you want to reduce the cost of kube-state-metrics' metric ingestion, you should instead opt out of some resources/metrics via the CLI flags. This will have a far bigger impact on cost savings than opting out of the telemetry metrics, which are essential if anything goes wrong with kube-state-metrics.
> For that reason, it may be best if just the `process_start_time_seconds` metric were added to the Kubernetes APIs metrics endpoint (only if the optional flag was provided).
My problem with adding just `process_start_time_seconds` is that then the CLI flag use case would be specific to prometheus-to-sd, whereas if we were to have a flag to merge both servers together, it would be generic and benefit more users than just the prometheus-to-sd community.
> The telemetry endpoint only exposes a few metrics, so cost-wise it shouldn't have a huge impact.
That's what I initially thought, too, but Stackdriver is pricier than you might think. Assuming the telemetry endpoint produces a 12KiB response, and assuming prometheus-to-sd's default export interval of 1 minute, that's ~16.875MiB of data exported per day. At Stackdriver's current rate of $0.2580/MiB, that ends up adding about $130 to someone's monthly bill. Probably negligible for most mid-to-large projects, but potentially impactful for smaller ones.
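For transparency, here is the arithmetic behind that estimate; the 12KiB response size, the $0.2580/MiB rate, and a 30-day month are the stated assumptions:

```go
package main

import "fmt"

func main() {
	const (
		responseKiB   = 12.0    // observed telemetry response size
		scrapesPerDay = 24 * 60 // prometheus-to-sd default export interval of 1 minute
		pricePerMiB   = 0.2580  // assumed Stackdriver custom-metrics ingestion rate (USD)
		daysPerMonth  = 30
	)
	mibPerDay := responseKiB * scrapesPerDay / 1024
	// Prints roughly: ~16.875 MiB/day, ~$130.61/month
	fmt.Printf("~%.3f MiB/day, ~$%.2f/month\n", mibPerDay, mibPerDay*daysPerMonth*pricePerMiB)
}
```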
In fact, at our company we have a couple of k8s clusters where we've trimmed the metrics exposed by kube-state-metrics down so heavily that, if all of the telemetry metrics were exported, they'd cost more than the regular metrics.
> if we were to have a flag to merge both servers together, it would be generic and benefit more users than just the prometheus-to-sd community.
That's completely understandable, though, so I can see why that might still be the better solution, even after considering the potential impact on cost. Just wanted to make sure that impact was considered, since it's bigger than you might expect and not necessarily something that can be written off in all cases.
There are some additional steps that users can take to reduce costs, too, like decreasing prometheus-to-sd's export frequency and/or inserting a container between the kube-state-metrics and prometheus-to-sd containers that applies extra metric filtering (we've employed both techniques in the clusters where we use kube-state-metrics).
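As an illustration of that second technique, a hypothetical filtering sidecar could look something like the sketch below; the upstream URL, listen port, and denylist entry are made up for the example and are not from any existing project:

```go
package main

import (
	"bufio"
	"fmt"
	"net/http"
	"strings"
)

// metricName extracts the metric family name from a line of the Prometheus
// text exposition format, for both "# HELP"/"# TYPE" lines and sample lines.
func metricName(line string) string {
	if strings.HasPrefix(line, "# HELP ") || strings.HasPrefix(line, "# TYPE ") {
		if fields := strings.Fields(line); len(fields) >= 3 {
			return fields[2]
		}
		return ""
	}
	if end := strings.IndexAny(line, "{ "); end != -1 {
		return line[:end]
	}
	return line
}

func main() {
	// Hypothetical denylist; in practice this would come from configuration.
	denied := map[string]bool{"kube_pod_container_status_last_terminated_reason": true}

	http.HandleFunc("/metrics", func(w http.ResponseWriter, r *http.Request) {
		// Assumes kube-state-metrics is reachable in the same pod on :8080.
		resp, err := http.Get("http://localhost:8080/metrics")
		if err != nil {
			http.Error(w, err.Error(), http.StatusBadGateway)
			return
		}
		defer resp.Body.Close()

		scanner := bufio.NewScanner(resp.Body)
		scanner.Buffer(make([]byte, 0, 64*1024), 10*1024*1024) // allow long label sets
		for scanner.Scan() {
			line := scanner.Text()
			if !denied[metricName(line)] {
				fmt.Fprintln(w, line)
			}
		}
	})
	http.ListenAndServe(":9090", nil)
}
```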
Are you sure that the price is per MB scraped instead of MB stored? Prometheus metrics use far fewer resources on disk (~1.6 bytes per sample) compared to the text exposition format.
If the charging is done based on the metrics response size, it would mean that having shorter or longer metric descriptions would affect the pricing, which sounds a bit strange.
I'm not positive, but their documentation states that that price is for "data ingested." GKE metrics get the benefit of being priced on the samples ingested, but not custom metrics. It also goes on to say,
> In Monitoring, ingestion refers to the process of writing time series to Monitoring. Each time series includes some number of data points; those data points are the basis for ingestion charges.
Also, from experience, a small change to the number of metrics exposed by kube-state-metrics can have a noticeable impact on Stackdriver's monthly cost.
What if we add a flag to export the telemetry metrics on the same port, but also allow opting out of those metrics using the `--metric-denylist` flag? That should satisfy both your and @dgrisonnet's concerns.
Theoretically, that would be a potential solution, but in practice it is more complex to do that with telemetry metrics because we don't control all of the collectors involved. For instance, with today's implementation we couldn't use just that to disable all the Go and process metrics.
One way to do it is to decorate the gatherer and filter out the metric families it returns. Here's a POC which can be used as a starting point: https://github.com/kubernetes/kube-state-metrics/pull/1666/files
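The general shape of that approach is a `prometheus.Gatherer` wrapper like the sketch below. This is based on the comment above rather than the POC's actual code, and the denylist entry is only an example:

```go
package main

import (
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/collectors"
	"github.com/prometheus/client_golang/prometheus/promhttp"
	dto "github.com/prometheus/client_model/go"
)

// filteredGatherer decorates another prometheus.Gatherer and drops
// denylisted metric families before they are exposed.
type filteredGatherer struct {
	inner  prometheus.Gatherer
	denied map[string]bool
}

func (g filteredGatherer) Gather() ([]*dto.MetricFamily, error) {
	families, err := g.inner.Gather()
	if err != nil {
		return nil, err
	}
	kept := families[:0]
	for _, mf := range families {
		if !g.denied[mf.GetName()] {
			kept = append(kept, mf)
		}
	}
	return kept, nil
}

func main() {
	// Telemetry-style registry with the Go and process collectors.
	reg := prometheus.NewRegistry()
	reg.MustRegister(
		collectors.NewGoCollector(),
		collectors.NewProcessCollector(collectors.ProcessCollectorOpts{}),
	)

	// Example denylist; in kube-state-metrics this would be driven by the
	// existing allowlist/denylist flags.
	gatherer := filteredGatherer{inner: reg, denied: map[string]bool{"go_threads": true}}
	http.ListenAndServe(":8081", promhttp.HandlerFor(gatherer, promhttp.HandlerOpts{}))
}
```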
That sounds reasonable to me, considering that there are only a few metric families on the telemetry endpoint, so the performance impact shouldn't be significant.
@WesCossick would you be interested in contributing this feature? You can use that PR as an example of how telemetry metrics can be filtered with existing flags.
I wouldn't necessarily have the bandwidth to work on that myself, at least not for the foreseeable future, but I'll see if one of our team's other engineers can tackle it. We don't use Go at our company, though, so the unfamiliarity there may pose an issue.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:
- Mark this issue or PR as fresh with `/remove-lifecycle stale`
- Mark this issue or PR as rotten with `/lifecycle rotten`
- Close this issue or PR with `/close`
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale
/assign
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:
- Mark this issue or PR as fresh with `/remove-lifecycle stale`
- Mark this issue or PR as rotten with `/lifecycle rotten`
- Close this issue or PR with `/close`
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:
- Mark this issue or PR as fresh with `/remove-lifecycle rotten`
- Close this issue or PR with `/close`
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues according to the following rules:
- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:
- Reopen this issue with `/reopen`
- Mark this issue as fresh with `/remove-lifecycle rotten`
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.
/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".
In response to this:
> /close not-planned
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.