stackdriver_exporter Error on ingesting out-of-order samples

After upgrading from 0.8.0 to 0.11.0, we started noticing a flood of warnings in our Prometheus server logs.

The error first appears with 0.9.1 and is still present in 0.11.0.

Prometheus log entry

{"caller":"scrape.go:1467","component":"scrape manager","level":"warn","msg":"Error on ingesting out-of-order samples","num_dropped":197,"scrape_pool":"kubernetes-service-endpoints","target":"http://10.34.95.168:9255/metrics","ts":"2021-06-10T16:59:22.545Z"}

Jun 10 '21 17:06 anas-aso

@SuperQ any idea what might be the issue? I am running the exporter with the following config env variables:

      STACKDRIVER_EXPORTER_MONITORING_METRICS_INTERVAL:       5m
      STACKDRIVER_EXPORTER_MONITORING_METRICS_OFFSET:         0s
      STACKDRIVER_EXPORTER_WEB_LISTEN_ADDRESS:                :9255
      STACKDRIVER_EXPORTER_WEB_TELEMETRY_PATH:                /metrics
      STACKDRIVER_EXPORTER_MAX_RETRIES:                       0
      STACKDRIVER_EXPORTER_HTTP_TIMEOUT:                      30s
      STACKDRIVER_EXPORTER_MAX_BACKOFF_DURATION:              5s
      STACKDRIVER_EXPORTER_BACKODFF_JITTER_BASE:              1s
      STACKDRIVER_EXPORTER_RETRY_STATUSES:                    503
      STACKDRIVER_EXPORTER_DROP_DELEGATED_PROJECTS:           false

Jun 29 '21 15:06 anas-aso

I managed to get rid of those errors by introducing an offset for the metrics pulled from Cloud Monitoring (Stackdriver). The issue seems to be caused by the ingestion delay: https://cloud.google.com/monitoring/api/metrics#metadata
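A rough sketch of how the interval and offset settings shape the time window requested from Cloud Monitoring. It assumes the exporter ends the window at "now minus offset" and starts it one interval earlier, as the option descriptions suggest; `query_window` is a hypothetical helper, not part of the exporter.

```python
from datetime import datetime, timedelta, timezone

def query_window(now, interval, offset):
    """Return (start, end) of the requested window, shifted into the past by offset."""
    end = now - offset      # skip the most recent, possibly still-incomplete data
    start = end - interval  # look back one full interval from there
    return start, end

# Scrape time taken from the log entry in this report (rounded to the second).
now = datetime(2021, 6, 10, 16, 59, 22, tzinfo=timezone.utc)

# 5m interval with no offset versus a 1m offset.
print(query_window(now, timedelta(minutes=5), timedelta(0)))
print(query_window(now, timedelta(minutes=5), timedelta(minutes=1)))
```

With a non-zero offset the window stops short of the present, so points that Cloud Monitoring has not finished ingesting fall outside it.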

Jul 01 '21 15:07 anas-aso

@anas-aso Did you set the offset to 240 seconds?

Aug 02 '21 14:08 weyert

@weyert A 1m offset was enough to get rid of the errors. Also, I didn't want to get close to the Prometheus query.lookback-delta default value of 5m; otherwise I would have to touch many alert queries.
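The lookback concern can be checked with simple arithmetic. The sketch below is a rough model, not Prometheus behavior verbatim: it assumes the newest sample can be as old as the offset plus the metric's sampling period plus one scrape interval, and the 1m scrape interval and 1m sampling period are illustrative values that do not come from this thread.

```python
from datetime import timedelta

# Prometheus default for query.lookback-delta, as mentioned above.
LOOKBACK_DELTA = timedelta(minutes=5)

def worst_case_age(offset, scrape_interval, sample_period):
    """Rough upper bound on the age of the newest sample just before the next scrape."""
    return offset + sample_period + scrape_interval

def fits_lookback(offset, scrape_interval, sample_period, lookback=LOOKBACK_DELTA):
    """True if instant queries should still find a sample within the lookback window."""
    return worst_case_age(offset, scrape_interval, sample_period) < lookback

one_minute = timedelta(minutes=1)  # assumed scrape interval and sampling period

for offset in (timedelta(seconds=30), timedelta(minutes=1), timedelta(seconds=240)):
    ok = fits_lookback(offset, one_minute, one_minute)
    print(f"offset={offset}: within lookback-delta={ok}")
```

Under these assumptions a 1m offset leaves headroom, while a 240s offset would push the worst-case sample age past 5m.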

Aug 02 '21 14:08 anas-aso

Thanks @anas-aso. I will give that a shot

Do you use a version of the exporter with your PR merged?

Aug 02 '21 15:08 weyert

Do you use a version of the exporter with your PR merged?

@weyert I only tried it on a staging environment before opening the PR. The problem with that change is that, within the same scrape, you will get metrics that are offset by different values (since the offset is introduced dynamically from the GCP Monitoring API metadata). As a result, writing alerts requires an extra step of checking the offset introduced for a metric before using it. That's why I haven't used my patch in prod yet: I want to get the maintainers' feedback first.

Aug 03 '21 07:08 anas-aso

Thank you, that's a good point. I didn't think of that. Hopefully the maintainers will give you feedback :)

Aug 03 '21 10:08 weyert

Setting STACKDRIVER_EXPORTER_MONITORING_METRICS_OFFSET to 30s fixed the issue for us.

Jul 26 '22 12:07 pkrishnath

I stopped using Stackdriver Exporter in favor of https://github.com/GoogleCloudPlatform/prometheus-engine/tree/main/cmd/frontend.

Aug 08 '23 09:08 anas-aso
