stackdriver_exporter

Error on ingesting out-of-order samples

Open anas-aso opened this issue 4 years ago • 8 comments

After upgrading from 0.8.0 to 0.11.0, we started seeing a flood of warnings in our Prometheus server logs.

The warning first appears with 0.9.1 and is still present in 0.11.0.

Prometheus log entry

{"caller":"scrape.go:1467","component":"scrape manager","level":"warn","msg":"Error on ingesting out-of-order samples","num_dropped":197,"scrape_pool":"kubernetes-service-endpoints","target":"http://10.34.95.168:9255/metrics","ts":"2021-06-10T16:59:22.545Z"}

anas-aso avatar Jun 10 '21 17:06 anas-aso

@SuperQ any idea what might be causing this? I am running the exporter with the following config environment variables.

      STACKDRIVER_EXPORTER_MONITORING_METRICS_INTERVAL:       5m
      STACKDRIVER_EXPORTER_MONITORING_METRICS_OFFSET:         0s
      STACKDRIVER_EXPORTER_WEB_LISTEN_ADDRESS:                :9255
      STACKDRIVER_EXPORTER_WEB_TELEMETRY_PATH:                /metrics
      STACKDRIVER_EXPORTER_MAX_RETRIES:                       0
      STACKDRIVER_EXPORTER_HTTP_TIMEOUT:                      30s
      STACKDRIVER_EXPORTER_MAX_BACKOFF_DURATION:              5s
      STACKDRIVER_EXPORTER_BACKODFF_JITTER_BASE:              1s
      STACKDRIVER_EXPORTER_RETRY_STATUSES:                    503
      STACKDRIVER_EXPORTER_DROP_DELEGATED_PROJECTS:           false

anas-aso avatar Jun 29 '21 15:06 anas-aso

I managed to get rid of those errors by introducing an offset for the metrics pulled from Cloud Monitoring (Stackdriver). The issue seems to be caused by the ingestion delay: https://cloud.google.com/monitoring/api/metrics#metadata
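
To illustrate what the offset changes, here is a minimal sketch of a query window shifted back by an offset. It assumes the exporter requests points in the range [now - offset - interval, now - offset]; the function name `query_window` is hypothetical and is not taken from the exporter's source.

```python
from datetime import datetime, timedelta, timezone


def query_window(now, interval, offset):
    """Return (start, end) of the time range requested from Cloud Monitoring.

    Assumption: the window ends `offset` before `now` and spans `interval`.
    """
    end = now - offset
    start = end - interval
    return start, end


if __name__ == "__main__":
    # Timestamp taken from the Prometheus log entry in this issue.
    now = datetime(2021, 6, 10, 16, 59, 22, tzinfo=timezone.utc)
    for offset in (timedelta(0), timedelta(minutes=1)):
        start, end = query_window(now, timedelta(minutes=5), offset)
        print(f"offset={offset}: {start.isoformat()} .. {end.isoformat()}")
```

With an offset of 0s the window ends at scrape time, where points may still be arriving because of the ingestion delay. With a 1m offset the window ends a minute earlier, which presumably gives delayed points time to land before they are queried.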

anas-aso avatar Jul 01 '21 15:07 anas-aso

@anas-aso Did you set the offset to 240 seconds?

weyert avatar Aug 02 '21 14:08 weyert

@weyert A 1m offset was enough to get rid of the errors. I also didn't want to get close to the Prometheus query.lookback-delta default value of 5m, otherwise I would have to touch many alert queries.
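
The trade-off against query.lookback-delta can be sketched as simple arithmetic. This assumes the exporter exposes samples with their original Cloud Monitoring timestamps, so the newest sample is at least `offset` old when Prometheus evaluates a query. The function `headroom` and the `extra_delay` parameter are hypothetical names used only for illustration.

```python
from datetime import timedelta

# Prometheus default for --query.lookback-delta, as mentioned in the thread.
LOOKBACK_DELTA = timedelta(minutes=5)


def headroom(offset, extra_delay=timedelta(0), lookback=LOOKBACK_DELTA):
    """Time left before the newest sample falls out of the lookback window.

    Assumption: the newest exposed sample is `offset + extra_delay` old when
    an instant query is evaluated. A result of zero or less means the query
    would likely find no sample.
    """
    return lookback - (offset + extra_delay)


if __name__ == "__main__":
    for offset in (timedelta(seconds=30), timedelta(minutes=1), timedelta(minutes=4)):
        print(f"offset={offset}: headroom={headroom(offset)}")
```

Under these assumptions, a 1m offset leaves four minutes of headroom, while a 240-second offset leaves only one minute, which any additional delay (scrape interval, spacing between points) could use up.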

anas-aso avatar Aug 02 '21 14:08 anas-aso

Thanks @anas-aso. I will give that a shot

Do you use a version of the exporter with your PR merged?

weyert avatar Aug 02 '21 15:08 weyert

> Do you use a version of the exporter with your PR merged?

@weyert I only tried it on a staging environment before opening the PR. The problem with that change is that, within the same scrape, you will get metrics that are offset by different values (since the offset is derived dynamically from the GCP Monitoring API metadata). As a result, writing alerts requires an extra step of checking a metric's offset before using it. That's why I haven't used my patch in prod yet: I want to get the maintainers' feedback first.

anas-aso avatar Aug 03 '21 07:08 anas-aso

Thank you, that's a good point. I didn't think of that. Hopefully the maintainers will give you feedback :)

weyert avatar Aug 03 '21 10:08 weyert

Setting STACKDRIVER_EXPORTER_MONITORING_METRICS_OFFSET to 30s fixed the issue for us.

pkrishnath avatar Jul 26 '22 12:07 pkrishnath

I stopped using Stackdriver Exporter in favor of https://github.com/GoogleCloudPlatform/prometheus-engine/tree/main/cmd/frontend.

anas-aso avatar Aug 08 '23 09:08 anas-aso