m3db-operator
datapoint for aggregation too far in past
- What version of the operator are you running?
m3db-operator-0.13.0
- What version of Kubernetes are you running?
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.4", GitCommit:"b695d79d4f967c403a96986f1750a35eb75e75f1", GitTreeState:"clean", BuildDate:"2021-11-17T15:41:42Z", GoVersion:"go1.16.10", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"21+", GitVersion:"v1.21.2-eks-06eac09", GitCommit:"5f6d83fe4cb7febb5f4f4e39b3b2b64ebbbe3e97", GitTreeState:"clean", BuildDate:"2021-09-13T14:20:15Z", GoVersion:"go1.16.5", Compiler:"gc", Platform:"linux/amd64"}
- What are you trying to do?
I have deployed an M3DB cluster with the operator, with a default namespace for live data and an aggregated namespace for long-term storage. My spec is:
apiVersion: operator.m3db.io/v1alpha1
kind: M3DBCluster
metadata:
  name: m3db-cluster
spec:
  image: quay.io/m3db/m3dbnode:latest
  replicationFactor: 3
  numberOfShards: 128
  isolationGroups:
    - name: group1
      numInstances: 2
      nodeAffinityTerms:
        - key: alpha.eksctl.io/nodegroup-name
          values:
            - prod-opsmon-ng-1a
    - name: group2
      numInstances: 2
      nodeAffinityTerms:
        - key: alpha.eksctl.io/nodegroup-name
          values:
            - prod-opsmon-ng-1b
    - name: group3
      numInstances: 2
      nodeAffinityTerms:
        - key: alpha.eksctl.io/nodegroup-name
          values:
            - prod-opsmon-ng-1c
  namespaces:
    - name: default
      options:
        bootstrapEnabled: true
        flushEnabled: true
        writesToCommitLog: true
        cleanupEnabled: true
        snapshotEnabled: true
        repairEnabled: false
        retentionOptions:
          retentionPeriod: 2160h
          blockSize: 12h
          bufferFuture: 1h
          bufferPast: 2h
          blockDataExpiry: true
          blockDataExpiryAfterNotAccessPeriod: 10m
        indexOptions:
          enabled: true
          blockSize: 12h
        aggregationOptions:
          aggregations:
            - aggregated: false
    - name: longterm
      options:
        bootstrapEnabled: true
        flushEnabled: true
        writesToCommitLog: true
        cleanupEnabled: true
        snapshotEnabled: true
        repairEnabled: false
        retentionOptions:
          retentionPeriod: 9000h
          blockSize: 12h
          bufferFuture: 1h
          bufferPast: 2h
          blockDataExpiry: true
          blockDataExpiryAfterNotAccessPeriod: 30m
        indexOptions:
          enabled: true
          blockSize: 12h
        aggregationOptions:
          aggregations:
            - aggregated: true
              attributes:
                resolution: 10m
                downsampleOptions:
                  all: true
  etcdEndpoints:
    - http://etcd.monitoring.svc.cluster.local:2379
  containerResources:
    requests:
      memory: 16Gi
      cpu: '4'
    limits:
      memory: 28Gi
      cpu: '8'
  dataDirVolumeClaimTemplate:
    metadata:
      name: m3db-data
    spec:
      accessModes:
        - ReadWriteOnce
      storageClassName: gp3
      resources:
        requests:
          storage: 10Ti
My namespaces are initializing correctly:
{
  "registry": {
    "namespaces": {
      "default": {
        "bootstrapEnabled": true,
        "flushEnabled": true,
        "writesToCommitLog": true,
        "cleanupEnabled": true,
        "repairEnabled": false,
        "retentionOptions": {
          "retentionPeriodNanos": "7776000000000000",
          "blockSizeNanos": "43200000000000",
          "bufferFutureNanos": "3600000000000",
          "bufferPastNanos": "7200000000000",
          "blockDataExpiry": true,
          "blockDataExpiryAfterNotAccessPeriodNanos": "600000000000",
          "futureRetentionPeriodNanos": "0"
        },
        "snapshotEnabled": true,
        "indexOptions": {
          "enabled": true,
          "blockSizeNanos": "43200000000000"
        },
        "schemaOptions": null,
        "coldWritesEnabled": false,
        "runtimeOptions": null,
        "cacheBlocksOnRetrieve": false,
        "aggregationOptions": {
          "aggregations": [
            {
              "aggregated": false,
              "attributes": null
            }
          ]
        },
        "stagingState": {
          "status": "READY"
        },
        "extendedOptions": null
      },
      "longterm": {
        "bootstrapEnabled": true,
        "flushEnabled": true,
        "writesToCommitLog": true,
        "cleanupEnabled": true,
        "repairEnabled": false,
        "retentionOptions": {
          "retentionPeriodNanos": "32400000000000000",
          "blockSizeNanos": "43200000000000",
          "bufferFutureNanos": "3600000000000",
          "bufferPastNanos": "7200000000000",
          "blockDataExpiry": true,
          "blockDataExpiryAfterNotAccessPeriodNanos": "1800000000000",
          "futureRetentionPeriodNanos": "0"
        },
        "snapshotEnabled": true,
        "indexOptions": {
          "enabled": true,
          "blockSizeNanos": "43200000000000"
        },
        "schemaOptions": null,
        "coldWritesEnabled": false,
        "runtimeOptions": null,
        "cacheBlocksOnRetrieve": false,
        "aggregationOptions": {
          "aggregations": [
            {
              "aggregated": true,
              "attributes": {
                "resolutionNanos": "600000000000",
                "downsampleOptions": {
                  "all": true
                }
              }
            }
          ]
        },
        "stagingState": {
          "status": "READY"
        },
        "extendedOptions": null
      }
    }
  }
}
I have configured a couple of test Prometheus environments to remote_write to M3DB:
remote_write:
  - url: https://REDACTED/api/v1/prom/remote/write
    remote_timeout: 30s
    queue_config:
      capacity: 10000
      max_samples_per_send: 3000
      batch_send_deadline: 10s
      min_shards: 4
      max_shards: 200
      min_backoff: 100ms
      max_backoff: 10s
- What happened?
Source Prometheus logs are filled with remote_write errors:
ts=2021-12-28T17:22:17.510Z caller=dedupe.go:112 component=remote level=error remote_name=8ae741 url=https://REDACTED/api/v1/prom/remote/write msg="non-recoverable error" count=3000 exemplarCount=0 err="server returned HTTP status 400 Bad Request: {\"status\":\"error\",\"error\":\"bad_request_errors: count=58, last=datapoint for aggregation too far in past: off_by=10m17.435332333s, timestamp=2021-12-28T17:10:00Z, past_limit=2021-12-28T17:20:17Z, timestamp_unix_nanos=1640711400000000000, past_limit_unix_nanos=1640712017435332333\"}"
ts=2021-12-28T17:24:58.540Z caller=dedupe.go:112 component=remote level=error remote_name=8ae741 url=https://REDACTED/api/v1/prom/remote/write msg="non-recoverable error" count=3000 exemplarCount=0 err="server returned HTTP status 400 Bad Request: {\"status\":\"error\",\"error\":\"bad_request_errors: count=2, last=datapoint for aggregation too far in past: off_by=1m5.057778422s, timestamp=2021-12-28T17:21:53Z, past_limit=2021-12-28T17:22:58Z, timestamp_unix_nanos=1640712113450000000, past_limit_unix_nanos=1640712178507778422\"}"
And on the M3DB nodes I get "datapoint for aggregation too far in past" errors, with off_by values ranging from a few seconds to about 10 minutes:
{"level":"error","ts":1640712238.5445695,"msg":"write error","rqID":"f13324a3-0126-45cd-b05b-d81b8ddc7a15","remoteAddr":"192.168.4.56:37888","httpResponseStatusCode":400,"numRegularErrors":0,"numBadRequestErrors":1,"lastRegularError":"","lastBadRequestErr":"datapoint for aggregation too far in past: off_by=5.067515455s, timestamp=2021-12-28T17:21:53Z, past_limit=2021-12-28T17:21:58Z, timestamp_unix_nanos=1640712113450000000, past_limit_unix_nanos=1640712118517515455"}
{"level":"error","ts":1640712257.5282857,"msg":"write error","rqID":"51c82440-a10a-464d-a701-fe4e29dfa196","remoteAddr":"192.168.92.76:26034","httpResponseStatusCode":400,"numRegularErrors":0,"numBadRequestErrors":59,"lastRegularError":"","lastBadRequestErr":"datapoint for aggregation too far in past: off_by=10m17.485669938s, timestamp=2021-12-28T17:12:00Z, past_limit=2021-12-28T17:22:17Z, timestamp_unix_nanos=1640711520000000000, past_limit_unix_nanos=1640712137485669938"}
{"level":"error","ts":1640712267.5369058,"msg":"write error","rqID":"7da4065e-50db-48e4-afcb-3d721cad7d93","remoteAddr":"192.168.55.139:51756","httpResponseStatusCode":400,"numRegularErrors":0,"numBadRequestErrors":2,"lastRegularError":"","lastBadRequestErr":"datapoint for aggregation too far in past: off_by=42.383129547s, timestamp=2021-12-28T17:21:45Z, past_limit=2021-12-28T17:22:27Z, timestamp_unix_nanos=1640712105132000000, past_limit_unix_nanos=1640712147515129547"}
{"level":"error","ts":1640712329.0677392,"msg":"write error","rqID":"1a31c55c-b894-4208-a279-4559e9f12ae9","remoteAddr":"192.168.92.76:26034","httpResponseStatusCode":400,"numRegularErrors":0,"numBadRequestErrors":2,"lastRegularError":"","lastBadRequestErr":"datapoint for aggregation too far in past: off_by=1m43.902580804s, timestamp=2021-12-28T17:21:45Z, past_limit=2021-12-28T17:23:29Z, timestamp_unix_nanos=1640712105132000000, past_limit_unix_nanos=1640712209034580804"}
{"level":"error","ts":1640712387.462137,"msg":"write error","rqID":"72c66e32-cfe1-4d65-b999-5050cd3c0f93","remoteAddr":"192.168.92.76:26032","httpResponseStatusCode":400,"numRegularErrors":0,"numBadRequestErrors":2,"lastRegularError":"","lastBadRequestErr":"datapoint for aggregation too far in past: off_by=2m42.280133072s, timestamp=2021-12-28T17:21:45Z, past_limit=2021-12-28T17:24:27Z, timestamp_unix_nanos=1640712105132000000, past_limit_unix_nanos=1640712267412133072"}
{"level":"error","ts":1640712387.5179935,"msg":"write error","rqID":"c3bf9e0b-b8c0-4871-b61f-bcfcdd45d6b0","remoteAddr":"192.168.92.76:26030","httpResponseStatusCode":400,"numRegularErrors":0,"numBadRequestErrors":1,"lastRegularError":"","lastBadRequestErr":"datapoint for aggregation too far in past: off_by=2m42.356698292s, timestamp=2021-12-28T17:21:45Z, past_limit=2021-12-28T17:24:27Z, timestamp_unix_nanos=1640712105132000000, past_limit_unix_nanos=1640712267488698292"}
{"level":"error","ts":1640712387.5640953,"msg":"write error","rqID":"e03314d0-3466-40fa-a968-435cc0094e36","remoteAddr":"192.168.55.139:51756","httpResponseStatusCode":400,"numRegularErrors":0,"numBadRequestErrors":2,"lastRegularError":"","lastBadRequestErr":"datapoint for aggregation too far in past: off_by=2m42.410037855s, timestamp=2021-12-28T17:21:45Z, past_limit=2021-12-28T17:24:27Z, timestamp_unix_nanos=1640712105132000000, past_limit_unix_nanos=1640712267542037855"}
{"level":"error","ts":1640712429.2815886,"msg":"write error","rqID":"3204ed52-ece5-45f9-a9ee-203dd75766e6","remoteAddr":"192.168.92.76:26032","httpResponseStatusCode":400,"numRegularErrors":0,"numBadRequestErrors":60,"lastRegularError":"","lastBadRequestErr":"datapoint for aggregation too far in past: off_by=10m9.226459202s, timestamp=2021-12-28T17:15:00Z, past_limit=2021-12-28T17:25:09Z, timestamp_unix_nanos=1640711700000000000, past_limit_unix_nanos=1640712309226459202"}
Checking the Prometheus remote_write metrics, it looks like no samples at all are being sent successfully.
It doesn't seem to be a throughput issue: the configured max_shards is never reached and the remote_write duration p99 is fairly low. There's no large backlog of pending samples either. I've also verified that there's no time-synchronization issue on either the Prometheus sources or the M3DB cluster nodes.
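For concreteness, these are roughly the checks behind that statement, written up as Prometheus recording rules (just a sketch: the metric names are the standard Prometheus 2.x remote-write metrics and the exact names can differ slightly between Prometheus releases):
groups:
  - name: remote-write-health
    rules:
      # Shard saturation; stays well below 1 if max_shards is never reached.
      - record: remote_write:shard_saturation
        expr: prometheus_remote_storage_shards / prometheus_remote_storage_shards_max
      # p99 latency of sent batches, per remote endpoint.
      - record: remote_write:sent_batch_duration_seconds:p99
        expr: histogram_quantile(0.99, sum by (le, remote_name) (rate(prometheus_remote_storage_sent_batch_duration_seconds_bucket[5m])))
      # Backlog of samples waiting to be shipped.
      - record: remote_write:samples_pending
        expr: prometheus_remote_storage_samples_pending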
I've seen references on GitHub and Slack to the downsample bufferPastLimits setting (e.g. https://github.com/m3db/m3/issues/2355), but I don't see any way to customize this value through the operator.
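For what it's worth, the only workaround I can think of is to bypass the operator-generated config: the M3DBCluster spec appears to accept a configMapName pointing at a custom m3dbnode ConfigMap, and the embedded coordinator's downsample bufferPastLimits (the setting from m3db/m3#2355) could then be raised there. The sketch below is untested; configMapName, the m3db-custom-config name, and the exact bufferPastLimits keys/values are assumptions on my part rather than something I've verified against operator 0.13.0:
# Untested sketch: point the cluster at a custom ConfigMap.
apiVersion: operator.m3db.io/v1alpha1
kind: M3DBCluster
metadata:
  name: m3db-cluster
spec:
  # ... isolationGroups, namespaces, etc. as in the spec above ...
  configMapName: m3db-custom-config   # hypothetical ConfigMap holding a full m3dbnode config
---
# Fragment of the m3dbnode config inside that ConfigMap, raising the
# downsampler's past buffer for coarse resolutions (keys per m3db/m3#2355,
# unverified here; values chosen to cover the ~10m off_by seen in the logs):
coordinator:
  downsample:
    bufferPastLimits:
      - resolution: 0s
        bufferPast: 90s
      - resolution: 10m
        bufferPast: 15m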