
"unaggregated namespace is not yet initialized" error

skupjoe opened this issue 3 years ago · 3 comments

I am following the instructions here, and after deploying the cluster and attempting to write some data (manually, exploring in Grafana, anything) I get the following error:

{
    "status": "error",
    "error": "unaggregated namespace is not yet initialized"
}
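For reference, any write through the coordinator reproduces this; a minimal sketch using the coordinator's JSON write endpoint (assumes the coordinator is reachable on localhost:7201, and the metric name and tags are arbitrary):

curl -X POST http://localhost:7201/api/v1/json/write -d '{
  "tags": {"__name__": "test_metric", "host": "test01"},
  "timestamp": "'$(date +%s)'",
  "value": 42.0
}'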

Here is my etcd cluster config:

apiVersion: v1
kind: Service
metadata:
  name: etcd
  labels:
    app: etcd
spec:
  ports:
  - port: 2379
    name: client
  - port: 2380
    name: peer
  clusterIP: None
  selector:
    app: etcd
---
apiVersion: v1
kind: Service
metadata:
  name: etcd-cluster
  labels:
    app: etcd
spec:
  selector:
    app: etcd
  ports:
  - port: 2379
    protocol: TCP
  type: ClusterIP
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: etcd
  labels:
    app: etcd
spec:
  serviceName: "etcd"
  replicas: 3
  selector:
    matchLabels:
      app: etcd
  template:
    metadata:
      labels:
        app: etcd
    spec:
      containers:
      - name: etcd
        image: quay.io/coreos/etcd:v3.5.0
        imagePullPolicy: IfNotPresent
        command:
          - "etcd"
          - "--name"
          - "$(MY_POD_NAME)"
          - "--listen-peer-urls"
          - "http://$(MY_IP):2380"
          - "--listen-client-urls"
          - "http://$(MY_IP):2379,http://127.0.0.1:2379"
          - "--advertise-client-urls"
          - "http://$(MY_POD_NAME).etcd:2379"
          - "--initial-cluster-token"
          - "etcd-cluster-1"
          - "--initial-advertise-peer-urls"
          - "http://$(MY_POD_NAME).etcd:2380"
          - "--initial-cluster"
          - "etcd-0=http://etcd-0.etcd:2380,etcd-1=http://etcd-1.etcd:2380,etcd-2=http://etcd-2.etcd:2380"
          - "--initial-cluster-state"
          - "new"
          - "--data-dir"
          - "/var/lib/etcd"
        ports:
        - containerPort: 2379
          name: client
        - containerPort: 2380
          name: peer
        volumeMounts:
        - name: etcd-data
          mountPath: /var/lib/etcd
        env:
        - name: MY_IP
          valueFrom:
            fieldRef:
              fieldPath: status.podIP
        - name: MY_POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: ETCDCTL_API
          value: "3"
  volumeClaimTemplates:
  - metadata:
      name: etcd-data
    spec:
      storageClassName: encrypted-gp2
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 30Gi
        limits:
          storage: 30Gi
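Before pointing M3DB at etcd, it can help to confirm the etcd cluster is healthy; a sketch using etcdctl from inside one of the pods (pod, service, and namespace names assume the manifests above):

kubectl -n monitoring exec etcd-0 -- etcdctl --endpoints=http://etcd-0.etcd:2379,http://etcd-1.etcd:2379,http://etcd-2.etcd:2379 endpoint health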

Here is my M3DBCluster (operator) config:

apiVersion: operator.m3db.io/v1alpha1
kind: M3DBCluster
metadata:
  name: m3db-cluster
spec:
  image: quay.io/m3db/m3dbnode:v1.2.0
  imagePullPolicy: IfNotPresent
  replicationFactor: 3
  numberOfShards: 256
  etcdEndpoints:
    - http://etcd-0.etcd:2379
    - http://etcd-1.etcd:2379
    - http://etcd-2.etcd:2379
  isolationGroups:
    - name: group1
      numInstances: 1
      nodeAffinityTerms:
        - key: failure-domain.beta.kubernetes.io/zone
          values:
            - us-west-2a
    - name: group2
      numInstances: 1
      nodeAffinityTerms:
        - key: failure-domain.beta.kubernetes.io/zone
          values:
            - us-west-2b
    - name: group3
      numInstances: 1
      nodeAffinityTerms:
        - key: failure-domain.beta.kubernetes.io/zone
          values:
            - us-west-2c
  podIdentityConfig:
    sources: []
  namespaces:
    - name: metrics-10s:2d
      preset: 10s:2d
  dataDirVolumeClaimTemplate:
    metadata:
      name: m3db-data
    spec:
      accessModes:
        - ReadWriteOnce
      storageClassName: encrypted-gp2
      resources:
        requests:
          storage: 350Gi
        limits:
          storage: 350Gi

And then the operations I apply to deploy my cluster:

helm install -n monitoring m3db-operator m3db/m3db-operator
kubectl -n monitoring apply -f ~/Dropbox/projects/m3db/conf/m3db-cluster.yaml

skupjoe · Sep 22 '21 23:09

My resultant auto-generated m3coordinator config from this is the following:

kind: ConfigMap
apiVersion: v1
metadata:
  name: m3db-config-map-m3db-cluster
  namespace: monitoring
  uid: 8beea769-2b68-488a-a3c3-606622013bcd
  resourceVersion: '163096500'
  creationTimestamp: '2021-09-22T23:45:20Z'
  ownerReferences:
    - apiVersion: operator.m3db.io/v1alpha1
      kind: m3dbcluster
      name: m3db-cluster
      uid: aa05a31c-4767-4674-8343-129e609a793d
      controller: true
      blockOwnerDeletion: true
  managedFields:
    - manager: m3db-operator
      operation: Update
      apiVersion: v1
      time: '2021-09-22T23:45:20Z'
      fieldsType: FieldsV1
      fieldsV1:
        'f:data':
          .: {}
          'f:m3.yml': {}
        'f:metadata':
          'f:ownerReferences':
            .: {}
            'k:{"uid":"aa05a31c-4767-4674-8343-129e609a793d"}':
              .: {}
              'f:apiVersion': {}
              'f:blockOwnerDeletion': {}
              'f:controller': {}
              'f:kind': {}
              'f:name': {}
              'f:uid': {}
data:
  m3.yml: |

    coordinator: {}

    db:
      hostID:
        resolver: file
        file:
          path: /etc/m3db/pod-identity/identity
          timeout: 5m

      client:
        writeConsistencyLevel: majority
        readConsistencyLevel: unstrict_majority

      discovery:
        config:
          service:
            env: "monitoring/m3db-cluster"
            zone: embedded
            service: m3db
            cacheDir: /var/lib/m3kv
            etcdClusters:
            - zone: embedded
              endpoints:
              - "http://etcd-0.etcd:2379"
              - "http://etcd-1.etcd:2379"
              - "http://etcd-2.etcd:2379"

My resultant namespace config from this is the following:

http://localhost:7201/api/v1/services/m3db/namespace

{
    "registry": {
        "namespaces": {
            "metrics-10s:2d": {
                "bootstrapEnabled": true,
                "flushEnabled": true,
                "writesToCommitLog": true,
                "cleanupEnabled": true,
                "repairEnabled": false,
                "retentionOptions": {
                    "retentionPeriodNanos": "172800000000000",
                    "blockSizeNanos": "7200000000000",
                    "bufferFutureNanos": "600000000000",
                    "bufferPastNanos": "600000000000",
                    "blockDataExpiry": true,
                    "blockDataExpiryAfterNotAccessPeriodNanos": "300000000000",
                    "futureRetentionPeriodNanos": "0"
                },
                "snapshotEnabled": true,
                "indexOptions": {
                    "enabled": true,
                    "blockSizeNanos": "7200000000000"
                },
                "schemaOptions": null,
                "coldWritesEnabled": false,
                "runtimeOptions": null,
                "cacheBlocksOnRetrieve": false,
                "aggregationOptions": {
                    "aggregations": [
                        {
                            "aggregated": true,
                            "attributes": {
                                "resolutionNanos": "10000000000",
                                "downsampleOptions": {
                                    "all": true
                                }
                            }
                        }
                    ]
                },
                "stagingState": {
                    "status": "READY"
                },
                "extendedOptions": null
            }
        }
    }
}

(Note how the stagingState status is READY, but this namespace is being reported as not yet initialized.)
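For a quick check of just that field, a sketch assuming curl and jq, with the coordinator port-forwarded to localhost:7201:

curl -s http://localhost:7201/api/v1/services/m3db/namespace | jq '.registry.namespaces["metrics-10s:2d"].stagingState'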

skupjoe · Sep 22 '21 23:09

I believe that this issue relates to this change.

To fix the issue:

  1. I provide my own m3coordinator config by adding this line to my M3DBCluster (operator) config:
configMapName: m3db-cluster-config-map
  2. Then I provide my own m3coordinator config, which is a duplicate of the auto-generated config but adds the config defining the unaggregated namespace:
kind: ConfigMap
apiVersion: v1
metadata:
  name: m3db-cluster-config-map
data:
  m3.yml: |
    coordinator:
      local:
        namespaces:
          - namespace: metrics-10s:2d
            type: unaggregated
            retention: 48h

    db:
      hostID:
        resolver: file
        file:
          path: /etc/m3db/pod-identity/identity
          timeout: 5m

      client:
        writeConsistencyLevel: majority
        readConsistencyLevel: unstrict_majority

      discovery:
        config:
          service:
            env: "monitoring/m3db-cluster"
            zone: embedded
            service: m3db
            cacheDir: /var/lib/m3kv
            etcdClusters:
            - zone: embedded
              endpoints:
              - "http://etcd-0.etcd:2379"
              - "http://etcd-1.etcd:2379"
              - "http://etcd-2.etcd:2379"
  3. Then I apply/deploy these manifests by performing the following:
helm install -n monitoring m3db-operator m3db/m3db-operator
kubectl -n monitoring apply -f ~/Dropbox/projects/m3db/conf/m3db-cluster-config-map.yaml
kubectl -n monitoring apply -f ~/Dropbox/projects/m3db/conf/m3db-cluster.yaml
  4. The resultant namespace config from this is the following:
{
  "registry": {
      "namespaces": {
          "metrics-10s:2d": {
              "bootstrapEnabled": true,
              "flushEnabled": true,
              "writesToCommitLog": true,
              "cleanupEnabled": true,
              "repairEnabled": false,
              "retentionOptions": {
                  "retentionPeriodNanos": "172800000000000",
                  "blockSizeNanos": "7200000000000",
                  "bufferFutureNanos": "600000000000",
                  "bufferPastNanos": "600000000000",
                  "blockDataExpiry": true,
                  "blockDataExpiryAfterNotAccessPeriodNanos": "300000000000",
                  "futureRetentionPeriodNanos": "0"
              },
              "snapshotEnabled": true,
              "indexOptions": {
                  "enabled": true,
                  "blockSizeNanos": "7200000000000"
              },
              "schemaOptions": null,
              "coldWritesEnabled": false,
              "runtimeOptions": null,
              "cacheBlocksOnRetrieve": false,
              "aggregationOptions": {
                  "aggregations": [
                      {
                          "aggregated": true,
                          "attributes": {
                              "resolutionNanos": "10000000000",
                              "downsampleOptions": {
                                  "all": true
                              }
                          }
                      }
                  ]
              },
              "stagingState": {
                  "status": "UNKNOWN"
              },
              "extendedOptions": null
          }
      }
  }
}

(Note how the stagingState status is UNKNOWN. This is the same issue as m3 issue #3649.)

  5. Then I need to force-ready this namespace, as per m3 issue #3649, by POSTing to the following endpoint (see the curl sketch below):
http://localhost:7201/api/v1/services/m3db/namespace/ready

{
    "name": "metrics-10s:2d",
    "force": true
}

And then I restart/delete the m3db-operator pod, the namespace's stagingState status becomes READY, and I can write and query data.
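For reference, the force-ready call from step 5 as a single command (a sketch; assumes the coordinator is still reachable on localhost:7201):

curl -X POST http://localhost:7201/api/v1/services/m3db/namespace/ready -d '{
  "name": "metrics-10s:2d",
  "force": true
}'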

skupjoe · Sep 23 '21 00:09

@skupjoe where did you get the base coordinator config? We can look into updating that. Also, pull requests are welcome if you'd like to contribute a change!

wesleyk · Nov 17 '21 21:11