Feature request: Make noobaa-db highly available

Status: Open. lallinger-arbeit opened this issue 3 years ago; 9 comments.

Is your feature request related to a problem? Please describe.
Currently there is only one noobaa-db pod. If this pod dies for whatever reason, the NooBaa instance is unresponsive until the pod restarts, which may take quite some time depending on your storage class.

Describe the solution you'd like
There should be an option in the NooBaa CR to automatically create an HA setup of the noobaa-db StatefulSet. For the currently used MongoDB, this can easily be done with a MongoDB replica set (see the example below). Also, when this PR gets merged, the MongoDB URL can easily be changed.

Describe alternatives you've considered
Another option would be for the user to provide an HA instance of MongoDB; this would also require the above PR to be merged. But with MongoDB instances external to the cluster, the tight timeouts set in noobaa-core could be a problem (@guymguym, Quintin said you told him this could be problematic). It would also be much more user-friendly if the user could easily upgrade to a real HA setup of NooBaa (AFAIK the core does not need to be HA, as it only handles the UI).

Additional context
We are currently testing an HA setup using mongo:3.6.21 in the cluster. The k8s Service for it is the same as for the normal non-HA noobaa-db. I changed the StatefulSet to this:

kind: StatefulSet
apiVersion: apps/v1
metadata:
  name: noobaa-db
  annotations:
    argocd.argoproj.io/sync-wave: "-2"
  labels:
    app: noobaa
spec:
  replicas: 3
  selector:
    matchLabels:
      noobaa-db: noobaa
  template:
    metadata:
      labels:
        app: noobaa
        noobaa-db: noobaa
    spec:
      serviceAccountName: noobaa
      containers:
        - name: db
          image: 'mongo:3.6.21'
          command:
            - bash
            - '-c'
            - >-
              mkdir -p /data/mongo/cluster/shard1 &&
              mongod --port 27017 --bind_ip_all --dbpath
              /data/mongo/cluster/shard1 --replSet rs0
          volumeMounts:
            - name: db
              mountPath: /data
      serviceAccount: noobaa
      dnsPolicy: ClusterFirst
  volumeClaimTemplates:
    - kind: PersistentVolumeClaim
      apiVersion: v1
      metadata:
        name: db
        labels:
          app: noobaa
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 50Gi
        storageClassName: block
        volumeMode: Filesystem
  serviceName: noobaa-db
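Why `replicas: 3`: a MongoDB replica set can only elect a primary while a majority of its voting members are reachable. The small Go sketch below (the helper names are illustrative, not part of noobaa-operator) computes the majority and the number of tolerated member failures for a few replica set sizes. Three members survive one pod failure, while a fourth member adds no extra tolerance.

```go
package main

import "fmt"

// majority returns how many voting members must be reachable for a
// replica set of n voting members to elect (or keep) a primary.
func majority(n int) int {
	return n/2 + 1
}

// faultTolerance returns how many voting members can fail while a
// majority, and therefore a primary, is still available.
func faultTolerance(n int) int {
	return n - majority(n)
}

func main() {
	for _, n := range []int{1, 2, 3, 4, 5} {
		fmt.Printf("members=%d majority=%d tolerates=%d\n", n, majority(n), faultTolerance(n))
	}
}
```

This is why odd member counts are the usual choice for an HA noobaa-db: going from 3 to 4 pods costs another PVC without surviving any additional failure.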

To activate the replica set on all db pods, I created a simple Job:

apiVersion: batch/v1
kind: Job
metadata:
  annotations:
    argocd.argoproj.io/sync-wave: "-1"
  name: mongo-init
spec:
  backoffLimit: 10
  template:
    spec:
      volumes:
        - name: cache-volume
          emptyDir: { }
      containers:
        - name: mongo-init
          image: 'mongo:3.6.21'
          command:
            - bash
            - '-c'
            - >-
              ( echo 'rs.initiate({_id: "rs0",version: 1,members:
              [{ _id: 0, host : "noobaa-db-0.noobaa-db" },
              { _id: 1, host : "noobaa-db-1.noobaa-db" },
              { _id: 2, host : "noobaa-db-2.noobaa-db" }]});'
              > /data/tmp/init.js ) &&
              mongo mongodb://noobaa-db-0.noobaa-db /data/tmp/init.js &&
              mongo mongodb://noobaa-db-1.noobaa-db /data/tmp/init.js &&
              mongo mongodb://noobaa-db-2.noobaa-db /data/tmp/init.js
          volumeMounts:
            - mountPath: /data/tmp
              name: cache-volume
      restartPolicy: OnFailure

The initialization done by the Job could also be done by the operator, by exec'ing a command in every noobaa-db pod.
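If the operator took over this step, it would first need to render the same `rs.initiate()` document the Job writes to `/data/tmp/init.js`. Note that MongoDB only requires `rs.initiate()` to be run on one member; the other members receive the configuration through replication, so an exec into `noobaa-db-0` alone should be enough. The Go sketch below builds the script for N StatefulSet pods using the `<statefulset>-<ordinal>.<service>` DNS names from the Job; `initiateScript` and its parameters are hypothetical, and the exec call itself is not shown.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// member mirrors one entry of the "members" array passed to rs.initiate().
type member struct {
	ID   int    `json:"_id"`
	Host string `json:"host"`
}

// rsConfig mirrors the replica set configuration document used by the Job.
type rsConfig struct {
	ID      string   `json:"_id"`
	Version int      `json:"version"`
	Members []member `json:"members"`
}

// initiateScript returns a mongo shell statement that initiates replica set
// setName with one member per StatefulSet pod, addressed through the
// governing service as "<sts>-<ordinal>.<service>".
func initiateScript(setName, sts, service string, replicas int) (string, error) {
	cfg := rsConfig{ID: setName, Version: 1}
	for i := 0; i < replicas; i++ {
		cfg.Members = append(cfg.Members, member{
			ID:   i,
			Host: fmt.Sprintf("%s-%d.%s", sts, i, service),
		})
	}
	// JSON is a valid JavaScript object literal, so it can be passed as-is.
	doc, err := json.Marshal(cfg)
	if err != nil {
		return "", err
	}
	return fmt.Sprintf("rs.initiate(%s);", doc), nil
}

func main() {
	script, err := initiateScript("rs0", "noobaa-db", "noobaa-db", 3)
	if err != nil {
		panic(err)
	}
	fmt.Println(script)
}
```

Generating the member list from the replica count keeps the script in sync with `spec.replicas`, instead of hard-coding three hosts as the Job does.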

Note: The ordering handled here by ArgoCD sync waves could instead be handled by the operator: first ensure the DB is up and running, then deploy the core.

lallinger-arbeit, Dec 08 '20

@dannyzaken Can you take a look and see if this can be used for our HA requirements?

guymguym, Dec 10 '20

Hey folks, is there any progress on this task? We recently ran into precisely this issue on one of our bare-metal clusters: a node became unresponsive, and because of that the db pod got stuck in a Terminating state. This made all of NooBaa unresponsive. If the DB and the rest of NooBaa were HA, a single node failure wouldn't affect OpenShift Container Storage accessibility via NooBaa...

tumido, Apr 19 '21

@liranmauda, as you are thinking about removing MongoDB in favor of PostgreSQL, this could be turned into an HA setup for noobaa-db-pg. I think this is quite important, as anybody using NooBaa in production would want an HA setup. Alternatively, resolving #543 would be a good start, so anybody can easily provide their own Postgres instance.

lallinger-arbeit, Nov 26 '21

It would be a nice feature to be able to use a custom PostgreSQL service (like a Percona one) by providing custom PostgreSQL secrets and a service name, and switching off the PostgreSQL StatefulSet.

depouill, Apr 29 '22

Thanks @depouill for the feedback!

@dannyzaken, any reason why we can't add an optional PG URL in the spec, like we added for Mongo before?

https://github.com/noobaa/noobaa-operator/blob/b3d79c3eeca36bf45f89e79b9587b8f4d4a9c043/pkg/apis/noobaa/v1alpha1/noobaa_types.go#L117-L119

guymguym, Apr 29 '22

@guymguym, there is no reason we can't add it.
In addition to the URL, we will also need a secret with the credentials to connect to Postgres. @depouill, do you have an example of a secret provided by the Percona operator? We should probably look at other Postgres operators as well.

dannyzaken, May 01 '22

Percona doesn't provision databases by itself; we use Crossplane as the database provisioner. Crossplane provides a secret like this:

kind: Secret
apiVersion: v1
metadata:
  name: noobaa-psql
data:
  endpoint: bm9vYmFhLnBlcmNvbmEtcG9zdGdyZXNxbC5zdmMuY2x1c3Rlci5sb2NhbA==
  password: MUc4Y1d4bXpHUXNzSE9VMzk2VFNON2UzTUd1
  port: NTQzMg==
  username: bm9vYmFh

But users may provide Secrets with other attribute names. Being able to provide a secret that configures an external Postgres instance, whatever its attribute names are, would be very useful.
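To support arbitrary attribute names, the consumer of such a Secret could take a key mapping alongside the Secret itself. The Go sketch below is an illustration under that assumption: `secretKeys`, `crossplaneKeys` and `postgresURL` are hypothetical names, and the database name `nbcore` is only a placeholder. It decodes the base64 values exactly as they appear in the YAML above and assembles a `postgres://` URL. When reading the Secret through client-go instead, `Secret.Data` is a `map[string][]byte` whose values are already decoded, so the base64 step would be dropped.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"net"
	"net/url"
)

// secretKeys names the Secret data keys holding each connection field.
// Hypothetical: lets users map provisioner-specific attribute names.
type secretKeys struct {
	Host, Port, User, Password string
}

// crossplaneKeys matches the attribute names in the Crossplane Secret above.
var crossplaneKeys = secretKeys{Host: "endpoint", Port: "port", User: "username", Password: "password"}

// postgresURL base64-decodes the four connection fields from a Secret's
// data map (as it appears in YAML) and assembles a postgres:// URL.
func postgresURL(data map[string]string, keys secretKeys, dbName string) (string, error) {
	fields := []string{keys.Host, keys.Port, keys.User, keys.Password}
	vals := make([]string, len(fields))
	for i, k := range fields {
		raw, ok := data[k]
		if !ok {
			return "", fmt.Errorf("secret has no key %q", k)
		}
		b, err := base64.StdEncoding.DecodeString(raw)
		if err != nil {
			return "", fmt.Errorf("key %q: %w", k, err)
		}
		vals[i] = string(b)
	}
	u := url.URL{
		Scheme: "postgres",
		User:   url.UserPassword(vals[2], vals[3]), // escapes special characters
		Host:   net.JoinHostPort(vals[0], vals[1]),
		Path:   "/" + dbName,
	}
	return u.String(), nil
}

func main() {
	data := map[string]string{
		"endpoint": "bm9vYmFhLnBlcmNvbmEtcG9zdGdyZXNxbC5zdmMuY2x1c3Rlci5sb2NhbA==",
		"password": "MUc4Y1d4bXpHUXNzSE9VMzk2VFNON2UzTUd1",
		"port":     "NTQzMg==",
		"username": "bm9vYmFh",
	}
	// "nbcore" is a placeholder database name for illustration.
	connURL, err := postgresURL(data, crossplaneKeys, "nbcore")
	if err != nil {
		panic(err)
	}
	fmt.Println(connURL)
}
```

A Secret produced by another provisioner would only need a different `secretKeys` value, not a different code path.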

depouill, May 02 '22

This looks like a reasonable format we can support for getting the details of an external Postgres. We can add a property to the NooBaa CR, externalDBSecret, that refers to an external Postgres server. If this property exists, the operator should skip reconciling the DB and just pass the details to the other pods. @guymguym @liranmauda WDYT?
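The externalDBSecret proposal boils down to one branch in the reconciler. The Go sketch below is a minimal illustration under that reading; `SecretRef`, the trimmed `NooBaaSpec`, `dbPlan` and `planDB` are hypothetical stand-ins, not the actual types in noobaa_types.go.

```go
package main

import "fmt"

// SecretRef is a hypothetical reference to a Secret with external DB details.
type SecretRef struct {
	Name      string
	Namespace string
}

// NooBaaSpec is a trimmed, hypothetical stand-in for the NooBaa CR spec,
// showing only the proposed externalDBSecret property.
type NooBaaSpec struct {
	ExternalDBSecret *SecretRef `json:"externalDBSecret,omitempty"`
}

// dbPlan records what the reconciler would do about the database.
type dbPlan struct {
	ReconcileDBStatefulSet bool       // create/update the internal noobaa-db
	ExternalSecret         *SecretRef // secret to pass to the other pods, if any
}

// planDB skips the internal DB entirely when an external secret is set.
func planDB(spec NooBaaSpec) dbPlan {
	if spec.ExternalDBSecret != nil {
		return dbPlan{ReconcileDBStatefulSet: false, ExternalSecret: spec.ExternalDBSecret}
	}
	return dbPlan{ReconcileDBStatefulSet: true}
}

func main() {
	internal := planDB(NooBaaSpec{})
	external := planDB(NooBaaSpec{ExternalDBSecret: &SecretRef{Name: "noobaa-psql"}})
	fmt.Println("internal DB reconciled:", internal.ReconcileDBStatefulSet)
	fmt.Println("external DB reconciled:", external.ReconcileDBStatefulSet, "secret:", external.ExternalSecret.Name)
}
```

Using a pointer keeps "not set" distinct from an empty reference, so existing CRs without the property keep today's behavior.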

dannyzaken, May 10 '22

Hello @dannyzaken @guymguym @liranmauda, with my very limited understanding of this issue, I am exploring the postgres operator, which provides HA, and thinking of a way to launch a PostgreSQL service for NooBaa from it. I am sure you already know about it, but I did not see it discussed here. Wouldn't that provide the solution we are looking for?

vh05, Aug 04 '22