noobaa-operator
Feature request: Make noobaa-db highly available
Is your feature request related to a problem? Please describe. Currently there is only one noobaa-db pod. If this pod dies, for whatever reason, the noobaa instance is no longer responsive until the pod restarts, which may take quite some time depending on your storageclass.
Describe the solution you'd like There should be an option in the NooBaa CR to automatically create an HA setup of the noobaa-db statefulset. For the currently used MongoDB this can easily be done using a MongoDB replica set (see below for an example). Also, when this gets merged, the MongoDB URL can easily be changed.
Describe alternatives you've considered Another option would be for the user to provide an HA instance of MongoDB; this would also require the above PR to be merged. But when it comes to cluster-external MongoDB instances, the tight timeouts set in noobaa-core could be a problem (@guymguym, quintin said you told him this could be problematic). It would also be far more user-friendly if the user could easily upgrade to a real HA setup of noobaa (AFAIK the core does not need to be HA as it only handles the UI).
Additional context
Currently we are testing a HA setup using mongo:3.6.21 in cluster. The k8s service for this is the same as with the normal non-HA noobaa-db. I changed the statefulset to this:
kind: StatefulSet
apiVersion: apps/v1
metadata:
  name: noobaa-db
  annotations:
    argocd.argoproj.io/sync-wave: "-2"
  labels:
    app: noobaa
spec:
  replicas: 3
  selector:
    matchLabels:
      noobaa-db: noobaa
  template:
    metadata:
      labels:
        app: noobaa
        noobaa-db: noobaa
    spec:
      serviceAccountName: noobaa
      containers:
        - name: db
          image: 'mongo:3.6.21'
          command:
            - bash
            - '-c'
            - >-
              mkdir -p /data/mongo/cluster/shard1 &&
              mongod --port 27017 --bind_ip_all --dbpath
              /data/mongo/cluster/shard1 --replSet rs0
          volumeMounts:
            - name: db
              mountPath: /data
      serviceAccount: noobaa
      dnsPolicy: ClusterFirst
  volumeClaimTemplates:
    - kind: PersistentVolumeClaim
      apiVersion: v1
      metadata:
        name: db
        labels:
          app: noobaa
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 50Gi
        storageClassName: block
        volumeMode: Filesystem
  serviceName: noobaa-db
To activate the replica set on all db pods I created a simple job:
apiVersion: batch/v1
kind: Job
metadata:
  annotations:
    argocd.argoproj.io/sync-wave: "-1"
  name: mongo-init
spec:
  backoffLimit: 10
  template:
    spec:
      volumes:
        - name: cache-volume
          emptyDir: { }
      containers:
        - name: mongo-init
          image: 'mongo:3.6.21'
          command:
            - bash
            - '-c'
            - >-
              ( echo 'rs.initiate({_id: "rs0",version: 1,members:
              [{ _id: 0, host : "noobaa-db-0.noobaa-db" },
              { _id: 1, host : "noobaa-db-1.noobaa-db" },
              { _id: 2, host : "noobaa-db-2.noobaa-db" }]});'
              > /data/tmp/init.js ) &&
              mongo mongodb://noobaa-db-0.noobaa-db /data/tmp/init.js &&
              mongo mongodb://noobaa-db-1.noobaa-db /data/tmp/init.js &&
              mongo mongodb://noobaa-db-2.noobaa-db /data/tmp/init.js
          volumeMounts:
            - mountPath: /data/tmp
              name: cache-volume
      restartPolicy: OnFailure
The initialization the job does could also be done in the operator using an exec command to every noobaa-db pod.
Note: The ArgoCD sync waves could be handled by the operator, so first ensure the db is up and running before deploying the core.
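To make the idea of doing the initialization from the operator more concrete, here is a rough Python sketch (the operator itself is written in Go, so this only illustrates the logic). It derives the member list from the replica count instead of hard-coding three hosts, builds the same rs.initiate call the job uses, and builds a replica-set connection URL. The function names are made up for this example, the database name "nbcore" is only a placeholder, and the kubectl command line is built but never executed. As far as I know, rs.initiate() only has to run on one member, so the sketch targets a single pod.

```python
import json


def replica_hosts(statefulset: str, service: str, replicas: int) -> list[str]:
    """Per-pod DNS names of a StatefulSet, e.g. noobaa-db-0.noobaa-db."""
    return [f"{statefulset}-{i}.{service}" for i in range(replicas)]


def rs_initiate_script(rs_name: str, hosts: list[str]) -> str:
    """JavaScript for the mongo shell that initiates the replica set."""
    config = {
        "_id": rs_name,
        "version": 1,
        "members": [{"_id": i, "host": h} for i, h in enumerate(hosts)],
    }
    return f"rs.initiate({json.dumps(config)});"


def exec_command(pod: str, script: str) -> list[str]:
    """argv for running the script inside one db pod (built, not executed)."""
    return ["kubectl", "exec", pod, "--", "mongo", "--eval", script]


def replica_set_url(rs_name: str, hosts: list[str], database: str,
                    port: int = 27017) -> str:
    """MongoDB connection string that lists every replica set member."""
    members = ",".join(f"{h}:{port}" for h in hosts)
    return f"mongodb://{members}/{database}?replicaSet={rs_name}"


if __name__ == "__main__":
    hosts = replica_hosts("noobaa-db", "noobaa-db", 3)
    print(exec_command("noobaa-db-0", rs_initiate_script("rs0", hosts)))
    # "nbcore" is a placeholder database name for illustration only.
    print(replica_set_url("rs0", hosts, database="nbcore"))
```

Deriving the hosts from the replica count would let the same code serve a 3-member or 5-member setup without editing the init script by hand.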
@dannyzaken Can you take a look and see if this can be used for our HA requirements?
Hey folks, is there any progress on this task? We've recently run into precisely this issue on one of our baremetal clusters when a node became unresponsive and the db pod was stuck in a terminating state because of that. It made the whole Noobaa unresponsive. If the db and whole Noobaa was HA, our single node failure wouldn't affect Openshift Container Storage accessibility via Noobaa...
@liranmauda as you are thinking about removing MongoDB in favor of PostgreSQL, this request may need to become HA for noobaa-db-pg. I think this is quite important, as anybody using noobaa in production would want an HA setup. Alternatively, it would be a good start to resolve #543 so anybody can easily provide their own Postgres instance.
It would be a nice feature to be able to use a custom PostgreSQL service (like a Percona one) by providing custom PostgreSQL secrets and a service name, and to switch off the PostgreSQL statefulset.
Thanks @depouill for the feedback!
@dannyzaken any reason why we can't add an optional PG url in the spec like we added for mongo before?
https://github.com/noobaa/noobaa-operator/blob/b3d79c3eeca36bf45f89e79b9587b8f4d4a9c043/pkg/apis/noobaa/v1alpha1/noobaa_types.go#L117-L119
@guymguym, there is no reason we can't add it.
In addition to the URL, we will also need to get a secret with the credentials to connect to Postgres.
@depouill do you have an example of a secret provided by the Percona operator? We should probably look at other Postgres operators as well.
Percona doesn't provision databases by itself; we use Crossplane as the database provisioner. Crossplane provides a secret like this:
kind: Secret
apiVersion: v1
metadata:
  name: noobaa-psql
data:
  endpoint: bm9vYmFhLnBlcmNvbmEtcG9zdGdyZXNxbC5zdmMuY2x1c3Rlci5sb2NhbA==
  password: MUc4Y1d4bXpHUXNzSE9VMzk2VFNON2UzTUd1
  port: NTQzMg==
  username: bm9vYmFh
But users may provide Secrets with other attribute names. At the very least, being able to provide a secret to configure an external Postgres instance, whatever the attribute names are, would be very useful.
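As an illustration of how such a secret could be consumed regardless of attribute names, here is a small Python sketch. It decodes the base64 `data` fields and builds a Postgres connection URL, with a configurable mapping from logical fields to secret keys. The function names, the `DEFAULT_KEYS` mapping, and the database name argument are assumptions for this example (the secret above does not carry a database name), not anything the operator currently implements.

```python
import base64
from urllib.parse import quote

# Logical field -> key name inside the Secret. The defaults follow the
# Crossplane-style secret shown above; other provisioners may differ.
DEFAULT_KEYS = {
    "host": "endpoint",
    "port": "port",
    "user": "username",
    "password": "password",
}


def decode_secret_data(data: dict[str, str]) -> dict[str, str]:
    """Decode the base64-encoded values of a Secret's `data` section."""
    return {k: base64.b64decode(v).decode("utf-8") for k, v in data.items()}


def postgres_url(secret_data: dict[str, str], database: str,
                 keys: dict[str, str] = DEFAULT_KEYS) -> str:
    """Build a postgresql:// URL from a Secret and a key-name mapping."""
    fields = decode_secret_data(secret_data)
    missing = [name for name in keys.values() if name not in fields]
    if missing:
        raise KeyError(f"secret is missing keys: {missing}")
    # Percent-encode credentials so characters like '@' or '/' stay valid.
    user = quote(fields[keys["user"]], safe="")
    password = quote(fields[keys["password"]], safe="")
    host = fields[keys["host"]]
    port = fields[keys["port"]]
    return f"postgresql://{user}:{password}@{host}:{port}/{database}"
```

A secret that uses, say, `hostname` instead of `endpoint` would then only need a different `keys` mapping rather than a different code path.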
This looks like a reasonable format we can support to get the details for an external Postgres.
We can add a property to the NooBaa CR, externalDBSecret, to refer to an external Postgres server. If this property exists, the operator should skip reconciling the DB and just pass the details to the other pods.
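The branching described here could look roughly like the following Python sketch (the real operator is Go, so this only shows the decision logic). `NooBaaSpec`, `plan_db_reconcile`, and the internal secret name "noobaa-db" are hypothetical stand-ins chosen for this example; only the `externalDBSecret` property name comes from the proposal above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NooBaaSpec:
    """Hypothetical, minimal stand-in for the NooBaa CR spec."""
    external_db_secret: Optional[str] = None  # proposed externalDBSecret


@dataclass
class DBPlan:
    reconcile_db: bool  # whether the operator manages the DB statefulset
    db_secret: str      # secret whose details are passed to the other pods


def plan_db_reconcile(spec: NooBaaSpec,
                      internal_secret: str = "noobaa-db") -> DBPlan:
    """Skip DB reconciliation when an external DB secret is configured."""
    if spec.external_db_secret:
        return DBPlan(reconcile_db=False, db_secret=spec.external_db_secret)
    # No external DB configured: keep managing the built-in database.
    return DBPlan(reconcile_db=True, db_secret=internal_secret)
```

For example, `plan_db_reconcile(NooBaaSpec(external_db_secret="noobaa-psql"))` yields a plan that leaves the DB statefulset alone and points the other pods at the `noobaa-psql` secret.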
@guymguym @liranmauda WDYT?
Hello @dannyzaken @guymguym @liranmauda. With my very limited understanding of this issue, I am exploring the Postgres operator, which provides HA, and thinking about a way to launch the PostgreSQL service for noobaa from it. I am quite sure you already know about it, but I did not see a discussion of it. Doesn't that provide the solution we are looking for?