postgres-operator
Proposal: PostgresEngineConfiguration and Breaking changes
Hello @arnarg and @hitman99 ,
I want to work on a feature to add another CRD called PostgresEngineConfiguration. It will allow managing Postgres databases and users on multiple engines, not only one as now. One of these new CRs can be set with a "default: true" flag. This will introduce a breaking change, since the configuration set by environment variables will no longer be used.
With this change, I also propose to rename the Postgres CRD to a PostgresDatabase CRD. Why? Because having PostgresEngineConfiguration <= Postgres <= PostgresUser looks odd from my point of view and may not be clear to the people/teams using this operator.
This will lead to 2 breaking changes. Maintaining 2 different ways of doing things will be difficult and will lead to difficult debugging sessions...
Is it OK for you to do it like that?
Hi @oxyno-zeta,
This will allow managing Postgres databases and users on multiple engines
Do you mean Postgres on Azure and on AWS at the same time (with single operator)?
With this change, I also propose to rename the Postgres CRD to a PostgresDatabase CRD. This will lead to 2 breaking changes.
This seems risky and will break already deployed CRs, causing havoc.
Can you elaborate on the actual use case and purpose of this proposal?
Hello @hitman99 ,
Do you mean Postgres on Azure and on AWS at the same time (with single operator)?
I mean to manage multiple engines at the same time with the same operator: Postgres_1, Postgres_2, AWS_RDS_1, AWS_RDS_2, ... in the same operator instance.
This seems risky and will break already deployed CRs causing havoc.
Yes, I know... But calling a CR Postgres is confusing for some people. For example, if you have the KubeDB operator, which deploys Postgres engines with a CR called Postgres (see here), AND postgres-operator, which manages Postgres databases with a CR called Postgres, it is confusing... I think you see what I mean.
That's why I propose to make the 2 breaking changes at the same time.
Or maybe add another CRD for PostgresDatabase and keep the Postgres CRD in a deprecated mode without any updates, if that is possible? If not, can we make the break?
I mean to manage multiple engines at the same time with the same operator: Postgres_1, Postgres_2, AWS_RDS_1, AWS_RDS_2, ... in the same operator instance.
I have multiple concerns for this change:
- it will drastically increase the logic complexity and introduce new bugs / edge cases
- it will expand the blast radius to multiple Postgres database servers instead of one, potentially rendering all of them inaccessible at once in case there is some issue with the operator
- it will provide this operator with multiple credentials to different systems, which is very concerning from a security point of view
I'd recommend using multiple namespaced instances of postgres-operator for controlling multiple Postgres servers. It should work as-is without any changes.
Yes, I know... But calling a CR Postgres is confusing for some people. For example, if you have the KubeDB operator, which deploys Postgres engines with a CR called Postgres (see here), AND postgres-operator, which manages Postgres databases with a CR called Postgres, it is confusing... I think you see what I mean.
I see what might cause confusion, but the semantics heavily depend on the object (CR) context perception. This operator does not claim to be creating Postgres database servers; it's an external Postgres operator. Currently, the db.movetokube.com API group has two object kinds in version v1alpha1:
- Postgres - which refers to the Postgres database and not the Postgres database server (or engine, if you will); I think this name is valid in this operator's context
- PostgresUser - which refers to a login role for the database created by the Postgres CR
In the KubeDB operator's case, Postgres refers to the database server that it creates, and that name is valid for that object Kind.
I'd recommend using multiple namespaced instances of postgres-operator for controlling multiple Postgres servers. It should work as-is without any changes.
The problem with this is the following: Kubernetes doesn't allow mounting secrets from another namespace. So if we install the operator in namespace postgres1 and applications in other namespaces like apps1, apps2, ... (we don't know the namespaces as they are not fixed), we can't install a second operator instance in postgres2 that also watches all namespaces. 2 operators both watching all namespaces will create a problem. That's why I propose this feature.
The operator can be configured to watch a specific namespace for CRs. Secrets with credentials are created in the same namespace that the CR is created in; it does not depend on the operator's namespace. So the idea is to have multiple deployments of the operator watching different namespaces.
So the idea is to have multiple deployments of the operator watching different namespaces.
I've understood that. The problem isn't there. The problem appears when you don't know the list of namespaces. In some cases, developers can create their namespaces with random names and we can't know what they will put there. Having multiple operator instances that each watch all namespaces can't work; neither can having an operator per created namespace, nor knowing the namespace list in advance. That's why I propose this feature.
Another solution could be a label filter, if you don't want this feature.
Another reason why I propose this feature: if the Postgres engine isn't ready when the operator is deployed, the operator will crash with a fatal error. With this feature, we can just emit events in Kubernetes and put a clear status on the PostgresEngineConfiguration saying "this configuration isn't valid", instead of crash looping.
I've understood that. The problem isn't there. The problem appears when you don't know the list of namespaces. In some cases, developers can create their namespaces with random names and we can't know what they will put there. Having multiple operator instances that each watch all namespaces can't work; neither can having an operator per created namespace, nor knowing the namespace list in advance. That's why I propose this feature.
Ah, I understand your use case now. In this particular case it makes sense.
Another reason why I propose this feature: if the Postgres engine isn't ready when the operator is deployed, the operator will crash with a fatal error. With this feature, we can just emit events in Kubernetes and put a clear status on the PostgresEngineConfiguration saying "this configuration isn't valid", instead of crash looping.
This is a valid case for the new feature set that you're proposing, but not entirely true for the current implementation. Currently, the Postgres database server is a critical dependency for this operator; without it, the operator is rendered useless and cannot deliver anything of value. But on the other hand, graceful handling of a missing dependency might be nice.
Now that I think about this proposal more, it might make sense, but it's a major overhaul and we would need to come up with a migration plan for existing users. Introducing breaking changes is always a stressful thing for users. What do you think about this proposal @arnarg ?
Now that it's been explained a bit better I think it makes sense. I agree that we should be careful about introducing new changes. I'm up for the change, we just have to make sure at least the password can be picked up from a secret.
I've begun the work on this and we would like to have another type of status in objects. Is it possible for you two (@hitman99 @arnarg ) to accept a drastic change to the code base? We want to add events, phase status, updated time, ... This will change a lot of things. We will try to minimize the impact on users, but the code base will be affected.
Is that OK?
Here is an example of status:
type StatusPhase string

const (
	NonePhase       StatusPhase = ""
	ValidatingPhase StatusPhase = "validating"
	FailedPhase     StatusPhase = "failed"
	ValidatedPhase  StatusPhase = "validated"
	RemovingPhase   StatusPhase = "removing"
)

// PostgresqlEngineConfigurationStatus defines the observed state of PostgresqlEngineConfiguration
type PostgresqlEngineConfigurationStatus struct {
	// Current phase of the operator
	Phase StatusPhase `json:"phase"`
	// Human-readable message indicating details about the current operator phase or error.
	Message string `json:"message"`
	// True if all resources are in a ready state and all work is done.
	Ready bool `json:"ready"`
	// Last updated time
	LastUpdatedTime string `json:"updatedDate"`
	// Last validated time
	LastValidatedTime string `json:"lastValidatedDate"`
}
@arnarg : Of course the user and password will be in a secret
I like the status struct and the other CRDs should probably have something similar. I'm however not sure what all the phases mean for this CRD.
Will there be a controller specific to this CRD and some communication channel for the Postgres and PostgresUser controllers, or is this CRD just a data object that the Postgres and PostgresUser controllers will pick up to create connections?
@arnarg This will just be a data object that other CRs will pick up to create connections, and it will have a dedicated controller to check that the configuration is valid.
Is it OK for you to change the source code to add all of these changes?
In the end, we would also like to add a feature to update a user's password based on an interval. This is the last feature we would like to add to this operator.
The CRD looks OK, so just to be on the same page: what will it do exactly? How will this work with the credentials to multiple Postgres servers/clusters? Also, what do these phases mean? Will they reflect this new CR's changes?
As for events, for which CRs are you going to implement them?
@hitman99 : This CR is only a way to store connection information for multiple engines. The phases are there so people have a "human friendly" status, because sometimes a boolean isn't sufficient.
For events, we are talking about everything.
Any updates on this?
In some cases, developers can create their namespaces with random names and we can't know what they will put there. Having multiple operator instances that each watch all namespaces can't work; neither can having an operator per created namespace, nor knowing the namespace list in advance. That's why I propose this feature.
This would be cool. Alternatively, I think we can introduce annotations that ext-postgres-operator will watch for. For example, I imagine having an ext-postgres-operator/secret-name: cluster-0 annotation on Postgres and PostgresUser objects, making the operator manage the DB and Role in just the cluster described in that Secret.
I'd love to manage multiple clusters with ext-postgres-operator.