temporal-operator
Is it possible to import an existing cluster into the temporal operator?
We have a production cluster deployed via the Helm chart, and I'd like to migrate it to the operator without any downtime if possible. What would be the best path to do it?
Hi!
I've never done that, but it could be doable. Let's try this!
Here is what I would suggest:
Create a new TemporalCluster on a dev cluster (with brand new storage) and fill in the spec fields so that the operator generates a configmap that matches, as closely as possible, the configmap of the currently running cluster (the config generated by the Helm chart). The only differences should be the database users, passwords and endpoints. If you see other differences in the configmap, please report them on this issue. The operator spec may be missing a feature.
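To make the configmap comparison above less error-prone, here is a minimal sketch that flattens two already-parsed configs into dotted paths and reports only the keys whose values differ. It assumes both configmaps have been loaded into plain Python dicts beforehand (for example with a YAML parser); the function names `flatten` and `config_diff` and the value "hypothetical-name" are illustrative and not part of the operator or the Helm chart.

```python
from typing import Any, Dict, Iterator, Tuple

MISSING = "<missing>"  # marker for a key that exists on only one side


def flatten(node: Any, prefix: str = "") -> Iterator[Tuple[str, Any]]:
    """Yield (dotted.path, leaf_value) pairs for a nested dict/list tree."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from flatten(value, f"{prefix}.{key}" if prefix else str(key))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from flatten(value, f"{prefix}[{index}]")
    else:
        yield prefix, node


def config_diff(helm_cfg: dict, operator_cfg: dict) -> Dict[str, Tuple[Any, Any]]:
    """Return {path: (helm_value, operator_value)} for every differing leaf."""
    left = dict(flatten(helm_cfg))
    right = dict(flatten(operator_cfg))
    return {
        path: (left.get(path, MISSING), right.get(path, MISSING))
        for path in sorted(left.keys() | right.keys())
        if left.get(path, MISSING) != right.get(path, MISSING)
    }


if __name__ == "__main__":
    # Hypothetical excerpts of the two rendered configs.
    helm = {"clusterMetadata": {"currentClusterName": "active",
                                "failoverVersionIncrement": 10}}
    operator = {"clusterMetadata": {"currentClusterName": "hypothetical-name",
                                    "failoverVersionIncrement": 10}}
    for path, (old, new) in config_diff(helm, operator).items():
        print(f"{path}: helm={old!r} operator={new!r}")
```

Any path reported here, other than database users, passwords and endpoints, is a candidate to raise on this issue.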
Then try to make the services managed by the operator join your existing cluster. To do that, create a new TemporalCluster with the spec you filled in on the dev cluster. The operator will try to configure the database for you. To make it skip the persistence reconciliation, update the TemporalCluster's status with the following fields:
persistence:
  defaultStore:
    created: true
    schemaVersion: 1.21.2
    setup: true
    type: postgres
  visibilityStore:
    created: true
    schemaVersion: 1.21.2
    setup: true
    type: postgres
(Update it with the right values.) This will make the operator deploy only the components.
If the services deployed by the operator have successfully joined the existing cluster, you'll be able to uninstall the Helm chart.
I have no clue whether it will work, let's try this :)
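The status fields above could be written as a merge-patch body. The sketch below only builds that JSON document; `persistence_status_patch` is an illustrative name, and the values are the ones from the example, to be replaced with yours. How the patch is applied is an assumption: recent kubectl versions accept `kubectl patch ... --subresource=status --type=merge`, which should work if the TemporalCluster CRD exposes a status subresource, but this has not been verified against the operator.

```python
import json
from typing import Optional


def persistence_status_patch(
    schema_version: str,
    store_type: str,
    visibility_schema_version: Optional[str] = None,
) -> dict:
    """Build a merge patch that marks both stores as already created and set up."""

    def store(version: str) -> dict:
        return {
            "created": True,
            "schemaVersion": version,
            "setup": True,
            "type": store_type,
        }

    return {
        "status": {
            "persistence": {
                "defaultStore": store(schema_version),
                # The visibility store may have its own schema version.
                "visibilityStore": store(visibility_schema_version or schema_version),
            }
        }
    }


if __name__ == "__main__":
    # Values taken from the example above; replace them with your own.
    print(json.dumps(persistence_status_patch("1.21.2", "postgres"), indent=2))
```

The printed JSON can be saved to a file and used as the patch body.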
Then try to make services managed by the operator to join your existing cluster. To do that, create a new TemporalCluster with the spec you filled on the dev cluster. The operator will try to configure the database for you. To make it skip the persistence reconciliation, update the TemporalCluster's status
Interesting. We have a similar requirement too. Could you elaborate a bit on the terms and steps?
- "services managed by the operator" => I suppose this refers to the deployments and configmap built by the operator from the CRD
- "join" => not sure what exactly this means. Does it mean the operator takes ownership of the resources deployed by the Helm chart and reconciles them? Or that it opens additional connections to the database used by the Helm chart, in parallel? Or some kind of multi-cluster joining?
- "TemporalCluster's status" => https://alexandrevilain.github.io/temporal-operator/api/v1beta1/#temporal.io/v1beta1.TemporalClusterStatus ?
One known issue from our previous attempt to take ownership of an existing database is a conflict on the cluster metadata, which seems to be a checksum and is hard to reverse engineer back to its source. The workaround was to delete it and let Temporal regenerate it from the configmap created by the operator.
Hi @mfractal @alexandrevilain We have a similar requirement. I was going to try this.
generate a configmap that looks like (as much as possible) with the current running cluster configmap (the config generated by the helm chart)
But in our production cluster's clusterMetadata config, the cluster name is "active", which is the Helm chart default :)
clusterMetadata:
  enableGlobalDomain: false
  failoverVersionIncrement: 10
  masterClusterName: "active"
  currentClusterName: "active"
  clusterInformation:
    active:
      enabled: true
      initialFailoverVersion: 1
      rpcName: "temporal-frontend"
      rpcAddress: "127.0.0.1:7933"
It looks impossible to make those two configs match, since the clusterMetadata config is auto-generated by the operator and can't be configured right now, unless we set our cluster name to "active".
Our solution involves downtime, but it works.
- Scale down production to zero replicas
- Deploy the Temporal cluster with the operator, using the same prod DB
The new deployment will take over the running and closed workflow executions.
If you get "panic: Cluster info initial versions have duplicates" with the new deployment, the cause is shown in the image below.
You can bypass it by deleting the old cluster metadata, which is stored in the cluster_metadata_info table of the default DB.
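As an illustration of the deletion step above, here is a small sketch that removes one cluster's row from cluster_metadata_info. It runs against an in-memory SQLite stand-in so it can be tried safely; the column layout (in particular `cluster_name`) is an assumption about the Temporal SQL schema and should be checked against your actual database first. The helper names are illustrative. Against a real PostgreSQL database you would use a PostgreSQL driver, whose placeholder syntax differs from SQLite's `?`, and take a backup before deleting anything.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Create an in-memory stand-in for the default DB (assumed column layout)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE cluster_metadata_info ("
        " metadata_partition INTEGER NOT NULL,"
        " cluster_name TEXT NOT NULL,"
        " data BLOB NOT NULL,"
        " data_encoding TEXT NOT NULL,"
        " version INTEGER NOT NULL,"
        " PRIMARY KEY (metadata_partition, cluster_name))"
    )
    # One fake row standing in for the metadata written by the Helm deployment.
    conn.execute(
        "INSERT INTO cluster_metadata_info VALUES (0, 'active', x'00', 'Proto3', 1)"
    )
    conn.commit()
    return conn


def delete_cluster_metadata(conn: sqlite3.Connection, cluster_name: str) -> int:
    """Delete the stored metadata row(s) for one cluster; return the row count."""
    cursor = conn.execute(
        "DELETE FROM cluster_metadata_info WHERE cluster_name = ?",
        (cluster_name,),
    )
    conn.commit()
    return cursor.rowcount


if __name__ == "__main__":
    demo = make_demo_db()
    print("rows deleted:", delete_cluster_metadata(demo, "active"))
```

Deleting by cluster name rather than truncating the table keeps the change limited to the old entry.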
Hi! Good news, I think that https://github.com/alexandrevilain/temporal-operator/pull/494 would help you :)