
[Bug] Persistence and visibility schema create/update should have separate toggles

Open RonaldGalea opened this issue 2 years ago • 9 comments

What are you really trying to do?

Deploy Temporal with an existing Database.

Describe the bug

When an existing DB is used, the suggested configuration is to disable the schema setup and update:

schema:
  setup:
    enabled: false
  update:
    enabled: false

However, this also prevents the Elasticsearch index creation job from running: https://github.com/temporalio/helm-charts/blob/master/templates/server-job.yaml#L347

There should likely be a separate flag controlling the Elasticsearch index creation.
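For illustration, the template gates the es-index-setup Job on the same schema toggles, roughly like this (paraphrased, not a verbatim excerpt from server-job.yaml):

{{- if or .Values.schema.setup.enabled .Values.schema.update.enabled }}
# ... the es-index-setup Job is rendered here, so disabling both
# schema toggles also skips Elasticsearch index creation ...
{{- end }}

A separate, hypothetical toggle such as elasticsearch.schema.setup.enabled would let the index creation run even when the SQL schema jobs are disabled.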

Minimal Reproduction

Just run any of the "Install with your own MySQL/PostgreSQL/Cassandra" examples. All server services will be stuck in the Init state with "waiting for elasticsearch index to become ready".
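To see the failure, something along these lines works (namespace and pod names are assumptions and depend on your release):

kubectl get pods -n temporal                          # server pods stuck in Init
kubectl describe pod <server-pod> -n temporal         # shows which init container is waiting
kubectl logs <server-pod> -n temporal --all-containers | grep elasticsearch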

Additional context

There is a post on the community forum which might be related.

RonaldGalea avatar Nov 22 '23 11:11 RonaldGalea

I stumbled upon this today while trying to configure temporal with an external PostgreSQL DB. I also found a workaround, but it isn't pretty.

We're deploying this through an ArgoCD application, and this is what it looks like:

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
    name: temporal
    namespace: argocd
spec:
    syncPolicy:
        automated:
            selfHeal: true
            prune: true
    project: default
    destination:
        server: https://kubernetes.default.svc
        namespace: temporal
    source:
        path: charts/temporal
        repoURL: https://github.com/temporalio/helm-charts
        targetRevision: temporal-0.33.0
        helm:
            releaseName: temporal
            values: |-
                replicaCount: 1
                postgresql:
                  enabled: true
                prometheus:
                  enabled: true
                elasticsearch:
                  enabled: true
                grafana:
                  enabled: true
                cassandra:
                  enabled: false
                schema:
                  setup:
                    enabled: true
                  update:
                    enabled: true
                server:
                  config:
                    persistence:
                      default:
                        driver: sql
                        sql:
                          driver: postgres
                          host: temporal(...).eu-west-1.rds.amazonaws.com
                          port: 5432 #if you don't specify this, temporal defaults to port 3306 for postgresql, which is the default port for mysql!
                          user: postgresql
                          password: xxxx
                      visibility:
                        driver: sql
                        sql:
                          driver: postgres
                          host: temporal(...).eu-west-1.rds.amazonaws.com
                          port: 5432
                          user: postgresql
                          password: xxxx

All 4 temporal pods were stuck initializing. I only checked the worker pod, which was failing with:

waiting for elasticsearch index to become ready

This is due to the es-index-setup job not being created.

I ended up cloning the repo, checking out the tag I was using above, putting the helm values in a file by themselves and templating the chart locally:

helm template temporal /home/daniel/repos/temporal-helm-charts/charts/temporal/ --values temporal-values.yml

Strangely enough, this generated the YAML for the es-index-setup Job. I then applied it with kubectl from my machine, which initialized Temporal's Elasticsearch index, and now the pods are OK.
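A slightly more targeted variant of the same workaround renders only the file that defines the Jobs (assuming they live in templates/server-job.yaml; review the output before applying, since that file contains the schema Jobs as well):

helm template temporal /home/daniel/repos/temporal-helm-charts/charts/temporal/ \
  --values temporal-values.yml \
  --show-only templates/server-job.yaml > jobs.yml
kubectl apply -f jobs.yml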

I ran out of time to troubleshoot why the helm chart has this strange behaviour; if it weren't for this issue I would assume the problem was between my chair and keyboard, but now I'm not so sure.

It is also worth bearing in mind that many Temporal users will deploy this helm chart through a GitOps tool (likely ArgoCD or FluxCD), so that path is worth validating as well.

Cheers

DanielCalvo avatar Feb 09 '24 17:02 DanielCalvo

The same issue is happening to me; I have pods stuck in the Init state. I tried the workaround from @DanielCalvo but it did not work in my case.

max-openline avatar Feb 22 '24 15:02 max-openline

Do you happen to have any updates on this? Our team is also affected by this issue.

emanuel8x8 avatar May 07 '24 09:05 emanuel8x8

I seem to also have this issue upgrading from 0.36.0 to 0.37.0. It is my first upgrade.

Scott

smolinari avatar May 09 '24 17:05 smolinari

I also had this issue; after a Kubernetes update I realized my Temporal deployment wasn't in a good state. I found that the temporal-history container was waiting on this same Elasticsearch index setup. Following the approach above, in my temporal helm chart repo I set enabled: true in my values/values.postgresql.yaml (fixing the config OP pointed out), then I did:

helm template . --values values/values.postgresql.yaml > out.yml

Then you can search for temporal-es-index-setup in that output, stick that Job into its own job.yml file (or extract it automatically, as sketched below) and do:

kubectl apply -f job.yml

I had to kick a few of the pods to restart the deployment but then everything was working as normal.
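If you'd rather not copy the Job out of out.yml by hand, something like this should extract it (a sketch, assuming mikefarah's yq v4 and a Job name containing es-index-setup):

yq eval-all 'select(.kind == "Job" and (.metadata.name | test("es-index-setup")))' out.yml > job.yml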

brojonat avatar Jul 04 '24 20:07 brojonat

If you are using ArgoCD:

  • go into the temporal ArgoCD app and remove the "temporal-schema" Job resource
  • ArgoCD will re-create the job, which takes care of the es-index setup (a kubectl equivalent is sketched below)
  • watch the progress of that job and look for any errors; in my case there was a problem connecting to the Cassandra headless service, so I restarted the Cassandra pod

After that, everything was working fine.
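The kubectl equivalent of the first two steps, assuming the Job is named temporal-schema and lives in the temporal namespace:

kubectl delete job temporal-schema -n temporal
# with syncPolicy.automated.selfHeal enabled, ArgoCD re-creates the Job on its next sync;
# otherwise trigger one manually:
argocd app sync temporal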

bart-braidwell avatar Jul 26 '24 10:07 bart-braidwell

I ran into this and added

schema:
  update:
    enabled: true

to my chart. Not sure if this is wrong, though.

elee1766 avatar Nov 19 '24 20:11 elee1766

For me, what helped was just running helm upgrade [chart-name] .. Helm then re-ran the jobs, which got all pods running correctly.
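Spelled out with placeholder names (release name, chart path, and namespace are all assumptions):

helm upgrade temporal ./charts/temporal --values temporal-values.yml -n temporal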

Timonsc avatar Feb 06 '25 14:02 Timonsc

The issue isn't with Elasticsearch specifically, but the fact that there are no separate controls for persistence vs visibility.
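Purely as an illustration of what that could look like, split toggles in values.yaml might be shaped like this (hypothetical, not the chart's current schema):

schema:
  persistence:
    setup:
      enabled: false   # existing DB: skip default-store schema creation
    update:
      enabled: false
  visibility:
    setup:
      enabled: true    # still set up the visibility store / ES index
    update:
      enabled: true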

robholland avatar Feb 24 '25 09:02 robholland