
Upgrading to pgo 5.8.4 and pg18 fails to build replication due to missing scripts

Open EugenMayer opened this issue 2 months ago • 1 comments

Please ensure you do the following when reporting a bug:

  • [x] Provide a concise description of what the bug is.
  • [x] Provide information about your environment.
  • [x] Provide clear steps to reproduce the bug.
  • [x] Attach applicable logs. Please do not attach screenshots showing logs unless you are unable to copy and paste the log data.
  • [x] Ensure any code / output examples are properly formatted for legibility.

Note that some logs needed to troubleshoot may be found in the /pgdata/<CLUSTERNAME>/pg_log directory on your Postgres instance.

An incomplete bug report can lead to delays in resolving the issue or the closing of a ticket, so please be as detailed as possible.

If you are looking for general support, please view the support page for where you can ask questions.

Thanks for reporting the issue, we're looking forward to helping you!

Overview

After upgrading to PGO 5.8.4 and to PostgreSQL 18, the replicas fail to bootstrap. They all remain stuck in the following state:

2025-11-09 07:45:21,540 INFO: bootstrap from leader 'pg-cluster-pg-instance-v4-s4h9-0' in progress
2025-11-09 07:45:21,541 INFO: Lock owner: pg-cluster-pg-instance-v4-s4h9-0; I am pg-cluster-pg-instance-v4-chhc-0
2025-11-09 07:45:21,541 INFO: bootstrap from leader 'pg-cluster-pg-instance-v4-s4h9-0' in progress

Meanwhile, the operator logs show:

time="2025-11-09T07:32:16Z" level=error msg="Query file /opt/crunchy/conf/pg18/queries_general.yml does not exist (it should)..." PostgresCluster=postgres/pg-cluster controller=postgrescluster controllerGroup=postgres-operator.crunchydata.com controllerKind=PostgresCluster error="open /opt/crunchy/conf/pg18/queries_general.yml: no such file or directory" name=pg-cluster namespace=postgres reconcileID=17762c55-9599-45ce-b09c-4fe752c93745 version=5.8.4-0
time="2025-11-09T07:32:16Z" level=error msg="Query file /opt/crunchy/conf/pg18/queries_pg_stat_statements.yml not loaded." PostgresCluster=postgres/pg-cluster controller=postgrescluster controllerGroup=postgres-operator.crunchydata.com controllerKind=PostgresCluster error="open /opt/crunchy/conf/pg18/queries_pg_stat_statements.yml: no such file or directory" name=pg-cluster namespace=postgres reconcileID=17762c55-9599-45ce-b09c-4fe752c93745 version=5.8.4-0
time="2025-11-09T07:32:16Z" level=error msg="Query file /opt/crunchy/conf/pg18/queries_pg_stat_statements_reset_info.yml not loaded." PostgresCluster=postgres/pg-cluster controller=postgrescluster controllerGroup=postgres-operator.crunchydata.com controllerKind=PostgresCluster error="open /opt/crunchy/conf/pg18/queries_pg_stat_statements_reset_info.yml: no such file or directory" name=pg-cluster namespace=postgres reconcileID=17762c55-9599-45ce-b09c-4fe752c93745 version=5.8.4-0
time="2025-11-09T07:32:16Z" level=debug msg="reconciled instance" PostgresCluster=postgres/pg-cluster controller=postgrescluster controllerGroup=postgres-operator.crunchydata.com controllerKind=PostgresCluster instance=pg-cluster-pg-instance-v4-chhc name=pg-cluster namespace=postgres reconcileID=17762c55-9599-45ce-b09c-4fe752c93745 version=5.8.4-0
time="2025-11-09T07:32:16Z" level=debug msg="reconciled instance" PostgresCluster=postgres/pg-cluster controller=postgrescluster controllerGroup=postgres-operator.crunchydata.com controllerKind=PostgresCluster instance=pg-cluster-pg-instance-v4-4h54 name=pg-cluster namespace=postgres reconcileID=17762c55-9599-45ce-b09c-4fe752c93745 version=5.8.4-0
time="2025-11-09T07:32:16Z" level=debug msg="reconciled instance" PostgresCluster=postgres/pg-cluster controller=postgrescluster controllerGroup=postgres-operator.crunchydata.com controllerKind=PostgresCluster instance=pg-cluster-pg-instance-v4-s4h9 name=pg-cluster namespace=postgres reconcileID=17762c55-9599-45ce-b09c-4fe752c93745 version=5.8.4-0
time="2025-11-09T07:32:16Z" level=debug msg="reconciled instance set" PostgresCluster=postgres/pg-cluster controller=postgrescluster controllerGroup=postgres-operator.crunchydata.com controllerKind=PostgresCluster instance-set=pg-instance-v4 name=pg-cluster namespace=postgres reconcileID=17762c55-9599-45ce-b09c-4fe752c93745 version=5.8.4-0
time="2025-11-09T07:32:17Z" level=debug msg="reconciled cluster" PostgresCluster=postgres/pg-cluster controller=postgrescluster controllerGroup=postgres-operator.crunchydata.com controllerKind=PostgresCluster name=pg-cluster namespace=postgres reconcileID=17762c55-9599-45ce-b09c-4fe752c93745 version=5.8.4-0
time="2025-11-09T07:32:17Z" level=error msg="Reconciler error" PostgresCluster=postgres/pg-cluster controller=postgrescluster controllerGroup=postgres-operator.crunchydata.com controllerKind=PostgresCluster error="open /opt/crunchy/conf/pg18/setup.sql: no such file or directory" name=pg-cluster namespace=postgres reconcileID=17762c55-9599-45ce-b09c-4fe752c93745 version=5.8.4-0
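
To see at a glance which files the operator is looking for, the missing paths can be pulled out of error lines like the ones above. This is only a minimal sketch, assuming the operator log has been saved to a file; the function name missing_paths is made up for illustration and is not part of PGO.

```shell
#!/bin/sh
# Print each distinct path that the log reports as
# "no such file or directory". $1 is the saved operator log file.
missing_paths() {
  grep -o 'open [^:]*: no such file or directory' "$1" \
    | sed -e 's/^open //' -e 's/: no such file or directory$//' \
    | LC_ALL=C sort -u
}
```

Run against the lines above, this should list only paths under /opt/crunchy/conf/pg18/, which points at a missing pg18 directory rather than at a single missing file.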

All my clusters that run with only 1 replica (a single instance) work without issues, since they have no replicas that need to bootstrap.

My master is properly bootstrapped as pg18 with the image registry.developers.crunchydata.com/crunchydata/crunchy-postgres:ubi9-18.0-2542. Its environment shows:

KRB5RCACHEDIR : /tmp
KRB5_CONFIG : /etc/postgres/krb5.conf
LDAPTLS_CACERT : /etc/postgres/ldap/ca.crt
PGDATA : /pgdata/pg18
PGHOST : /tmp/postgres
PGPORT : 5432

Environment

  • Platform: k8s (rke2)
  • Platform Version: 1.33.5
  • PGO Image Tag: pgo: registry.developers.crunchydata.com/crunchydata/postgres-operator:ubi9-5.8.4-0 - pg: registry.developers.crunchydata.com/crunchydata/crunchy-postgres:ubi9-18.0-2542
  • Postgres Version: 18
  • Storage: (e.g. hostpath, nfs, or the name of your storage class)

Steps to Reproduce

REPRO

In a cluster with at least 2 replicas:

  • upgrade (or maybe just spawn?) PGO at 5.8.4 using https://github.com/CrunchyData/postgres-operator-examples/tree/main/kustomize/install (in my case with a helm chart)
  • upgrade pg from 17 to 18
  • now the master should come up, while the slaves stay stuck in a pending state

EXPECTED

Slaves should replicate

ACTUAL

All slaves fail to replicate.

Logs

See the operator and replica logs above under Overview.

Additional Information

EugenMayer avatar Nov 09 '25 07:11 EugenMayer

It seems that the issue is in the operator. My current operator runs registry.developers.crunchydata.com/crunchydata/postgres-operator:ubi9-5.8.4-0, which is the correct image according to https://github.com/CrunchyData/postgres-operator-examples/blob/81bdf8b7000149422d2b12dd8c85cef7ae93093f/helm/install/values.yaml#L5. Did someone at Crunchy perhaps forget to update the operator image?

Still, this operator image has no pg18 folder:

bash-5.1$ ls /opt/crunchy/conf/ -la
total 108
drwxr-xr-x 9 root root  4096 Oct 14 18:31 .
drwxr-xr-x 3 root root  4096 Oct 14 18:31 ..
dr-xr-xr-x 2 root root  4096 Oct 14 18:18 pg11
dr-xr-xr-x 2 root root  4096 Oct 14 18:18 pg12
dr-xr-xr-x 2 root root  4096 Oct 14 18:18 pg13
dr-xr-xr-x 2 root root  4096 Oct 14 18:18 pg14
dr-xr-xr-x 2 root root  4096 Oct 14 18:18 pg15
dr-xr-xr-x 2 root root  4096 Oct 14 18:18 pg16
dr-xr-xr-x 2 root root  4096 Oct 14 18:18 pg17
-r--r--r-- 1 root root  8866 Oct 14 18:18 queries_backrest.yml
-r--r--r-- 1 root root   950 Oct 14 18:18 queries_bloat.yml
-r--r--r-- 1 root root 14412 Oct 14 18:18 queries_global.yml
-r--r--r-- 1 root root   495 Oct 14 18:18 queries_global_dbsize.yml
-r--r--r-- 1 root root   433 Oct 14 18:18 queries_global_matview.yml
-r--r--r-- 1 root root  7749 Oct 14 18:18 queries_nodemx.yml
-r--r--r-- 1 root root  2929 Oct 14 18:18 queries_per_db.yml
-r--r--r-- 1 root root  2795 Oct 14 18:18 queries_per_db_matview.yml
-r--r--r-- 1 root root  4684 Oct 14 18:18 queries_pgbouncer.yml
-r--r--r-- 1 root root  5295 Oct 14 18:18 setup_metric_views.sql

You can reproduce this by running:

docker run -it registry.developers.crunchydata.com/crunchydata/postgres-operator:ubi9-5.8.4-0 ls /opt/crunchy/conf/ -la
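
The same check can be scripted instead of reading the listing by eye. This is only a minimal sketch, assuming the four file names from the operator errors in the overview are the ones that matter (the operator may need more than these); the function name check_pg_conf is made up for illustration.

```shell
#!/bin/sh
# Report which of the files named in the operator errors are missing
# from <conf_dir>/pg<version>. Returns non-zero if any are missing.
# $1 = conf directory (e.g. /opt/crunchy/conf), $2 = major version (e.g. 18)
check_pg_conf() {
  conf_dir=$1
  version=$2
  missing=0
  for f in setup.sql queries_general.yml \
           queries_pg_stat_statements.yml \
           queries_pg_stat_statements_reset_info.yml; do
    if [ ! -f "$conf_dir/pg$version/$f" ]; then
      echo "missing: $conf_dir/pg$version/$f"
      missing=1
    fi
  done
  return "$missing"
}
```

Inside the operator container this would be called as check_pg_conf /opt/crunchy/conf 18. Given the listing above, which has no pg18 directory at all, it would report all four files as missing.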

Is it possible that the PGO operator has been wrongly released or maybe a new version is missing?

EugenMayer avatar Nov 20 '25 18:11 EugenMayer