PGUpgrade failing
Please ensure you do the following when reporting a bug:
- [ ] Provide a concise description of what the bug is.
- [ ] Provide information about your environment.
- [ ] Provide clear steps to reproduce the bug.
- [ ] Attach applicable logs. Please do not attach screenshots showing logs unless you are unable to copy and paste the log data.
- [ ] Ensure any code / output examples are properly formatted for legibility.
Note that some logs needed to troubleshoot may be found in the /pgdata/<CLUSTERNAME>/pg_log directory on your Postgres instance.
An incomplete bug report can lead to delays in resolving the issue or the closing of a ticket, so please be as detailed as possible.
If you are looking for general support, please view the support page for where you can ask questions.
Thanks for reporting the issue, we're looking forward to helping you!
Overview
Pgupgrade fail when upgrading from postgres 13 to 16. Images used: crunchy-postgres:ubi8-13.8-1 crunchy-postgres:ubi8-16.3-1
Environment
Please provide the following details:
- Platform: (
Kubernetes,OpenShift,Rancher,GKE,EKS,AKSetc.) Kubernetes - Platform Version: (e.g.
1.20.3,4.7.0) v1.25.16 - PGO Image Tag: (e.g.
ubi8-5.x.y-0) ubi8-5.6.0-0 - Postgres Version (e.g.
15) 13 -> 16 - Storage: (e.g.
hostpath,nfs, or the name of your storage class) nfs
Steps to Reproduce
- create a cluster with: crunchy-postgres:ubi8-13.8-1
- follow doc for upgrading: https://access.crunchydata.com/documentation/postgres-operator/latest/guides/major-postgres-version-upgrade
- pgupgrade failed log:
Performing PostgreSQL upgrade from version "13" to "16" ...
Step 1: Making new pgdata directory...
Step 2: Initializing new pgdata directory...
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.
The database cluster will be initialized with locale "en_US.utf-8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".
Data page checksums are enabled.
fixing permissions on existing directory /pgdata/pg16 ... ok
creating subdirectories ... ok
selecting dynamic shared memory implementation ... posix
sh: line 1: 29 Bus error (core dumped) "/usr/pgsql-16/bin/postgres" --check -F -c log_checkpoints=false -c max_connections=100 -c shared_buffers=1000 -c dynamic_shared_memory_type=posix < "/dev/null" > "/dev/null" 2>&1
sh: line 1: 31 Bus error (core dumped) "/usr/pgsql-16/bin/postgres" --check -F -c log_checkpoints=false -c max_connections=50 -c shared_buffers=500 -c dynamic_shared_memory_type=posix < "/dev/null" > "/dev/null" 2>&1
sh: line 1: 33 Bus error (core dumped) "/usr/pgsql-16/bin/postgres" --check -F -c log_checkpoints=false -c max_connections=40 -c shared_buffers=400 -c dynamic_shared_memory_type=posix < "/dev/null" > "/dev/null" 2>&1
sh: line 1: 35 Bus error (core dumped) "/usr/pgsql-16/bin/postgres" --check -F -c log_checkpoints=false -c max_connections=30 -c shared_buffers=300 -c dynamic_shared_memory_type=posix < "/dev/null" > "/dev/null" 2>&1
sh: line 1: 37 Bus error (core dumped) "/usr/pgsql-16/bin/postgres" --check -F -c log_checkpoints=false -c max_connections=20 -c shared_buffers=200 -c dynamic_shared_memory_type=posix < "/dev/null" > "/dev/null" 2>&1
selecting default max_connections ... 20
sh: line 1: 39 Bus error (core dumped) "/usr/pgsql-16/bin/postgres" --check -F -c log_checkpoints=false -c max_connections=20 -c shared_buffers=16384 -c dynamic_shared_memory_type=posix < "/dev/null" > "/dev/null" 2>&1
sh: line 1: 41 Bus error (core dumped) "/usr/pgsql-16/bin/postgres" --check -F -c log_checkpoints=false -c max_connections=20 -c shared_buffers=8192 -c dynamic_shared_memory_type=posix < "/dev/null" > "/dev/null" 2>&1
sh: line 1: 43 Bus error (core dumped) "/usr/pgsql-16/bin/postgres" --check -F -c log_checkpoints=false -c max_connections=20 -c shared_buffers=4096 -c dynamic_shared_memory_type=posix < "/dev/null" > "/dev/null" 2>&1
sh: line 1: 45 Bus error (core dumped) "/usr/pgsql-16/bin/postgres" --check -F -c log_checkpoints=false -c max_connections=20 -c shared_buffers=3584 -c dynamic_shared_memory_type=posix < "/dev/null" > "/dev/null" 2>&1
sh: line 1: 47 Bus error (core dumped) "/usr/pgsql-16/bin/postgres" --check -F -c log_checkpoints=false -c max_connections=20 -c shared_buffers=3072 -c dynamic_shared_memory_type=posix < "/dev/null" > "/dev/null" 2>&1
sh: line 1: 49 Bus error (core dumped) "/usr/pgsql-16/bin/postgres" --check -F -c log_checkpoints=false -c max_connections=20 -c shared_buffers=2560 -c dynamic_shared_memory_type=posix < "/dev/null" > "/dev/null" 2>&1
sh: line 1: 51 Bus error (core dumped) "/usr/pgsql-16/bin/postgres" --check -F -c log_checkpoints=false -c max_connections=20 -c shared_buffers=2048 -c dynamic_shared_memory_type=posix < "/dev/null" > "/dev/null" 2>&1
sh: line 1: 53 Bus error (core dumped) "/usr/pgsql-16/bin/postgres" --check -F -c log_checkpoints=false -c max_connections=20 -c shared_buffers=1536 -c dynamic_shared_memory_type=posix < "/dev/null" > "/dev/null" 2>&1
sh: line 1: 55 Bus error (core dumped) "/usr/pgsql-16/bin/postgres" --check -F -c log_checkpoints=false -c max_connections=20 -c shared_buffers=1000 -c dynamic_shared_memory_type=posix < "/dev/null" > "/dev/null" 2>&1
sh: line 1: 57 Bus error (core dumped) "/usr/pgsql-16/bin/postgres" --check -F -c log_checkpoints=false -c max_connections=20 -c shared_buffers=900 -c dynamic_shared_memory_type=posix < "/dev/null" > "/dev/null" 2>&1
sh: line 1: 59 Bus error (core dumped) "/usr/pgsql-16/bin/postgres" --check -F -c log_checkpoints=false -c max_connections=20 -c shared_buffers=800 -c dynamic_shared_memory_type=posix < "/dev/null" > "/dev/null" 2>&1
sh: line 1: 61 Bus error (core dumped) "/usr/pgsql-16/bin/postgres" --check -F -c log_checkpoints=false -c max_connections=20 -c shared_buffers=700 -c dynamic_shared_memory_type=posix < "/dev/null" > "/dev/null" 2>&1
sh: line 1: 63 Bus error (core dumped) "/usr/pgsql-16/bin/postgres" --check -F -c log_checkpoints=false -c max_connections=20 -c shared_buffers=600 -c dynamic_shared_memory_type=posix < "/dev/null" > "/dev/null" 2>&1
sh: line 1: 65 Bus error (core dumped) "/usr/pgsql-16/bin/postgres" --check -F -c log_checkpoints=false -c max_connections=20 -c shared_buffers=500 -c dynamic_shared_memory_type=posix < "/dev/null" > "/dev/null" 2>&1
sh: line 1: 67 Bus error (core dumped) "/usr/pgsql-16/bin/postgres" --check -F -c log_checkpoints=false -c max_connections=20 -c shared_buffers=400 -c dynamic_shared_memory_type=posix < "/dev/null" > "/dev/null" 2>&1
sh: line 1: 69 Bus error (core dumped) "/usr/pgsql-16/bin/postgres" --check -F -c log_checkpoints=false -c max_connections=20 -c shared_buffers=300 -c dynamic_shared_memory_type=posix < "/dev/null" > "/dev/null" 2>&1
sh: line 1: 71 Bus error (core dumped) "/usr/pgsql-16/bin/postgres" --check -F -c log_checkpoints=false -c max_connections=20 -c shared_buffers=200 -c dynamic_shared_memory_type=posix < "/dev/null" > "/dev/null" 2>&1
sh: line 1: 73 Bus error (core dumped) "/usr/pgsql-16/bin/postgres" --check -F -c log_checkpoints=false -c max_connections=20 -c shared_buffers=100 -c dynamic_shared_memory_type=posix < "/dev/null" > "/dev/null" 2>&1
sh: line 1: 75 Bus error (core dumped) "/usr/pgsql-16/bin/postgres" --check -F -c log_checkpoints=false -c max_connections=20 -c shared_buffers=50 -c dynamic_shared_memory_type=posix < "/dev/null" > "/dev/null" 2>&1
selecting default shared_buffers ... 400kB
selecting default time zone ... UTC
creating configuration files ... ok
running bootstrap script ... ok
performing post-bootstrap initialization ... ok
syncing data to disk ... ok
Success. You can now start the database server using:
/usr/pgsql-16/bin/pg_ctl -D /pgdata/pg16 -l logfile start
initdb: warning: enabling "trust" authentication for local connections
initdb: hint: You can change this by editing pg_hba.conf or using the option -A, or --auth-local and --auth-host, the next time you run initdb.
Step 3: Setting the expected permissions on the old pgdata directory...
Step 4: Copying shared_preload_libraries setting to new postgresql.conf file...
upgrade: line 18: /usr/pgsql-13/bin/postgres: No such file or directory
Step 5: Running pg_upgrade check...
check for "/usr/pgsql-13/bin" failed: No such file or directory
Failure, exiting
real 0m0.039s
user 0m0.004s
sys 0m0.008s
New postgres 16 cannot be started log:
k logs pgsql-ngt-test-pgha1-54bb-0 -n database-services
Defaulted container "database" out of: database, replication-cert-copy, postgres-startup (init), nss-wrapper-init (init)
2024-09-19 10:52:08,477 INFO: No PostgreSQL configuration items changed, nothing to reload.
2024-09-19 10:52:08,629 WARNING: Postgresql is not running.
2024-09-19 10:52:08,629 INFO: Lock owner: None; I am pgsql-ngt-test-pgha1-54bb-0
2024-09-19 10:52:08,637 INFO: pg_controldata:
pg_control version number: 1300
Catalog version number: 202307071
Database system identifier: 7416304783840694348
Database cluster state: shut down
pg_control last modified: Thu Sep 19 10:50:25 2024
Latest checkpoint location: 0/186DFB0
Latest checkpoint's REDO location: 0/186DFB0
Latest checkpoint's REDO WAL file: 000000010000000000000001
Latest checkpoint's TimeLineID: 1
Latest checkpoint's PrevTimeLineID: 1
Latest checkpoint's full_page_writes: on
Latest checkpoint's NextXID: 0:730
Latest checkpoint's NextOID: 13545
Latest checkpoint's NextMultiXactId: 1
Latest checkpoint's NextMultiOffset: 0
Latest checkpoint's oldestXID: 722
Latest checkpoint's oldestXID's DB: 1
Latest checkpoint's oldestActiveXID: 0
Latest checkpoint's oldestMultiXid: 1
Latest checkpoint's oldestMulti's DB: 1
Latest checkpoint's oldestCommitTsXid: 0
Latest checkpoint's newestCommitTsXid: 0
Time of latest checkpoint: Thu Sep 19 10:50:25 2024
Fake LSN counter for unlogged rels: 0/3E8
Minimum recovery ending location: 0/0
Min recovery ending loc's timeline: 0
Backup start location: 0/0
Backup end location: 0/0
End-of-backup record required: no
wal_level setting: replica
wal_log_hints setting: off
max_connections setting: 20
max_worker_processes setting: 8
max_wal_senders setting: 10
max_prepared_xacts setting: 0
max_locks_per_xact setting: 64
track_commit_timestamp setting: off
Maximum data alignment: 8
Database block size: 8192
Blocks per segment of large relation: 131072
WAL block size: 8192
Bytes per WAL segment: 16777216
Maximum length of identifiers: 64
Maximum columns in an index: 32
Maximum size of a TOAST chunk: 1996
Size of a large-object chunk: 2048
Date/time type storage: 64-bit integers
Float8 argument passing: by value
Data page checksum version: 1
Mock authentication nonce: 2a28ce363859f2cc621fe10e6b20e7038c6578ba4329e42f3d081523c1b6cd12
2024-09-19 10:52:08,640 INFO: Lock owner: None; I am pgsql-ngt-test-pgha1-54bb-0
2024-09-19 10:52:08,803 INFO: starting as a secondary
2024-09-19 10:52:09,889 INFO: postmaster pid=105
/tmp/postgres:5432 - no response
2024-09-19 10:52:09.920 UTC [105] LOG: redirecting log output to logging collector process
2024-09-19 10:52:09.920 UTC [105] HINT: Future log output will appear in directory "log".
2024-09-19 10:52:10,906 ERROR: postmaster is not running
2024-09-19 10:52:18,514 INFO: Lock owner: None; I am pgsql-ngt-test-pgha1-54bb-0
2024-09-19 10:52:18,641 INFO: failed to start postgres
2024-09-19 10:52:28,515 WARNING: Postgresql is not running.
2024-09-19 10:52:28,515 INFO: Lock owner: None; I am pgsql-ngt-test-pgha1-54bb-0
2024-09-19 10:52:28,525 INFO: pg_controldata:
pg_control version number: 1300
Catalog version number: 202307071
Database system identifier: 7416304783840694348
Database cluster state: shut down
pg_control last modified: Thu Sep 19 10:50:25 2024
Latest checkpoint location: 0/186DFB0
Latest checkpoint's REDO location: 0/186DFB0
Latest checkpoint's REDO WAL file: 000000010000000000000001
Latest checkpoint's TimeLineID: 1
Latest checkpoint's PrevTimeLineID: 1
Latest checkpoint's full_page_writes: on
Latest checkpoint's NextXID: 0:730
Latest checkpoint's NextOID: 13545
Latest checkpoint's NextMultiXactId: 1
Latest checkpoint's NextMultiOffset: 0
Latest checkpoint's oldestXID: 722
Latest checkpoint's oldestXID's DB: 1
Latest checkpoint's oldestActiveXID: 0
Latest checkpoint's oldestMultiXid: 1
Latest checkpoint's oldestMulti's DB: 1
Latest checkpoint's oldestCommitTsXid: 0
Latest checkpoint's newestCommitTsXid: 0
Time of latest checkpoint: Thu Sep 19 10:50:25 2024
Fake LSN counter for unlogged rels: 0/3E8
Minimum recovery ending location: 0/0
Min recovery ending loc's timeline: 0
Backup start location: 0/0
Backup end location: 0/0
End-of-backup record required: no
wal_level setting: replica
wal_log_hints setting: off
max_connections setting: 20
max_worker_processes setting: 8
max_wal_senders setting: 10
max_prepared_xacts setting: 0
max_locks_per_xact setting: 64
track_commit_timestamp setting: off
Maximum data alignment: 8
Database block size: 8192
Blocks per segment of large relation: 131072
WAL block size: 8192
Bytes per WAL segment: 16777216
Maximum length of identifiers: 64
Maximum columns in an index: 32
Maximum size of a TOAST chunk: 1996
Size of a large-object chunk: 2048
Date/time type storage: 64-bit integers
Float8 argument passing: by value
Data page checksum version: 1
Mock authentication nonce: 2a28ce363859f2cc621fe10e6b20e7038c6578ba4329e42f3d081523c1b6cd12
2024-09-19 10:52:28,526 INFO: Lock owner: None; I am pgsql-ngt-test-pgha1-54bb-0
2024-09-19 10:52:28,527 INFO: starting as a secondary
2024-09-19 10:52:29,565 INFO: postmaster pid=176
2024-09-19 10:52:29.573 UTC [176] LOG: redirecting log output to logging collector process
2024-09-19 10:52:29.573 UTC [176] HINT: Future log output will appear in directory "log".
/tmp/postgres:5432 - no response
2024-09-19 10:52:30,585 ERROR: postmaster is not running
2024-09-19 10:52:38,514 INFO: Lock owner: None; I am pgsql-ngt-test-pgha1-54bb-0
2024-09-19 10:52:38,515 INFO: failed to start postgres
When running with PG13 image it also refuses now to start, but was healthy before upgrade:
Defaulted container "database" out of: database, replication-cert-copy, postgres-startup (init), nss-wrapper-init (init)
2024-09-19 10:54:23,103 INFO: No PostgreSQL configuration items changed, nothing to reload.
2024-09-19 10:54:23,118 CRITICAL: system ID mismatch, node pgsql-ngt-test-pgha1-54bb-0 belongs to a different cluster: 7416304783840694348 != 7174435655098040412
REPRO
above
EXPECTED
pgupgrade complete without issues
ACTUAL
pgupgrade failed, and postgres cluster not starting at all with new version and with the old one.
Logs
Above
Additional Information
na
for some additional info database with postgresql13 was initialized on host without hugepages, now it is running on host with hugepages enabled. When i scheduled the pgupgrade on host without hugepages there are no bus errors anymore although still upgrade fail.
Performing PostgreSQL upgrade from version "13" to "16" ...
Step 1: Making new pgdata directory...
Step 2: Initializing new pgdata directory...
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.
The database cluster will be initialized with locale "en_US.utf-8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".
Data page checksums are enabled.
fixing permissions on existing directory /pgdata/pg16 ... ok
creating subdirectories ... ok
selecting dynamic shared memory implementation ... posix
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting default time zone ... UTC
creating configuration files ... ok
running bootstrap script ... ok
performing post-bootstrap initialization ... ok
initdb: warning: enabling "trust" authentication for local connections
initdb: hint: You can change this by editing pg_hba.conf or using the option -A, or --auth-local and --auth-host, the next time you run initdb.
syncing data to disk ... ok
Success. You can now start the database server using:
/usr/pgsql-16/bin/pg_ctl -D /pgdata/pg16 -l logfile start
Step 3: Setting the expected permissions on the old pgdata directory...
Step 4: Copying shared_preload_libraries setting to new postgresql.conf file...
upgrade: line 18: /usr/pgsql-13/bin/postgres: No such file or directory
Step 5: Running pg_upgrade check...
check for "/usr/pgsql-13/bin" failed: No such file or directory
Failure, exiting
real 0m0.070s
user 0m0.000s
sys 0m0.009s
even running clean cluster on host without hugepages and trying pgupgrade on it also running pgupgrade on non hugapes host. It just fail. Same log as above.
i was able to perform uprade using crunchy-upgrade:ubi8-5.5.0-0 but on 5.6.0 and 5.6.1 upgrade fails.
So I think it is still something that should be addressed. Also these bus errors when running upgrade on host with hugepages.
did you tried a step by step upgrade? 13 > 14 > 15 > 16
I just tried to upgrade 15 to 16, this worked fine with PGO 5.6.1
I tried to 15 and it failed. Pgsql 14 is not avaiable any more in repo. As i do not have the image for it i cannot test upgrade. 15->16 for sure will work. So that is not really helpful answer for that case.
@todeb Sorry to hear you hit this issue. From the logs you provided, I think the reason you are having this issue is due to the 5.6.0 upgrade image not having the required Postgres 13 binary whereas the 5.5.0 image does includes it. You can read more about this in our developer program container versions documentation. For your current scenario, I would therefore recommend using the 5.5.0 image for this upgrade, then moving to the newer images in the future. I hope this helps!
It is what i did. So if that is not issue but expected behavior that you cannot upgrade from older postgresql version having newer pgo and you need downgrade pgo first you can then close it.