
[Bug] unstable pg_upgrade failed

Open avamingli opened this issue 1 year ago • 15 comments

Cloudberry Database version

No response

What happened

We have been hitting this intermittent failure for a long time.

pg_upgrade failed
psql: error: connection to server on socket "/tmp/.s.PGSQL.17432" failed: No such file or directory
        Is the server running locally and accepting connections on that socket?
======================================================================

20231024:08:45:55:017476 gpstop:ip-10-0-1-232:gpadmin-[INFO]:-Starting gpstop with args: -a
20231024:08:45:55:017476 gpstop:ip-10-0-1-232:gpadmin-[INFO]:-Gathering information and validating the environment...
20231024:08:45:55:017476 gpstop:ip-10-0-1-232:gpadmin-[ERROR]:-gpstop error: postmaster.pid file does not exist.  is Cloudberry instance already stopped?
/code/gpdb_src/src/bin/pg_upgrade/tmp_check/upgrade/qd /code/gpdb_src/src/bin/pg_upgrade
Performing Consistency Checks
-----------------------------
Checking cluster versions                                   ok

The target cluster was not shut down cleanly.
Failure, exiting

ERROR: Failure encountered in upgrading qd node
real        0m0.050s
user        0m0.019s
sys        0m0.030s
/code/gpdb_src/src/bin/pg_upgrade /code/gpdb_src/src/bin/pg_upgrade/tmp_check/upgrade/qd /code/gpdb_src/src/bin/pg_upgrade
make[1]: *** [Makefile:78: check] Error 1
make: *** [GNUmakefile:194: installcheck-world-src/bin/pg_upgrade-recurse] Error 2
make: Target 'installcheck-world' not remade because of errors.
20231024:08:35:54:031540 gpstart:ip-10-0-1-232:gpadmin-[INFO]:-CoordinatorStart pg_ctl cmd is env GPSESSID=0000000000 GPERA=01d1134bbbff0ed5_231024083553 $GPHOME/bin/pg_ctl -D /code/gpdb_src/src/bin/pg_upgrade/tmp_check/datadirs/qddir/demoDataDir-1 -l /code/gpdb_src/src/bin/pg_upgrade/tmp_check/datadirs/qddir/demoDataDir-1/log/startup.log -w -t 600 -o " -p 17432 -c gp_role=dispatch " start
20231024:08:45:54:031540 gpstart:ip-10-0-1-232:gpadmin-[CRITICAL]:-Error occurred: non-zero rc: 1
 Command was: 'env GPSESSID=0000000000 GPERA=01d1134bbbff0ed5_231024083553 $GPHOME/bin/pg_ctl -D /code/gpdb_src/src/bin/pg_upgrade/tmp_check/datadirs/qddir/demoDataDir-1 -l /code/gpdb_src/src/bin/pg_upgrade/tmp_check/datadirs/qddir/demoDataDir-1/log/startup.log -w -t 600 -o " -p 17432 -c gp_role=dispatch " start'
[6644](https://github.com/cloudberrydb/cloudberrydb/actions/runs/6623396719/job/17990808401?pr=260#step:5:6645)rc=1, stdout='waiting for server to start........................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................... stopped waiting
[6645](https://github.com/cloudberrydb/cloudberrydb/actions/runs/6623396719/job/17990808401?pr=260#step:5:6646)', stderr='pg_ctl: server did not start in time
------------------------

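It seems that gpstart times out after the binaries are switched from gpdb5 to gpdb6. The coordinator never comes up, so the later psql connection, gpstop, and the pg_upgrade consistency check all fail as a consequence.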
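As a rough sanity check on the timeout theory, the two gpstart timestamps in the log can be compared against the `-t 600` passed to pg_ctl. This is only a sketch: the timestamps are copied by hand from the log above, the trailing six-digit field is dropped (it looks constant per process in this log), and `elapsed_seconds` is a hypothetical helper, not part of any Cloudberry tool.

```python
from datetime import datetime

# Timestamps copied from the gpstart lines in the log above
# (format YYYYMMDD:HH:MM:SS; the trailing six-digit field is dropped).
START = "20231024:08:35:54"  # gpstart logs the pg_ctl start command
FAIL = "20231024:08:45:54"   # gpstart logs "Error occurred: non-zero rc: 1"
PG_CTL_TIMEOUT = 600         # value of -t in the pg_ctl command line


def elapsed_seconds(start: str, end: str) -> int:
    """Return the whole seconds between two gp-style log timestamps."""
    fmt = "%Y%m%d:%H:%M:%S"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds())


waited = elapsed_seconds(START, FAIL)
# True means gpstart waited for at least the full pg_ctl timeout.
print(waited, waited >= PG_CTL_TIMEOUT)
```

The gap equals the configured timeout, which is consistent with pg_ctl giving up on a server that never became ready rather than the server crashing early. The startup.log under the data directory should show what the coordinator was doing during that time.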

What you think should happen instead

No response

How to reproduce

https://github.com/cloudberrydb/cloudberrydb/actions/runs/6623396719/job/17990808401?pr=260

Operating System

https://github.com/cloudberrydb/cloudberrydb/actions/runs/6623396719/job/17990808401?pr=260

Anything else

No response

Are you willing to submit PR?

  • [ ] Yes, I am willing to submit a PR!

avamingli, Oct 25 '23 01:10