cloudberry
cloudberry copied to clipboard
[Bug] count(*) query crashed and db in recovery mode
Cloudberry Database version
warehouse=# select version(); version
PostgreSQL 14.4 (Cloudberry Database 1.4.0 build commit:e83e3ffc22d538deb2dbceeeae0138ca2de064e6) on x86_64-pc-linux-gnu, compiled by gcc (GCC) 10.2.1 20210130 (Red Hat 10.2.1-11), 64-b it compiled on Aug 3 2023 10:15:47 (1 row)
What happened
count(*) query crashed:
# \d+ zzz_admin.aaa
Table "zzz_admin.aaa"
Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
---------------+-----------------------------------+-----------+----------+---------+----------+-------------+--------------+-------------
table_catalog | information_schema.sql_identifier | | | | plain | | |
table_schema | information_schema.sql_identifier | | | | plain | | |
table_name | information_schema.sql_identifier | | | | plain | | |
dt | text | | | | extended | | |
Distributed by: (table_catalog)
Access method: heap
=# select count(*) from zzz_admin.aaa;
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!?> \q
logs:
2023-09-21 08:40:37.498183 CST,"gpadmin","",p433642,th-891975552,"[local]",,2023-09-21 08:38:56 CST,0,con21392,cmd29,seg-1,,,,sx1,"LOG","00000","statement: select count(*) from zzz_admin.aaa;",,,,,,,0,,"postgres.c",1708,
2023-09-21 08:40:37.551299 CST,,,p433642,th123456,,,2023-09-21 08:38:56 CST,0,con21392,cmd29,seg-1,,,,,"PANIC","XX000","Unexpected internal error: Master process received signal SIGSEGV",,,,,,,0,,,,"1 0x7fb5c8cc6630 libpthread.so.0 <symbol not found> + 0xc8cc6630
2 0xd5cc6c postgres DirectFunctionCall1Coll (fmgr.c:806)
"
2023-09-21 08:40:37.651110 CST,,,p370720,th-891975552,,,,0,,,seg-1,,,,,"LOG","00000","server process (PID 433642) was terminated by signal 11: Segmentation fault","Failed process was running: select count(*) from zzz_admin.aaa;",,,,,,0,,"postmaster.c",4229,
2023-09-21 08:40:37.651133 CST,,,p370720,th-891975552,,,,0,,,seg-1,,,,,"LOG","00000","terminating any other active server processes",,,,,,,0,,"postmaster.c",3959,
2023-09-21 08:40:37.652496 CST,,,p75086,th-891975552,,,,0,,,seg-1,,,,,"WARNING","01000","ic-proxy-server: received signal 3",,,,,,,0,,"ic_proxy_main.c",474,
2023-09-21 08:40:37.652519 CST,"gpadmin","",p434360,th-891975552,"[local]",,2023-09-21 08:40:37 CST,0,,,seg-1,,,,,"FATAL","57P03","the database system is in recovery mode","last replayed record at 1C/27F1B788",,,,,,0,,"postmaster.c",2746,
2023-09-21 08:40:37.652569 CST,,,p370720,th-891975552,,,,0,,,seg-1,,,,,"LOG","00000","background worker ""ic proxy process"" (PID 75086) exited with exit code 1",,,,,,,0,,"postmaster.c",4208,
2023-09-21 08:40:37.656272 CST,"gpadmin",,p434361,th-891975552,"","18994",2023-09-21 08:40:37 CST,0,,,seg-1,,,,,"FATAL","57P03","the database system is in recovery mode","last replayed record at 1C/27F1B788",,,,,,0,,"postmaster.c",2746,
2023-09-21 08:40:37.665056 CST,,,p370720,th-891975552,,,,0,,,seg-1,,,,,"LOG","00000","all server processes terminated; reinitializing",,,,,,,0,,"postmaster.c",4515,
2023-09-21 08:40:37.831100 CST,,,p434363,th-891975552,,,,0,,,seg-1,,,,,"LOG","00000","database system was interrupted; last known up at 2023-09-21 08:38:41 CST",,,,,,,0,,"xlog.c",6816,
2023-09-21 08:40:37.831129 CST,,,p434363,th-891975552,,,,0,,,seg-1,,,,,"LOG","00000","Synchronization of the wal directory starts.",,,,,,,0,,"fd.c",3446,
2023-09-21 08:40:37.831191 CST,,,p434363,th-891975552,,,,0,,,seg-1,,,,,"LOG","00000","synchronization of the wal directory finishes.",,,,,,,0,,"fd.c",3448,
2023-09-21 08:40:37.831701 CST,,,p434363,th-891975552,,,,0,,,seg-1,,,,,"LOG","00000","database system was not properly shut down; automatic recovery in progress",,,,,,,0,,"xlog.c",7385,
2023-09-21 08:40:37.831701 CST,,,p434363,th-891975552,,,,0,,,seg-1,,,,,"LOG","00000","database system was not properly shut down; automatic recovery in progress",,,,,,,0,,"xlog.c",7385,
2023-09-21 08:40:37.974973 CST,,,p434363,th-891975552,,,,0,,,seg-1,,,,,"LOG","00000","redo starts at 1C/4035D810",,,,,,,0,,"xlog.c",7674,
2023-09-21 08:40:37.980902 CST,,,p434363,th-891975552,,,,0,,,seg-1,,,,,"LOG","00000","invalid record length at 1C/40918430: wanted 24, got 0",,,,,,,0,,"xlog.c",4482,
2023-09-21 08:40:37.980911 CST,,,p434363,th-891975552,,,,0,,,seg-1,,,,,"LOG","00000","redo done at 1C/40918260 system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s",,,,,,,0,,"xlog.c",7962,
2023-09-21 08:40:38.265962 CST,,,p434363,th-891975552,,,,0,,,seg-1,,,,,"LOG","00000","end of transaction log location is 1C/40918430",,,,,,,0,,"xlog.c",8051,
2023-09-21 08:40:38.281916 CST,,,p434363,th-891975552,,,,0,,,seg-1,,,,,"LOG","00000","latest completed transaction id is 1719112 and next transaction id is 1719113",,,,,,,0,,"xlog.c",8449,
2023-09-21 08:40:38.281968 CST,,,p434363,th-891975552,,,,0,,,seg-1,,,,,"LOG","00000","database system is ready",,,,,,,0,,"xlog.c",8476,
2023-09-21 08:40:38.285848 CST,,,p370720,th-891975552,,,,0,,,seg-1,,,,,"LOG","00000","PostgreSQL 14.4 (Cloudberry Database 1.4.0 build commit:e83e3ffc22d538deb2dbceeeae0138ca2de064e6) on x86_64-pc-linux-gnu, compiled by gcc (GCC) 10.2.1 20210130 (Red Hat 10.2.1-11), 64-bit compiled on Aug 3 2023 10:15:10",,,,,,,0,,"postmaster.c",3538,
2023-09-21 08:40:38.285861 CST,,,p370720,th-891975552,,,,0,,,seg-1,,,,,"LOG","00000","database system is ready to accept connections","PostgreSQL 14.4 (Cloudberry Database 1.4.0 build commit:e83e3ffc22d538deb2dbceeeae0138ca2de064e6) on x86_64-pc-linux-gnu, compiled by gcc (GCC) 10.2.1 20210130 (Red Hat 10.2.1-11), 64-bit compiled on Aug 3 2023 10:15:10",,,,,,0,,"postmaster.c",3540,
2023-09-21 08:40:38.291444 CST,,,p434368,th-891975552,,,,0,con1,,seg-1,,,,sx1,"LOG","00000","initialized 1 resource queues",,,,,,,0,,"resscheduler.c",267,
2023-09-21 08:40:38.362261 CST,,,p434368,th-891975552,,,,0,con1,,seg-1,,,,sx1,"LOG","00000","Crash recovery broadcast of the distributed transaction 'Commit Prepared' broadcast succeeded for gid = 2015839.",,,,,,,0,,"cdbdtxrecovery.c",84,
2023-09-21 08:40:38.364541 CST,,,p434368,th-891975552,,,,0,con1,,seg-1,,,,sx1,"LOG","00000","DTM Started",,,,,,,0,,"cdbdtxrecovery.c",155,
2023-09-21 08:40:38.368227 CST,,,p434374,th-891975552,,,,0,con4,,seg-1,,,,,"LOG","00000","pg_cron scheduler started",,,,,,,0,,"pg_cron.c",376,
2023-09-21 08:40:42.656687 CST,"gpadmin",,p434423,th-891975552,"","19002",2023-09-21 08:40:42 CST,0,,,seg-1,,,,sx1,"LOG","00000","rejecting TCP connection to master using internalconnection protocol",,,,,,,0,,"auth.c",570,
2023-09-21 08:40:42.697845 CST,"gpadmin",,p434423,th-891975552,"","19002",2023-09-21 08:40:42 CST,0,,,seg-1,,,,,"LOG","00000","standby ""gp_walreceiver"" is now a synchronous standby with priority 1",,,,,,"START_REPLICATION 1C/40000000 TIMELINE 1",0,,"syncrep.c",671,
What you think should happen instead
No response
How to reproduce
rerun the query.
Operating System
centos 7.9
Anything else
No response
Are you willing to submit PR?
- [ ] Yes, I am willing to submit a PR!
Code of Conduct
- [X] I agree to follow this project's Code of Conduct.
Just a reminder, this issue was marked as priority high by Max.
@liyxbeijing Could you provide the full step on repro? including creating table.
@liyxbeijing Max is paying attention to this issue. Would you please provide recreating steps?