[Bug] Run sql failed,report server closed the connection unexpectedly
Apache Cloudberry version
2.0.0-incubating-rc2
What happened
table structure:
create table reference_info
(
uid text,
country text,
ref_uid text,
ref_country text,
ref_year varchar(32)
) with (appendoptimized = true, orientation = column, compresstype = zstd,compresslevel = 5) distributed by (uid);
There are 20 million data entries in the database table.
I run sql:
select t.country, t.ref_year, ref_quantity, ref_frequency
from (select country, ref_year, count() as ref_frequency from reference_info group by country, ref_year) t
join
(select country, ref_year, count() as ref_quantity
from (select country, ref_year, ref_uid from reference_info group by country, ref_year, ref_uid) t
group by country, ref_year) u
on t.country = u.country and t.ref_year = u.ref_year;
or
select ref_country, ref_year, count(distinct ref_uid) as cited_quantity, count() as cited_frequency
from reference_info
where country = 'USA'
group by ref_country, ref_year;
is all worked.But
select country, ref_year, count() as ref_frequency, count(distinct ref_uid) as ref_quantity
from reference_info
group by country, ref_year;
is failed ,log is:
2025-07-22 21:37:15.905016 EDT,"gpadmin","db",p4122615,th294045888,"192.168.23.218","33828",2025-07-22 21:27:15 EDT,0,con3154,cmd2,seg-1,,dx1344140,,sx1,"ERROR","58M01","Error on receive from seg7 slice1 192.168.18.164:6007 pid=4123699: server closed the connection unexpectedly This probably means the server terminated abnormally
before or while processing the request.",,,,,,,0,,"cdbdispatchresult.c",510,"Stack trace:
1 0x7fab13df44a8 libpostgres.so errstart + 0x258
2 0x7fab13dfa3f6 libpostgres.so ThrowErrorData + 0x16
3 0x7fab13a4919f libpostgres.so mppExecutorFinishup + 0xdf
4 0x7fab13a37b2e libpostgres.so standard_ExecutorEnd + 0xee
5 0x7fab139cc629 libpostgres.so PortalCleanup + 0x79
6 0x7fab13e3973b libpostgres.so PortalDrop + 0x3b
7 0x7fab13e39de5 libpostgres.so PreCommit_Portals + 0x125
8 0x7fab13889e5c libpostgres.so
What you think should happen instead
No response
How to reproduce
select country, ref_year, count(*) as ref_frequency, count(distinct ref_uid) as ref_quantity from reference_info group by country, ref_year;
Operating System
rocky 9.6
Anything else
No response
Are you willing to submit PR?
- [ ] Yes, I am willing to submit a PR!
Code of Conduct
- [x] I agree to follow this project's Code of Conduct.
Hi, @cleverxiao001 welcome!🎊 Thanks for taking the time to point this out.🙌
is all worked.But select country, ref_year, count() as ref_frequency, count(distinct ref_uid) as ref_quantity from reference_info group by country, ref_year; is failed ,log is:
Hi, I create some random data, but couldn't reproduce that. The log is on master, so there is no much info. Could you try to explain the SQL to see if it will crash? and show the plan if no crash
explain(verbose)
explain(verbose) select country, ref_year, count() as ref_frequency, count(distinct ref_uid) as ref_quantity from reference_info group by country, ref_year;
and explain(analyze)
explain(verbose, analyze) select country, ref_year, count() as ref_frequency, count(distinct ref_uid) as ref_quantity from reference_info group by country, ref_year;
explain(verbose) output: QUERY PLAN Gather Motion 36:1 (slice1; segments: 36) (cost=0.00..431.00 rows=1 width=32) " Output: country, ref_year, (count()), (count(DISTINCT ref_uid))" -> GroupAggregate (cost=0.00..431.00 rows=1 width=32) " Output: country, ref_year, count(), count(DISTINCT ref_uid)" " Group Key: reference_info.country, reference_info.ref_year" -> Sort (cost=0.00..431.00 rows=1 width=24) " Output: country, ref_uid, ref_year" " Sort Key: reference_info.country, reference_info.ref_year" -> Redistribute Motion 36:36 (slice2; segments: 36) (cost=0.00..431.00 rows=1 width=24) " Output: country, ref_uid, ref_year" " Hash Key: country, ref_year" -> Seq Scan on reference_info (cost=0.00..431.00 rows=1 width=24) " Output: country, ref_uid, ref_year" Optimizer: GPORCA
explain(verbose,analyze) output: QUERY PLAN Gather Motion 36:1 (slice1; segments: 36) (cost=0.00..431.00 rows=1 width=32) (actual time=72085.217..115653.745 rows=2480 loops=1) " Output: country, ref_year, (count()), (count(DISTINCT ref_uid))" -> GroupAggregate (cost=0.00..431.00 rows=1 width=32) (actual time=69972.797..88910.012 rows=92 loops=1) " Output: country, ref_year, count(), count(DISTINCT ref_uid)" " Group Key: reference_info.country, reference_info.ref_year" -> Sort (cost=0.00..431.00 rows=1 width=24) (actual time=69832.795..71977.819 rows=8223755 loops=1) " Output: country, ref_uid, ref_year" " Sort Key: reference_info.country, reference_info.ref_year" Sort Method: quicksort Memory: 122481kB Max Memory: 62527kB Avg Memory: 61240kB (2 segments) Sort Method: external merge Disk: 3588512kB Max Memory: 377088kB Avg Memory: 105544kB (34 segments) work_mem: 2236160kB Segments: 36 Max: 63854kB (segment 1) Workfile: (34 spilling) -> Redistribute Motion 36:36 (slice2; segments: 36) (cost=0.00..431.00 rows=1 width=24) (actual time=3.000..66710.760 rows=8223755 loops=1) " Output: country, ref_uid, ref_year" " Hash Key: country, ref_year" -> Seq Scan on reference_info (cost=0.00..431.00 rows=1 width=24) (actual time=5.000..2059.006 rows=2290316 loops=1) " Output: country, ref_uid, ref_year" Planning Time: 13.835 ms (slice0) Executor memory: 117K bytes. " (slice1) Executor memory: 106412K bytes avg x 36x(0) workers, 141357K bytes max (seg23). Work_mem: 63854K bytes max." " (slice2) Executor memory: 530K bytes avg x 36x(0) workers, 530K bytes max (seg0)." Memory used: 128000kB Optimizer: GPORCA Execution Time: 115751.813 ms
Both of them is worked
explain(verbose) output: QUERY PLAN Gather Motion 36:1 (slice1; segments: 36) (cost=0.00..431.00 rows=1 width=32) " Output: country, ref_year, (count()), (count(DISTINCT ref_uid))" -> GroupAggregate (cost=0.00..431.00 rows=1 width=32) " Output: country, ref_year, count(), count(DISTINCT ref_uid)" " Group Key: reference_info.country, reference_info.ref_year" -> Sort (cost=0.00..431.00 rows=1 width=24) " Output: country, ref_uid, ref_year" " Sort Key: reference_info.country, reference_info.ref_year" -> Redistribute Motion 36:36 (slice2; segments: 36) (cost=0.00..431.00 rows=1 width=24) " Output: country, ref_uid, ref_year" " Hash Key: country, ref_year" -> Seq Scan on reference_info (cost=0.00..431.00 rows=1 width=24) " Output: country, ref_uid, ref_year" Optimizer: GPORCA
explain(verbose,analyze) output: QUERY PLAN Gather Motion 36:1 (slice1; segments: 36) (cost=0.00..431.00 rows=1 width=32) (actual time=72085.217..115653.745 rows=2480 loops=1) " Output: country, ref_year, (count()), (count(DISTINCT ref_uid))" -> GroupAggregate (cost=0.00..431.00 rows=1 width=32) (actual time=69972.797..88910.012 rows=92 loops=1) " Output: country, ref_year, count(), count(DISTINCT ref_uid)" " Group Key: reference_info.country, reference_info.ref_year" -> Sort (cost=0.00..431.00 rows=1 width=24) (actual time=69832.795..71977.819 rows=8223755 loops=1) " Output: country, ref_uid, ref_year" " Sort Key: reference_info.country, reference_info.ref_year" Sort Method: quicksort Memory: 122481kB Max Memory: 62527kB Avg Memory: 61240kB (2 segments) Sort Method: external merge Disk: 3588512kB Max Memory: 377088kB Avg Memory: 105544kB (34 segments) work_mem: 2236160kB Segments: 36 Max: 63854kB (segment 1) Workfile: (34 spilling) -> Redistribute Motion 36:36 (slice2; segments: 36) (cost=0.00..431.00 rows=1 width=24) (actual time=3.000..66710.760 rows=8223755 loops=1) " Output: country, ref_uid, ref_year" " Hash Key: country, ref_year" -> Seq Scan on reference_info (cost=0.00..431.00 rows=1 width=24) (actual time=5.000..2059.006 rows=2290316 loops=1) " Output: country, ref_uid, ref_year" Planning Time: 13.835 ms (slice0) Executor memory: 117K bytes. " (slice1) Executor memory: 106412K bytes avg x 36x(0) workers, 141357K bytes max (seg23). Work_mem: 63854K bytes max." " (slice2) Executor memory: 530K bytes avg x 36x(0) workers, 530K bytes max (seg0)." Memory used: 128000kB Optimizer: GPORCA Execution Time: 115751.813 ms
Both of them is worked
Thanks!
@my-ship-it
@avamingli, is it a ORCA issue? Could we have please help investigate, thanks!
@cleverxiao001 Could you search any error messages in logs for seg7?
error log in seg7:
2025-07-25 00:53:59.841934 EDT,"gpadmin","postgres",p268864,th1103079616,"168.160.18.164","59040",2025-07-25 00:53:59 EDT,0,con3,,seg7,,,,sx1,"LOG","00000","statement: select gid from pg_prepared_xacts where prepared < now() - interval'120 seconds'",,,,,,,0,,"postgres.c",1713,
2025-07-25 00:54:11.632398 EDT,,,p268891,th1103079616,"168.160.18.164","23646",2025-07-25 00:54:11 EDT,0,,,seg7,,,,,"LOG","00000","query finish request to process 268743",,,,,,,0,,"postmaster.c",2932,
2025-07-25 00:54:12.491000 EDT,,,p268743,th123456,,,2025-07-25 00:52:58 EDT,0,con3202,cmd2,seg7,slice1,,,,"PANIC","XX000","Unexpected internal error: Segment process received signal SIGSEGV",,,,,,,0,,,,"1 0x7fc543ffa92c libpostgres.so gp_backtrace + 0x3c
2 0x7fc543ffabd3 libpostgres.so StandardHandlerForSigillSigsegvSigbus_OnMainThread + 0x153
3 0x7fc54303ebf0 libc.so.6
It seems that the error is related to your data. It's abnormal that the detoast occurred in the sort slice.
All data is imported by using Spark to call the method: copy tableName from stdin with (format csv, NULL '\N'). I wonder if it has anything to do with the format of the CSV.
Maybe this is a layout issue. Are all segments in a machine ? Could you provide all segment system info, moreover system info for compile postgres bin. @cleverxiao001
The cluster consists of two servers in total. The segments are evenly distributed across the two servers. All hard drives are installed on one of the servers, managed via LVM and divided into multiple LV. One of these LV is mounted on the other server through NFS