cloudberry
TeardownTCPInterconnect hangs when the interconnect type is set to TCP
Cloudberry Database version
postgres=# select version();
version
PostgreSQL 14.4 (Cloudberry Database 1.4.0 build commit:e83e3ffc22d538deb2dbceeeae0138ca2de064e6) on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.3.1 20191121 (Anolis 8.3.1-5.0.1.1), 64-bit compiled on Sep 17 2023 23:10:22 (with assert checking)
(1 row)
What happened
The operation hangs when the interconnect type is set to TCP. Running pstack on the hung process shows the following stack:
[gpadmin@datasharing-29-5 ~]$ pstack 497347
#0 0x00007fdf800df4ab in select () from /lib64/libc.so.6
#1 0x00007fdf6fc68fc5 in waitOnOutbound (pEntry=<optimized out>, pEntry=<optimized out>) at tcp/ic_tcp.c:2449
#2 0x00007fdf6fc6a57f in TeardownTCPInterconnect (transportStates=0x7e2dfd8, hasErrors=<optimized out>) at tcp/ic_tcp.c:2145
#3 0x00007fdf6fc6a705 in TeardownInterconnectTCP (transportStates=0x7e2dfd8, hasErrors=<optimized out>) at tcp/ic_tcp.c:2198
#4 0x0000000000a64ec6 in mppExecutorFinishup (queryDesc=queryDesc@entry=0x7e13f60) at execUtils.c:2011
#5 0x0000000000a530f6 in standard_ExecutorEnd (queryDesc=0x7e13f60) at execMain.c:1128
#6 0x0000000000cb0b37 in ProcessQuery (portal=portal@entry=0x7e952e0, stmt=stmt@entry=0x7ecd488, params=<optimized out>, queryEnv=<optimized out>, dest=dest@entry=0x7eee810, qc=qc@entry=0x7fff3b105ad0, sourceText=<optimized out>) at pquery.c:250
#7 0x0000000000cb105c in PortalRunMulti (portal=portal@entry=0x7e952e0, isTopLevel=isTopLevel@entry=true, setHoldSnapshot=setHoldSnapshot@entry=false, dest=dest@entry=0x7eee810, altdest=altdest@entry=0x7eee810, qc=qc@entry=0x7fff3b105ad0) at pquery.c:1460
#8 0x0000000000cb251f in PortalRun (portal=portal@entry=0x7e952e0, count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=true, run_once=run_once@entry=true, dest=dest@entry=0x7eee810, altdest=altdest@entry=0x7eee810, qc=0x7fff3b105ad0) at pquery.c:966
#9 0x0000000000cab69c in exec_mpp_query (query_string=0x7ecb4ff "insert into sqmdb_city.A_PM_5G_APP_ALARM(\r\n\t\t\tstarttime", ' ' <repeats 22 times>, ",\r\n\t\t\tprojectname", ' ' <repeats 20 times>, ",\r\n\t\t\tgnodeb_id", ' ' <repeats 22 times>, ",\r\n\t\t\tgnodeb_name_omc", ' ' <repeats 16 times>, ",\r\n\t\t\tnrcell"..., serializedPlantree=<optimized out>, serializedPlantreelen=<optimized out>, serializedQueryDispatchDesc=<optimized out>, serializedQueryDispatchDesclen=<optimized out>) at postgres.c:1413
#10 0x0000000000caf885 in PostgresMain (argc=argc@entry=1, argv=argv@entry=0x7fff3b105fe0, dbname=<optimized out>, username=<optimized out>) at postgres.c:5736
#11 0x0000000000bf73f8 in BackendRun (port=<optimized out>, port=<optimized out>) at postmaster.c:5035
#12 BackendStartup (port=<optimized out>) at postmaster.c:4739
#13 ServerLoop () at postmaster.c:2028
#14 0x0000000000bf857e in PostmasterMain (argc=argc@entry=7, argv=argv@entry=0x7ddae30) at postmaster.c:1653
#15 0x00000000007c6196 in main (argc=7, argv=0x7ddae30) at main.c:269
[gpadmin@datasharing-29-5 ~]$
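Frame #0 shows the process blocked in select(), called from waitOnOutbound during teardown. The code in tcp/ic_tcp.c is not reproduced here, and the sketch below does not claim to identify the root cause. It is only a generic illustration, in C, of how a select() wait behaves when the peer neither sends data nor closes the connection: with a NULL timeout the call blocks indefinitely, while a finite timeout bounds the wait. The helpers wait_readable and make_pair are hypothetical names introduced for this example, and a local socket pair stands in for a real interconnect connection.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Hypothetical helper (not Cloudberry code): wait until fd is readable.
 * timeout_ms < 0 means "wait forever", i.e. select() with a NULL timeout.
 * Returns 1 if fd is readable (data or EOF), 0 on timeout, -1 on error.
 */
static int wait_readable(int fd, int timeout_ms)
{
    assert(fd >= 0 && fd < FD_SETSIZE);

    for (;;)
    {
        fd_set rset;
        struct timeval tv;
        struct timeval *tvp = NULL;   /* NULL timeout: block indefinitely */

        /* select() may modify both the set and the timeout, so rebuild them. */
        FD_ZERO(&rset);
        FD_SET(fd, &rset);

        if (timeout_ms >= 0)
        {
            tv.tv_sec = timeout_ms / 1000;
            tv.tv_usec = (timeout_ms % 1000) * 1000;
            tvp = &tv;
        }

        int n = select(fd + 1, &rset, NULL, NULL, tvp);

        if (n < 0 && errno == EINTR)
            continue;                 /* interrupted by a signal: retry with a fresh timeout */
        if (n < 0)
            return -1;
        return n > 0 ? 1 : 0;
    }
}

/*
 * Hypothetical helper: create a connected local socket pair that stands in
 * for the two ends of a connection. Returns 0 on success, -1 on error.
 */
static int make_pair(int sv[2])
{
    return socketpair(AF_UNIX, SOCK_STREAM, 0, sv);
}
```

With a pair created by make_pair, wait_readable(sv[0], 50) returns 0 while the other end stays open and silent, and returns 1 once the other end writes a byte or is closed. Calling wait_readable(sv[0], -1) in the silent case would not return, which is the same symptom as a process sitting in select() in the stack above.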
A similar issue was reported for HashData 3.x:
https://code.hashdata.xyz/hashdata/hashdata/-/issues/2589
What you think should happen instead
No response
How to reproduce
It is reproducible
Operating System
BELinux.
Anything else
No response
Are you willing to submit PR?
- [ ] Yes, I am willing to submit a PR!
Code of Conduct
- [X] I agree to follow this project's Code of Conduct.