citus icon indicating copy to clipboard operation
citus copied to clipboard

Citus Custom Query Planner is not working like PostgreSQL Query Planner with UNION ALL at all

Open arsi opened this issue 1 year ago • 1 comments

I have a view which has 26 table size UNION ALL (so query is just a SELECT FROM x-1 UNION ALL SELECT FROM x-2 ...)

When I use native PostgreSQL table it uses parallel seq scan. When I have distributed it a with key as distributed table it does 27 Distributed Subplans but runs them in serial. Why can't it work in parallel as well? Using schema-based distribution seem to work (as it is using native query planner) but I would really like to use row-based.

Using: Citus 12.0.0 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-18), 64-bit and PostgreSQL 15.4 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-18), 64-bit

PostgreSQL Query planner:

mydata=# explain analyze select * from x; QUERY PLAN

Gather (cost=1000.00..532907.96 rows=674493 width=2844) (actual time=11.944..10186.177 rows=668574 loops=1) Workers Planned: 5 Workers Launched: 5 -> Parallel Append (cost=0.00..464458.66 rows=134897 width=2846) (actual time=1.116..9351.628 rows=111429 loops=6) -> Parallel Seq Scan on "x-18" (cost=0.00..42767.12 rows=23663 width=2866) (actual time=0.786..6798.925 rows=96043 loops=1) -> Parallel Seq Scan on "x-22" (cost=0.00..37324.45 rows=18384 width=2815) (actual time=1.268..6218.198 rows=73579 loops=1) -> Parallel Seq Scan on "x-19" (cost=0.00..35700.12 rows=23932 width=2856) (actual time=0.057..4686.369 rows=66811 loops=1) -> Parallel Seq Scan on "x-17" (cost=0.00..27775.92 rows=14942 width=2814) (actual time=0.896..3102.783 rows=46319 loops=1) -> Parallel Seq Scan on "x-25" (cost=0.00..27124.91 rows=13891 width=2807) (actual time=3.139..5775.030 rows=43061 loops=1) -> Parallel Seq Scan on "x-5" (cost=0.00..26901.59 rows=12440 width=2907) (actual time=2.006..1719.050 rows=38565 loops=1) -> Parallel Seq Scan on "x-16" (cost=0.00..26202.44 rows=13448 width=2862) (actual time=2.410..3454.751 rows=41689 loops=1) -> Parallel Seq Scan on "x-11" (cost=0.00..24990.08 rows=11799 width=2887) (actual time=4.916..2653.725 rows=36576 loops=1) -> Parallel Seq Scan on "x-23" (cost=0.00..23816.53 rows=10231 width=2813) (actual time=0.548..1665.396 rows=15858 loops=2) -> Parallel Seq Scan on "x-21" (cost=0.00..22987.02 rows=9492 width=2854) (actual time=1.079..707.796 rows=4909 loops=6) -> Parallel Seq Scan on "x-20" (cost=0.00..21928.22 rows=8772 width=2849) (actual time=0.465..997.019 rows=13597 loops=2) -> Parallel Seq Scan on "x-15" (cost=0.00..20817.79 rows=8052 width=2859) (actual time=1.134..1168.647 rows=8320 loops=3) -> Parallel Seq Scan on "x-9" (cost=0.00..18780.08 rows=6071 width=2882) (actual time=0.959..1002.836 rows=18819 loops=1) -> Parallel Seq Scan on "x-26" (cost=0.00..18322.80 rows=6223 width=2794) (actual time=0.656..2637.623 rows=19292 loops=1) -> Parallel Seq Scan on "x-6" (cost=0.00..10806.16 rows=2985 width=2911) (actual time=0.956..419.030 rows=9254 loops=1) -> Parallel Seq Scan on "x-4" (cost=0.00..10573.15 rows=2980 width=2734) (actual time=1.250..736.876 rows=9238 loops=1) -> Parallel Seq Scan on "x-1" (cost=0.00..9289.09 rows=2115 width=2790) (actual time=1.249..290.801 rows=5076 loops=1) -> Parallel Seq Scan on "x-8" (cost=0.00..9137.91 rows=2565 width=2881) (actual time=1.897..330.567 rows=6156 loops=1) -> Parallel Seq Scan on "x-13" (cost=0.00..8737.35 rows=3632 width=2846) (actual time=0.530..558.622 rows=8717 loops=1) -> Parallel Seq Scan on "x-2" (cost=0.00..8004.51 rows=2143 width=2760) (actual time=4.306..362.954 rows=5144 loops=1) -> Parallel Seq Scan on "x-7" (cost=0.00..7129.47 rows=1971 width=2894) (actual time=1.694..240.027 rows=4730 loops=1) -> Parallel Seq Scan on "x-14" (cost=0.00..6679.42 rows=2847 width=2853) (actual time=0.391..535.305 rows=6833 loops=1) -> Parallel Seq Scan on "x-3" (cost=0.00..5843.50 rows=1667 width=2759) (actual time=1.346..292.045 rows=4000 loops=1) -> Parallel Seq Scan on "x-24" (cost=0.00..5529.11 rows=3676 width=2827) (actual time=0.681..818.353 rows=8822 loops=1) -> Parallel Seq Scan on "x-12" (cost=0.00..4616.69 rows=1503 width=2864) (actual time=0.039..180.761 rows=3607 loops=1) -> Parallel Seq Scan on "x-10" (cost=0.00..1998.75 rows=1717 width=2883) (actual time=0.546..147.134 rows=2919 loops=1) Planning Time: 101.069 ms Execution Time: 10213.934 ms (32 rows)

Citus Custom Scan Query Planner:

mydata=> explain analyze select * from x; QUERY PLAN

Custom Scan (Citus Adaptive) (cost=0.00..0.00 rows=0 width=0) (actual time=25271.495..25961.113 rows=668574 loops=1) -> Distributed Subplan 1_1 Subplan Duration: 410.39 ms Intermediate Data Size: 16 MB Result destination: Write locally -> Custom Scan (Citus Adaptive) (cost=0.00..0.00 rows=100000 width=4022) (actual time=136.003..136.827 rows=5076 loops=1) Task Count: 32 Tuple data received from nodes: 13 MB Tasks Shown: One of 32 -> Task Tuple data received from node: 3149 kB Node: host=1.1.1.1 port=5432 dbname=mydata -> Seq Scan on "x-1_102022" "x-1" (cost=0.00..663.46 rows=1158 width=2804) (actual time=0.088..58.356 rows=1158 loops=1) Planning Time: 0.165 ms Execution Time: 63.909 ms Planning Time: 0.000 ms Execution Time: 137.427 ms -> Distributed Subplan 1_2 Subplan Duration: 198.66 ms Intermediate Data Size: 16 MB Result destination: Write locally -> Custom Scan (Citus Adaptive) (cost=0.00..0.00 rows=100000 width=4022) (actual time=142.401..143.229 rows=5144 loops=1) Task Count: 32 Tuple data received from nodes: 13 MB Tasks Shown: One of 32 -> Task Tuple data received from node: 3520 kB Node: host=1.1.1.1 port=5432 dbname=mydata -> Seq Scan on "x-2_102054" "x-2" (cost=0.00..733.21 rows=1292 width=2777) (actual time=0.079..63.338 rows=1292 loops=1) Planning Time: 0.134 ms Execution Time: 68.751 ms Planning Time: 0.000 ms Execution Time: 143.899 ms -> Distributed Subplan 1_3 ... Distributed Subplan 1_26 -> Distributed Subplan 1_27 Subplan Duration: 24442.69 ms Intermediate Data Size: 2362 MB Result destination: Write locally -> Custom Scan (Citus Adaptive) (cost=0.00..0.00 rows=0 width=0) (actual time=17673.567..18368.563 rows=668574 loops=1) Task Count: 1 Tuple data received from nodes: 1970 MB Tasks Shown: All -> Task Tuple data received from node: 1970 MB Node: host=localhost port=5432 dbname=mydata -> Gather (cost=1000.00..105762.91 rows=26000 width=4022) (actual time=50.153..1047.263 rows=668574 loops=1) Workers Planned: 5 Workers Launched: 5 -> Parallel Append (cost=0.00..102162.91 rows=106829 width=4022) (actual time=764.116..1590.658 rows=111429 loops=6) -> Function Scan on read_intermediate_result intermediate_result_17 (cost=0.00..74263.04 rows=78302 width=4022) (actual time=1316.678..1433.456 rows=96043 loops=1) -> Function Scan on read_intermediate_result intermediate_result_21 (cost=0.00..57176.61 rows=60286 width=4022) (actual time=1031.296..1119.719 rows=73579 loops=1) -> Function Scan on read_intermediate_result intermediate_result_18 (cost=0.00..52058.16 rows=54889 width=4022) (actual time=942.834..1021.694 rows=66811 loops=1) -> Function Scan on read_intermediate_result intermediate_result_16 (cost=0.00..35268.77 rows=37187 width=4022) (actual time=648.570..702.491 rows=46319 loops=1) -> Function Scan on read_intermediate_result intermediate_result_24 (cost=0.00..33186.20 rows=34991 width=4022) (actual time=603.903..653.121 rows=43061 loops=1) -> Function Scan on read_intermediate_result intermediate_result_15 (cost=0.00..32206.30 rows=33958 width=4022) (actual time=534.989..585.173 rows=41689 loops=1) -> Function Scan on read_intermediate_result intermediate_result_10 (cost=0.00..27414.12 rows=28905 width=4022) (actual time=434.296..478.152 rows=36576 loops=1) -> Function Scan on read_intermediate_result intermediate_result_4 (cost=0.00..27145.38 rows=28622 width=4022) (actual time=433.297..476.888 rows=38565 loops=1) -> Function Scan on read_intermediate_result intermediate_result_22 (cost=0.00..24244.07 rows=25563 width=4022) (actual time=412.542..450.160 rows=31717 loops=1) -> Function Scan on read_intermediate_result intermediate_result_20 (cost=0.00..22690.84 rows=23925 width=4022) (actual time=363.916..398.711 rows=29452 loops=1) -> Function Scan on read_intermediate_result intermediate_result_19 (cost=0.00..21044.75 rows=22189 width=4022) (actual time=336.200..367.755 rows=27194 loops=1) -> Function Scan on read_intermediate_result intermediate_result_14 (cost=0.00..19159.83 rows=20202 width=4022) (actual time=308.422..337.299 rows=24961 loops=1) -> Function Scan on read_intermediate_result intermediate_result_25 (cost=0.00..14138.91 rows=14908 width=4022) (actual time=234.564..255.000 rows=19292 loops=1) -> Function Scan on read_intermediate_result intermediate_result_8 (cost=0.00..13473.64 rows=14206 width=4022) (actual time=220.137..239.941 rows=18819 loops=1) -> Function Scan on read_intermediate_result intermediate_result_12 (cost=0.00..6542.35 rows=6898 width=4022) (actual time=92.148..94.095 rows=8717 loops=1) -> Function Scan on read_intermediate_result intermediate_result_5 (cost=0.00..6530.84 rows=6886 width=4022) (actual time=93.769..95.633 rows=9254 loops=1) -> Function Scan on read_intermediate_result intermediate_result_3 (cost=0.00..6399.14 rows=6747 width=4022) (actual time=120.001..121.627 rows=9238 loops=1) -> Function Scan on read_intermediate_result intermediate_result_23 (cost=0.00..6373.47 rows=6720 width=4022) (actual time=132.529..133.768 rows=8822 loops=1) -> Function Scan on read_intermediate_result intermediate_result_13 (cost=0.00..5146.78 rows=5427 width=4022) (actual time=84.388..85.403 rows=6833 loops=1) -> Function Scan on read_intermediate_result intermediate_result_7 (cost=0.00..4354.01 rows=4591 width=4022) (actual time=74.876..75.668 rows=6156 loops=1) -> Function Scan on read_intermediate_result intermediate_result_1 (cost=0.00..3524.35 rows=3716 width=4022) (actual time=68.978..69.630 rows=5144 loops=1) -> Function Scan on read_intermediate_result intermediate_result (cost=0.00..3433.25 rows=3620 width=4022) (actual time=72.051..72.704 rows=5076 loops=1) -> Function Scan on read_intermediate_result intermediate_result_6 (cost=0.00..3313.55 rows=3494 width=4022) (actual time=61.486..62.134 rows=4730 loops=1) -> Function Scan on read_intermediate_result intermediate_result_2 (cost=0.00..2757.53 rows=2908 width=4022) (actual time=61.316..62.016 rows=4000 loops=1) -> Function Scan on read_intermediate_result intermediate_result_11 (cost=0.00..2635.48 rows=2779 width=4022) (actual time=52.626..53.165 rows=3607 loops=1) -> Function Scan on read_intermediate_result intermediate_result_9 (cost=0.00..2114.15 rows=2229 width=4022) (actual time=41.404..41.822 rows=2919 loops=1) Planning Time: 8.388 ms Execution Time: 2547.809 ms Planning Time: 0.000 ms Execution Time: 18587.141 ms Task Count: 1 Tuple data received from nodes: 1970 MB Tasks Shown: All -> Task Tuple data received from node: 1970 MB Node: host=localhost port=5432 dbname=mydata -> Function Scan on read_intermediate_result intermediate_result (cost=0.00..506595.55 rows=534148 width=4022) (actual time=7892.106..8627.884 rows=668574 loops=1) Planning Time: 0.188 ms Execution Time: 9965.010 ms Planning Time: 366.923 ms Execution Time: 78025.057 ms (476 rows)

arsi avatar Sep 01 '23 11:09 arsi

Are you sure they're run serially? There seems to be some overlap in the time ranges in the Citus result, albeit much less...

DS-AdamMilazzo avatar Sep 29 '23 16:09 DS-AdamMilazzo