[Perf]: Planning time increases linearly with number of stripes
What's wrong?
I noticed this because I have two tables with the same data, but one was built with the default 150k rows per stripe and the other with 10k rows per stripe. Each table has about 1 billion rows, and both have a B-tree index on vid.
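For context, a rough sketch of how the two tables were built, assuming the Citus-style columnar access method and its columnar.stripe_row_limit setting (the GUC name and the column definitions here are illustrative, not the exact DDL I used):

-- Default stripe size (150k rows per stripe).
CREATE TABLE swap_columnar_vid (vid bigint, amount_in numeric) USING columnar;
CREATE INDEX idx_swap_columnar_vid ON swap_columnar_vid (vid);

-- Smaller stripes: lower the write-time limit before loading the data.
SET columnar.stripe_row_limit = 10000;
CREATE TABLE swap_columnar_vid_stripe_10k (vid bigint, amount_in numeric) USING columnar;
CREATE INDEX idx_vid_on_swap_columnar_vid_stripe_10k ON swap_columnar_vid_stripe_10k (vid);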
This is an explain analyze for the table with 150k rows per stripe:
explain analyze select s.amount_in from swap_columnar_vid s where vid in (899171129, 999171129);
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------
Index Scan using idx_swap_columnar_vid on swap_columnar_vid s (cost=0.57..1945.61 rows=2 width=10) (actual time=21.079..21.093 rows=1 loops=1)
Index Cond: (vid = ANY ('{899171129,999171129}'::bigint[]))
Planning Time: 11.635 ms
Execution Time: 21.899 ms
(4 rows)
And now for the table with 10k rows per stripe:
d30750a14969=> explain analyze select s.amount_in from swap_columnar_vid_stripe_10k s where vid in (899171129, 999171129);
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Index Scan using idx_vid_on_swap_columnar_vid_stripe_10k on swap_columnar_vid_stripe_10k s (cost=0.57..142.02 rows=2 width=10) (actual time=6.422..6.435 rows=1 loops=1)
Index Cond: (vid = ANY ('{899171129,999171129}'::bigint[]))
Planning Time: 142.906 ms
Execution Time: 6.474 ms
(4 rows)
Notice how the planning time increased roughly in proportion to the number of stripes, to an unreasonable ~140ms, even though the chosen plan isn't a columnar scan at all. This can be worked around to some extent by using prepared statements to take advantage of plan caching (see the sketch below), but it is still a performance issue.
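For reference, a minimal sketch of that workaround (the statement name is arbitrary). After repeated executions PostgreSQL will normally switch to a cached generic plan, so the per-stripe metadata is only read during planning a handful of times:

-- Plan once (or a few times), then reuse the cached plan.
PREPARE swap_by_vid (bigint, bigint) AS
    SELECT s.amount_in
    FROM swap_columnar_vid_stripe_10k s
    WHERE s.vid IN ($1, $2);

EXECUTE swap_by_vid (899171129, 999171129);
-- After ~5 executions the planner usually settles on a generic plan
-- (see plan_cache_mode), and the expensive planning step is skipped.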
Hi @leoyvens, thank you for the report.
Lowering the maximum stripe row count increases the number of stripes in the table, and therefore the amount of related metadata. Part of planning is reading the table's stripe metadata to estimate the cost of scanning the columnar table, and that is where the difference comes from.
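For a rough sense of how much metadata that is (assuming the Citus-style columnar.stripe catalog; the schema may differ between columnar versions):

-- Total stripe metadata across all columnar tables in the database.
-- Each additional stripe is one more metadata row the planner may read.
SELECT count(*) AS total_stripes,
       sum(row_count) AS total_rows
FROM columnar.stripe;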
Although the difference looks significant (11ms vs. 142ms), both times are still in the millisecond range.
We'll discuss whether this is something we want to prioritize, so I'm marking this as an enhancement rather than a bug.
Instead of reading the metadata table directly, which seems unusually expensive for a planning step, perhaps the stripe count and row count could be kept as statistics, that is, updated only on VACUUM or ANALYZE.
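For comparison, heap tables already work this way: the planner reads cached per-relation statistics from pg_class rather than on-disk metadata, and those numbers are refreshed only by VACUUM and ANALYZE. A columnar equivalent could cache stripe count and row count the same way:

-- Planner-facing statistics for a relation, maintained by VACUUM/ANALYZE.
SELECT relpages, reltuples
FROM pg_class
WHERE relname = 'swap_columnar_vid';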
So we can understand your use case better, what are you trying to accomplish by making the maximum stripe size smaller? Did doing so have the desired effect (aside from the increased planning time)?
I am looking to improve the performance of index scans. As you can also see in the OP, the smaller stripe size resulted in a much faster execution time. In my measurements I saw an average 5x speedup in index scans when using a 10k stripe size instead of the default 150k.
Taking a step back, if I'm fetching a single column of a single row using a btree index, there is no fundamental reason why columnar should perform worse than a heap table. With the default configuration, heap performs better because its page size is much smaller than the default columnar stripe size, but that's a tunable parameter and not a fundamental limitation of columnar.
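Both parameters are easy to inspect from psql (the GUC name below assumes the Citus-style columnar extension and may differ in other builds):

SHOW block_size;                 -- heap page size, 8192 bytes by default
SHOW columnar.stripe_row_limit;  -- 150000 rows per stripe by default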
In fact, according to the benchmarks I ran, if this issue and the column overfetching issue discussed in #187 are addressed, columnar index scans at a 10k stripe size should become competitive with heap index scans.