Add an extract_equality_filters_from_query diagnostic function
DESCRIPTION: Add an extract_equality_filters_from_query diagnostic function
Per discussion with @saicitus , it would be useful to have a function to analyze filters and joins in queries in order to establish the relationships within the data model from queries.
This PR adds an extract_equality_filters_from_query function that takes a query string as input and emits:
a) relation ID + column number when the query contains a column = ? clause
b) left relation ID + left column number + right relation ID + right column number when the query contains a left.column = right.column join
We reuse some of the logic in Citus' planner hook to infer the filters, where possible across joins.
Example usage:
select e.* from extract_equality_filters_from_query('select * from test a left join ref b on (a.x = b.b ) where a.x = 1') e;
┌─────────────────┬────────────────┬──────────────────┬─────────────────┐
│ left_table_name │ left_column_id │ right_table_name │ right_column_id │
├─────────────────┼────────────────┼──────────────────┼─────────────────┤
│ test │ 1 │ │ │
│ ref │ 2 │ │ │
│ test │ 1 │ ref │ 2 │
└─────────────────┴────────────────┴──────────────────┴─────────────────┘
(3 rows)
Making it work with pg_stat_statements is a little challenging because pg_stat_statements replaces constants with untyped parameters. The parser can handle those parameters, but only if we supply the correct type, which is hard to know.
One approach is to replace all the parameters with NULL, which can be used instead of any type. extract_equality_filters_from_query replaces NULLs with values or function calls where possible to avoid filters from getting eliminated before we do our analysis.
select * from test a left join ref b on (a.x = b.b ) where a.x = 1;
select e.* from public.pg_stat_statements p, extract_equality_filters_from_query(regexp_replace(query, '\$[0-9]+', 'NULL', 'g')) e;
┌────────────────────┬────────────────┬────────────────────┬─────────────────┐
│ left_table_name │ left_column_id │ right_table_name │ right_column_id │
├────────────────────┼────────────────┼────────────────────┼─────────────────┤
│ test │ 1 │ │ │
│ ref │ 2 │ │ │
│ test │ 1 │ ref │ 2 │
└────────────────────┴────────────────┴────────────────────┴─────────────────┘
(3 rows)
The replacement may not always work, because the type of NULL cannot always be inferred:
select e.* from extract_equality_filters_from_query(regexp_replace('insert into test select s, s from generate_series($1,$2)', '\$[0-9]+', 'NULL', 'g')) e;
NOTICE: could not analyze query: function generate_series(unknown, unknown) is not unique
┌─────────────────┬────────────────┬──────────────────┬─────────────────┐
│ left_table_name │ left_column_id │ right_table_name │ right_column_id │
├─────────────────┼────────────────┼──────────────────┼─────────────────┤
└─────────────────┴────────────────┴──────────────────┴─────────────────┘
(0 rows)
One other limitation of pg_stat_statements to take into account: It does not include fully qualified names, so it may be necessary to change search_path before running extract_equality_filters_from_query.
Todo:
- [ ] Add tests
- [ ] Consider other ways of handling pg_stat_statements
@marcocitus not able to perform the below operation, is there something i missed?:
postgres=# insert into tmp select * from extract_equality_filters_from_query('select * from test where id=1'); ERROR: relcache reference tmp is not owned by resource owner TopTransaction
Also hitting a similar issue when I run the function against a table which has multiple queries: postgres=# SELECT e.* from queries,extract_equality_filters_from_query(regexp_replace(q,'$[0-9]+','NULL', 'g'))e limit 20; NOTICE: could not analyze query: syntax error at or near "as" NOTICE: could not analyze query: syntax error at or near "and" ERROR: buffer 3574 is not owned by resource owner TopTransaction
Whereas if the LIMIT is less than 20 it works
SELECT e.* from queries,extract_equality_filters_from_query(regexp_replace(q,'\$[0-9]+','NULL', 'g'))e limit 19;
NOTICE: could not analyze query: syntax error at or near "as"
NOTICE: could not analyze query: syntax error at or near "and"
left_table_name | left_column_id | right_table_name | right_column_id
-------------------------------+----------------+------------------+-----------------
xxxxxxxxxxxxx | 2 | |
xxxxxxxxxxx | 3 | zzzz | 1
*********_info | 1 | |
*********_info_12_2020 | 1 | |
*********_info_3_2021 | 1 | |
*********_info_4_2021 | 1 | |
*********_info_5_2021 | 1 | |
*********_info_6_2021 | 1 | |
yyyyy | 4 | |
yyyyy | 6 | |
yyyyyy | 2 | |
yyyy | 1 | |
zzzzzz | 2 | |
********* | 1 | |
*********_3_2021 | 1 | |
*********_4_2021 | 1 | |
*********_5_2021 | 1 | |
*********_6_2021 | 1 | |
zzzzz | 1 | |
(19 rows)