citus icon indicating copy to clipboard operation
citus copied to clipboard

Add an extract_equality_filters_from_query diagnostic function

Open marcocitus opened this issue 4 years ago • 1 comments

DESCRIPTION: Add an extract_equality_filters_from_query diagnostic function

Per discussion with @saicitus , it would be useful to have a function to analyze filters and joins in queries in order to establish the relationships within the data model from queries.

This PR adds an extract_equality_filters_from_query function that takes a query string as input and emits: a) relation ID + column number when the query contains a column = ? clause b) left relation ID + left column number + right relation ID + right column number when the query contains a left.column = right.column join

We reuse some of the logic in Citus' planner hook to infer the filters, where possible across joins.

Example usage:

 select e.* from extract_equality_filters_from_query('select * from test a left join ref b on (a.x = b.b ) where a.x = 1') e;
┌─────────────────┬────────────────┬──────────────────┬─────────────────┐
│ left_table_name │ left_column_id │ right_table_name │ right_column_id │
├─────────────────┼────────────────┼──────────────────┼─────────────────┤
│ test            │              1 │                  │                 │
│ ref             │              2 │                  │                 │
│ test            │              1 │ ref              │               2 │
└─────────────────┴────────────────┴──────────────────┴─────────────────┘
(3 rows)

Making it work with pg_stat_statements is a little challenging because pg_stat_statements replaces constants with untyped parameters. The parser can handle those parameters, but only if we supply the correct type, which is hard to know.

One approach is to replace all the parameters with NULL, which can be used instead of any type. extract_equality_filters_from_query replaces NULLs with values or function calls where possible to avoid filters from getting eliminated before we do our analysis.

select * from test a left join ref b on (a.x = b.b ) where a.x = 1;
select e.* from public.pg_stat_statements p, extract_equality_filters_from_query(regexp_replace(query, '\$[0-9]+', 'NULL', 'g')) e;
┌────────────────────┬────────────────┬────────────────────┬─────────────────┐
│  left_table_name   │ left_column_id │  right_table_name  │ right_column_id │
├────────────────────┼────────────────┼────────────────────┼─────────────────┤
│ test               │              1 │                    │                 │
│ ref                │              2 │                    │                 │
│ test               │              1 │ ref                │               2 │
└────────────────────┴────────────────┴────────────────────┴─────────────────┘
(3 rows)

The replacement may not always work, because the type of NULL cannot always be inferred:

select e.* from extract_equality_filters_from_query(regexp_replace('insert into test select s, s from generate_series($1,$2)', '\$[0-9]+', 'NULL', 'g')) e;
NOTICE:  could not analyze query: function generate_series(unknown, unknown) is not unique
┌─────────────────┬────────────────┬──────────────────┬─────────────────┐
│ left_table_name │ left_column_id │ right_table_name │ right_column_id │
├─────────────────┼────────────────┼──────────────────┼─────────────────┤
└─────────────────┴────────────────┴──────────────────┴─────────────────┘
(0 rows)

One other limitation of pg_stat_statements to take into account: It does not include fully qualified names, so it may be necessary to change search_path before running extract_equality_filters_from_query.

Todo:

  • [ ] Add tests
  • [ ] Consider other ways of handling pg_stat_statements

marcocitus avatar Feb 23 '22 13:02 marcocitus

@marcocitus not able to perform the below operation, is there something i missed?:

postgres=# insert into tmp select * from extract_equality_filters_from_query('select * from test where id=1'); ERROR: relcache reference tmp is not owned by resource owner TopTransaction

Also hitting a similar issue when I run the function against a table which has multiple queries: postgres=# SELECT e.* from queries,extract_equality_filters_from_query(regexp_replace(q,'$[0-9]+','NULL', 'g'))e limit 20; NOTICE: could not analyze query: syntax error at or near "as" NOTICE: could not analyze query: syntax error at or near "and" ERROR: buffer 3574 is not owned by resource owner TopTransaction

Whereas if the LIMIT is less than 20 it works

SELECT e.* from queries,extract_equality_filters_from_query(regexp_replace(q,'\$[0-9]+','NULL', 'g'))e limit 19;
NOTICE:  could not analyze query: syntax error at or near "as"
NOTICE:  could not analyze query: syntax error at or near "and"
        left_table_name        | left_column_id | right_table_name | right_column_id
-------------------------------+----------------+------------------+-----------------
 xxxxxxxxxxxxx                   |              2 |                  |
 xxxxxxxxxxx                     |              3 | zzzz              |               1
 *********_info                  |              1 |                  |
 *********_info_12_2020          |              1 |                  |
 *********_info_3_2021           |              1 |                  |
 *********_info_4_2021           |              1 |                  |
 *********_info_5_2021           |              1 |                  |
 *********_info_6_2021           |              1 |                  |
 yyyyy  |              4 |                  |
 yyyyy        |              6 |                  |
 yyyyyy           |              2 |                  |
 yyyy                         |              1 |                  |
 zzzzzz |              2 |                  |
 *********                       |              1 |                  |
 *********_3_2021                |              1 |                  |
 *********_4_2021                |              1 |                  |
 *********_5_2021                |              1 |                  |
 *********_6_2021                |              1 |                  |
 zzzzz                    |              1 |                  |
(19 rows)

saicitus avatar Feb 25 '22 00:02 saicitus