RFC: IN list compaction for normalized_query
We noticed that many applications generate variable-sized lists for bulk modification of result sets which can lead to many separate normalized_queries although they belong to the same statement (class).
DELETE FROM "post_taggings" WHERE post_id = ? AND tag_id IN (?)
DELETE FROM "post_taggings" WHERE post_id = ? AND tag_id IN (?,?)
DELETE FROM "post_taggings" WHERE post_id = ? AND tag_id IN (?,?,?)
DELETE FROM "post_taggings" WHERE post_id = ? AND tag_id IN (?,?,?,?)
[..31 total variations of this query were recorded..]
A quick test showed that this seems to be related to the fact that the cases "has one list member" and "has many list members" are hashed to different plans though the normalized_query is recorded for the exact number of elements the first "has many list members" query used. This causes problems on databases where pg_stat_plans are reset on a regular basis.
plantest=# select normalized_query, calls from pg_stat_plans_queries where dbid=152414 and normalized_query like 'select id%';
normalized_query | calls
------------------+-------
(0 rows)
plantest=# select id from foo where id in (10, 100, 1000, 10000, 10000, 43, 543, 62, 745, 124);
id
-------
10
43
62
100
124
543
745
1000
10000
(9 rows)
plantest=# select id from foo where id in (10, 100);
id
-----
10
100
(2 rows)
plantest=# select id from foo where id in (10, 100, 1000);
id
------
10
100
1000
(3 rows)
plantest=# select id from foo where id in (10, 100);
id
-----
10
100
(2 rows)
plantest=# select id from foo where id in (10, 100, 1000);
id
------
10
100
1000
(3 rows)
plantest=# select id from foo where id in (10);
id
----
10
(1 row)
plantest=# select normalized_query, calls from pg_stat_plans_queries where dbid=152414 and normalized_query like 'select id%';
normalized_query | calls
----------------------------------------------------------------+-------
select id from foo where id in (?, ?, ?, ?, ?, ?, ?, ?, ?, ?); | 5
select id from foo where id in (?); | 1
(2 rows)
plantest=# select pg_stat_plans_reset();
pg_stat_plans_reset
---------------------
(1 row)
plantest=# select normalized_query, calls from pg_stat_plans_queries where dbid=152414 and normalized_query like 'select id%';
normalized_query | calls
------------------+-------
(0 rows)
plantest=# select id from foo where id in (10, 100);
id
-----
10
100
(2 rows)
plantest=# select id from foo where id in (10, 100, 1000);
id
------
10
100
1000
(3 rows)
plantest=# select id from foo where id in (10, 100, 1000, 10000);
id
-------
10
100
1000
10000
(4 rows)
plantest=# select id from foo where id in (10, 100, 1000, 10000, 10000, 43, 543, 62, 745, 124);
id
-------
10
43
62
100
124
543
745
1000
10000
(9 rows)
plantest=# select id from foo where id in (10);
id
----
10
(1 row)
plantest=# select id from foo where id in (10);
id
----
10
(1 row)
plantest=# select normalized_query, calls from pg_stat_plans_queries where dbid=152414 and normalized_query like 'select id%';
normalized_query | calls
----------------------------------------+-------
select id from foo where id in (?, ?); | 4
select id from foo where id in (?); | 2
(2 rows)
plantest=#
Would it be feasible to compact lists with more than two placeholders to just two in the normalized_query?
If not, it'd be a good if we could add this to a "Limitations" section in the documentation.