Performance improvements for postgres query with a large array parameter
We have a performance issue with a query that runs against large table
query GetSecurities($isin: [String!]) {
debtsecurities(
where: {
ISIN: {_in: $isin}
}
)
{
CUSIP
ISIN
Ticker
}
}
All database indexes are in place and data is in shared memory. The database we use is postgres 14.x and we use hasura/graphql-engine:v2.41.0.cli-migrations-v2 docker image
The performance issue appears when $isin parameter has 10k+ or more items
SELECT
coalesce(json_agg("root"), '[]') AS "root"
FROM
(
SELECT
row_to_json(
(
SELECT
"_e"
FROM
(
SELECT
"_root.base"."CUSIP" AS "CUSIP",
"_root.base"."ISIN" AS "ISIN",
"_root.base"."Ticker" AS "Ticker"
) AS "_e"
)
) AS "root"
FROM
(
SELECT
*
FROM
"public"."Securities"
WHERE
(
("public"."Securities"."ISIN") = ANY(
(
'{"isin1",...,"isin10K+"}'
) :: varchar []
)
)
) AS "_root.base"
) AS "_root"
so the query runs for 2+ minutes
if the code inside ANY is replaced from array to a list of values like in the query below the results comes in subsecond
SELECT
coalesce(json_agg("root"), '[]') AS "root"
FROM
(
SELECT
row_to_json(
(
SELECT
"_e"
FROM
(
SELECT
"_root.base"."CUSIP" AS "CUSIP",
"_root.base"."ISIN" AS "ISIN",
"_root.base"."Ticker" AS "Ticker"
) AS "_e"
)
) AS "root"
FROM
(
SELECT
*
FROM
"public"."Securities"
WHERE
(
("public"."Securities"."ISIN") = ANY(
(
VALUES('isin1'), ('...'), ('isin10K')
)
)
) AS "_root.base"
) AS "_root"
The solution is described in the article here https://www.datadoghq.com/blog/100x-faster-postgres-performance-by-changing-1-line/
It would be cool to incorporate this into harusa connector
Thank you for the report. We intend on to benchmark these two approaches together with the new approach we use in ndc-postgres and get a better understanding on how to improve the situation for v3.
For v2, we don't plan to prioritize this at the moment, however it is likely that this problem can be mitigated using Native Queries if necessary.