Performance improvements for postgres query with a large array parameter

Open SlavaChristin opened this issue 1 year ago • 1 comments

We have a performance issue with a query that runs against large table

query GetSecurities($isin: [String!]) {
    debtsecurities(
        where: {
            ISIN: {_in: $isin}
        }
    )
    {
        CUSIP
        ISIN
        Ticker
    }
}

All database indexes are in place and data is in shared memory. The database we use is postgres 14.x and we use hasura/graphql-engine:v2.41.0.cli-migrations-v2 docker image

The performance issue appears when $isin parameter has 10k+ or more items

SELECT
  coalesce(json_agg("root"), '[]') AS "root"
FROM
  (
    SELECT
      row_to_json(
        (
          SELECT
            "_e"
          FROM
            (
              SELECT
                "_root.base"."CUSIP" AS "CUSIP",
                "_root.base"."ISIN" AS "ISIN",
                "_root.base"."Ticker" AS "Ticker"
            ) AS "_e"
        )
      ) AS "root"
    FROM
      (
        SELECT
          *
        FROM
          "public"."Securities"
        WHERE
             (
                ("public"."Securities"."ISIN") = ANY(
                  (
                    '{"isin1",...,"isin10K+"}'
                  ) :: varchar []
                )
              )
      ) AS "_root.base"
  ) AS "_root"

so the query runs for 2+ minutes

if the code inside ANY is replaced from array to a list of values like in the query below the results comes in subsecond

SELECT
  coalesce(json_agg("root"), '[]') AS "root"
FROM
  (
    SELECT
      row_to_json(
        (
          SELECT
            "_e"
          FROM
            (
              SELECT
                "_root.base"."CUSIP" AS "CUSIP",
                "_root.base"."ISIN" AS "ISIN",
                "_root.base"."Ticker" AS "Ticker"
            ) AS "_e"
        )
      ) AS "root"
    FROM
      (
        SELECT
          *
        FROM
          "public"."Securities"
        WHERE
             (
                ("public"."Securities"."ISIN") = ANY(
                  (
                    VALUES('isin1'), ('...'), ('isin10K')
                )
              )
      ) AS "_root.base"
  ) AS "_root"

The solution is described in the article here https://www.datadoghq.com/blog/100x-faster-postgres-performance-by-changing-1-line/

It would be cool to incorporate this into harusa connector

Jul 16 '24 16:07 SlavaChristin

Thank you for the report. We intend on to benchmark these two approaches together with the new approach we use in ndc-postgres and get a better understanding on how to improve the situation for v3.

For v2, we don't plan to prioritize this at the moment, however it is likely that this problem can be mitigated using Native Queries if necessary.

Jul 22 '24 11:07 soupi