graphql-engine icon indicating copy to clipboard operation
graphql-engine copied to clipboard

Extremely slow order_by of nested object while querying nested relationships

Open carlosbaraza opened this issue 3 years ago • 2 comments
trafficstars

Version Information

Server Version: 2.11.2 CLI Version (for CLI related issue): 2.11.2

Environment

MacOS / Linux

What is the current behaviour?

Sorting by a nested table column and at the same time joining big tables causes Hasura to fetch all records, serialise them and then sort, which is extremely slow. Even if a limit and offset are set to paginate the results.

Example query which takes 15 seconds with our dataset:

query ProjectUsers {
  project(
    order_by: {
      users: {
        name: desc
      }
    }
    offset: 0
    limit: 10
  ) {
    id
    user {
      companies {
        id
      }
    }
  }
}

What is the expected behaviour?

I would expect the query to be much faster.

Please provide any traces or logs that could help here.

Analyze performance

Aggregate  (cost=7269481.60..7269481.61 rows=1 width=32)
  ->  Limit  (cost=7269481.45..7269481.48 rows=10 width=46)
        ->  Sort  (cost=7269481.45..7269521.37 rows=15967 width=46)
              Sort Key: users.name DESC
              ->  Nested Loop Left Join  (cost=447.13..7269136.41 rows=15967 width=46)
                    ->  Seq Scan on project  (cost=0.00..638.67 rows=15967 width=32)
                    ->  Nested Loop Left Join  (cost=447.13..455.20 rows=1 width=46)
                          ->  Limit  (cost=0.29..8.30 rows=1 width=442)
                                ->  Index Scan using users_pkey on users  (cost=0.29..8.30 rows=1 width=442)
                                      Index Cond: (id = project.user_id)
                          ->  Aggregate  (cost=446.84..446.85 rows=1 width=32)
                                ->  Seq Scan on user_company  (cost=0.00..446.83 rows=1 width=16)
                                      Filter: (users.id = user_id)
                                SubPlan 3
                                  ->  Result  (cost=0.00..0.01 rows=1 width=32)
                          SubPlan 1
                            ->  Result  (cost=0.00..0.01 rows=1 width=32)
                    SubPlan 2
                      ->  Result  (cost=0.00..0.01 rows=1 width=32)
JIT:
  Functions: 27
  Options: Inlining true, Optimization true, Expressions true, Deforming true

Generated SQL:

SELECT
  coalesce(
    json_agg(
      "root"
      ORDER BY
        "root.or.user.pg.name" DESC NULLS FIRST
    ),
    '[]'
  ) AS "root"
FROM
  (
    SELECT
      row_to_json(
        (
          SELECT
            "_e"
          FROM
            (
              SELECT
                "_root.base"."id" AS "id",
                "_root.or.user"."user" AS "user"
            ) AS "_e"
        )
      ) AS "root",
      "_root.or.user"."root.or.user.pg.name" AS "root.or.user.pg.name"
    FROM
      (
        SELECT
          *
        FROM
          "public"."project"
        WHERE
          ('true')
      ) AS "_root.base"
      LEFT OUTER JOIN LATERAL (
        SELECT
          row_to_json(
            (
              SELECT
                "_e"
              FROM
                (
                  SELECT
                    "_root.or.user.ar.user.companies"."companies" AS "companies"
                ) AS "_e"
            )
          ) AS "user",
          "_root.or.user.base"."name" AS "root.or.user.pg.name"
        FROM
          (
            SELECT
              *
            FROM
              "public"."users"
            WHERE
              (("_root.base"."user_id") = ("id"))
            LIMIT
              1
          ) AS "_root.or.user.base"
          LEFT OUTER JOIN LATERAL (
            SELECT
              coalesce(json_agg("companies"), '[]') AS "companies"
            FROM
              (
                SELECT
                  row_to_json(
                    (
                      SELECT
                        "_e"
                      FROM
                        (
                          SELECT
                            "_root.or.user.ar.user.companies.base"."id" AS "id"
                        ) AS "_e"
                    )
                  ) AS "companies"
                FROM
                  (
                    SELECT
                      *
                    FROM
                      "public"."user_company"
                    WHERE
                      (
                        ("_root.or.user.base"."id") = ("user_id")
                      )
                  ) AS "_root.or.user.ar.user.companies.base"
              ) AS "_root.or.user.ar.user.companies"
          ) AS "_root.or.user.ar.user.companies" ON ('true')
      ) AS "_root.or.user" ON ('true')
    ORDER BY
      "root.or.user.pg.name" DESC NULLS FIRST
    LIMIT
      10 OFFSET 0
  ) AS "_root"

Any possible solutions?

My workaround is to do two queries, one fetching only the ids and then another to fetch all the rest of the data:

  1. Get the page ids (~60ms)
query ProjectUserIds {
  project(
    order_by: {
      users: {
        name: desc
      }
    }
    offset: 0
    limit: 10
  ) {
    id
  }
}
  1. Get all the details for the ids in the current page (~200ms)
query ProjectUserDetails {
  project(
    where: {
      id: { _in: $ids }
    }
  ) {
    user { companies { id } }
    ...other data
  }
}
  1. Sort in the application layer again by the ids returned in the first query.

This process is much faster. The first query takes 60ms, compared to 15s. The second query takes 200ms. The only problem is that currently there are two roundtrips. If this process was done by hasura internally, these queries would be much faster. It could be possible to enable a mode where hasura internally runs two queries instead of the row_to_json performance bottleneck.

Can you identify the location in the source code where the problem exists?

The order_by sorting happens after a Nested Loop Left Join that is serialised to JSON.

If the bug is confirmed, would you be willing to submit a PR?

I don't know Haskell.

carlosbaraza avatar Sep 16 '22 17:09 carlosbaraza

@carlosbaraza Looks like the sort operation in your query (name DESC) is the most expensive, please see if it can be optimized by creating an index on that field that aligns with the query(e.g. name DESC NULLS LAST) and then modifying your graphQL query to (for example): (order_by: {Name: desc_nulls_last}, limit: 10, offset: 0)

I am attaching an article that you may find useful: https://www.postgresql.org/docs/current/indexes-ordering.html

adas98012 avatar Sep 17 '22 00:09 adas98012

An index does not solve the problem (tried with many different indexes). The sorting happens after some row_to_json loop over all the data, which defeats the purpose of the pagination.

Anyhow, how would can you explain that the same query without querying any data (only id) is ~60ms and if you query data at the same time it is ~15000ms? This clearly tells me that the queried data is in some way entangled with the ORDER BY, even if the data is not needed, which means that the query is poorly designed.

carlosbaraza avatar Sep 17 '22 08:09 carlosbaraza

Ran into this exact issue and this was the top result for "Hasura sort nested query slow". I'm running into the same issue. In my case, it's a difference of 30ms and 3 seconds 😅.

Sorting by a nested field is slow if you also include additional nested data. It appears that in your case it's loading the user { companies { id } } for all records even if they're not in the limit/offset. I was able to reproduce this on my dataset as well.

Also, this only happens if my nested query goes 2+ levels deep.

For example, this query takes 3 seconds:

query FindStatusListBooks($filterBy: user_books_bool_exp!, 
  $orderBy: [user_books_order_by]!, 
  $limit: Int!, 
  $offset: Int!
) {  
  userBooks: user_books(where: $filterBy, order_by: $orderBy, limit: $limit, offset: $offset) {
    book {      
      id title
      currentUserBook: user_books(where: {userId: {_eq: 1}}) {        
        rating
        datesRead: user_book_reads {
          id
        }
      }
    }
  }
}

While this query takes 30ms:

query FindStatusListBooks($filterBy: user_books_bool_exp!, 
  $orderBy: [user_books_order_by]!, 
  $limit: Int!, 
  $offset: Int!
) {  
  userBooks: user_books(where: $filterBy, order_by: $orderBy, limit: $limit, offset: $offset) {
    book {      
      id title
      currentUserBook: user_books(where: {userId: {_eq: 1}}) {        
        rating
      }
    }
  }
}

That join to datesRead shouldn't be a costly one, and without the order param it runs in 100ms. It seems like adding additional levels of nesting when ordering by a nested field the query results are much slower.

adamfortuna avatar Dec 16 '22 00:12 adamfortuna

Our workaround is to do two queries. One with the order by and limit, and another fetching the data using the page ids. Annoying and poor DX though.

carlosbaraza avatar Dec 16 '22 23:12 carlosbaraza

Same here

manuFL avatar Dec 21 '22 15:12 manuFL

Same issue, any query having a nested object sort is just impossible to use. This is just one example:

1.000ms vs 35.000+ms

order_by nested table row:

query getAssets($orderBy: [Asset_order_by!]) {
  assets(order_by: $orderBy, offset: 0, limit: 10) {
    assetName
    firstAppearedInBlock {
      number
    }
  }
}

{
  "orderBy": {
    "firstAppearedInBlock": {
      "number": "asc"
    }
  }
}

Aggregate  (cost=131614248.96..131614248.97 rows=1 width=32)
  ->  Limit  (cost=131614248.80..131614248.83 rows=10 width=36)
        ->  Sort  (cost=131614248.65..131633573.14 rows=7729793 width=36)
              Sort Key: "_root.or.firstAppearedInBlock.base".number
              ->  Nested Loop Left Join  (cost=0.86..131338709.25 rows=7729793 width=36)
                    ->  Seq Scan on "Asset"  (cost=0.00..280068.93 rows=7729793 width=20)
                    ->  Subquery Scan on "_root.or.firstAppearedInBlock.base"  (cost=0.86..16.93 rows=1 width=36)
                          ->  Limit  (cost=0.86..16.91 rows=1 width=256)
                                ->  Nested Loop Left Join  (cost=0.86..16.91 rows=1 width=256)
                                      ->  Index Scan using idx_block_slot_no on block  (cost=0.43..8.45 rows=1 width=28)
                                            Index Cond: ("Asset"."firstAppearedInSlot" = (slot_no)::bigint)
                                      ->  Index Only Scan using idx_block_previous_id on block next_block  (cost=0.43..8.45 rows=1 width=8)
                                            Index Cond: (previous_id = block.id)
                          SubPlan 1
                            ->  Result  (cost=0.00..0.01 rows=1 width=32)
                    SubPlan 2
                      ->  Result  (cost=0.00..0.01 rows=1 width=32)

order_by row in same table

{
  "orderBy": {
    "assetName": "asc"
  }
}

Aggregate  (cost=305855.27..305855.28 rows=1 width=32)
  ->  Nested Loop Left Join  (cost=305685.20..305855.15 rows=10 width=48)
        ->  Limit  (cost=305684.33..305685.50 rows=10 width=596)
              ->  Gather Merge  (cost=305684.33..1057507.65 rows=6443750 width=596)
                    Workers Planned: 2
                    ->  Sort  (cost=304684.31..312738.99 rows=3221875 width=596)
                          Sort Key: "Asset"."assetName"
                          ->  Parallel Seq Scan on "Asset"  (cost=0.00..235060.75 rows=3221875 width=596)
        ->  Subquery Scan on "_root.or.firstAppearedInBlock.base"  (cost=0.86..16.93 rows=1 width=32)
              ->  Limit  (cost=0.86..16.91 rows=1 width=256)
                    ->  Nested Loop Left Join  (cost=0.86..16.91 rows=1 width=256)
                          ->  Index Scan using idx_block_slot_no on block  (cost=0.43..8.45 rows=1 width=28)
                                Index Cond: ("Asset"."firstAppearedInSlot" = (slot_no)::bigint)
                          ->  Index Only Scan using idx_block_previous_id on block next_block  (cost=0.43..8.45 rows=1 width=8)
                                Index Cond: (previous_id = block.id)
              SubPlan 1
                ->  Result  (cost=0.00..0.01 rows=1 width=32)
        SubPlan 2
          ->  Result  (cost=0.00..0.01 rows=1 width=32)

Normal query without hasura but using the same table relation:

object_relationships:
  - name: firstAppearedInBlock
    using:
      manual_configuration:
        column_mapping:
          firstAppearedInSlot: slotNo
        insertion_order: null
        remote_table:
          name: Block
          schema: public

Execution time 4.500ms

SELECT
    ast."assetName",
    bl.number
FROM "Asset" as ast
JOIN "Block" as bl ON ast."firstAppearedInSlot" = bl."slotNo"
ORDER BY bl.number ASC
LIMIT 10 OFFSET 0

Limit  (cost=1.30..83.59 rows=10 width=20) (actual time=4381.987..4382.008 rows=10 loops=1)
  ->  Nested Loop Left Join  (cost=1.30..63692437.96 rows=7739818 width=20) (actual time=4381.987..4382.006 rows=10 loops=1)
     ->  Nested Loop  (cost=0.86..49129755.03 rows=7739818 width=28) (actual time=4381.970..4381.979 rows=10 loops=1)
        ->  Index Scan using idx_block_block_no on block  (cost=0.43..495516.74 rows=8373554 width=36) (actual time=0.037..722.119 rows=5406749 loops=1)
        ->  Index Scan using idx_asset_first_appeared_in_slot on "Asset" ast  (cost=0.43..4.86 rows=95 width=20) (actual time=0.001..0.001 rows=0 loops=5406749)
            Index Cond: ("firstAppearedInSlot" = (block.slot_no)::bigint)
     ->  Index Only Scan using idx_block_previous_id on block next_block  (cost=0.43..1.87 rows=1 width=8) (actual time=0.002..0.002 rows=1 loops=10)
         Index Cond: (previous_id = block.id)
         Heap Fetches: 10
Planning Time: 1.249 ms
Execution Time: 4382.068 ms

@adamfortuna workaround down work for me. Even if i don't query nested data the order_by still takes ~30.000ms. It lowers it by ~5.000ms but the time i spend querying the missing information for 10+ records makes this not worth.

query getAssets($orderBy: [Asset_order_by!]) {
  assets(order_by: $orderBy, offset: 0, limit: 10) {
    assetName
  }
}

{
  "orderBy": {
    "firstAppearedInBlock": {
      "number": "asc"
    }
  }
}

Aggregate  (cost=134221729.63..134221729.64 rows=1 width=32)
  ->  Limit  (cost=134221729.48..134221729.50 rows=10 width=36)
        ->  Sort  (cost=134221729.48..134241467.57 rows=7895237 width=36)
              Sort Key: block.block_no
              ->  Nested Loop Left Join  (cost=0.86..134051116.24 rows=7895237 width=36)
                    ->  Seq Scan on "Asset"  (cost=0.00..286063.37 rows=7895237 width=20)
                    ->  Limit  (cost=0.86..16.91 rows=1 width=256)
                          ->  Nested Loop Left Join  (cost=0.86..16.91 rows=1 width=256)
                                ->  Index Scan using idx_block_slot_no on block  (cost=0.43..8.45 rows=1 width=28)
                                      Index Cond: ("Asset"."firstAppearedInSlot" = (slot_no)::bigint)
                                ->  Index Only Scan using idx_block_previous_id on block next_block  (cost=0.43..8.45 rows=1 width=8)
                                      Index Cond: (previous_id = block.id)
                    SubPlan 1
                      ->  Result  (cost=0.00..0.01 rows=1 width=32)

infnada avatar Feb 06 '23 21:02 infnada

This seems like a very basic case that should be handled! Thanks for the clear example we ran into the same issue as well.

RodriguezLucha avatar Mar 31 '23 17:03 RodriguezLucha

I would like to insist in this issue and provide an easy solution.

(i've seen reductions from 40+ minutes to ~40 milliseconds on "medium" datasets ~100 million rows. "not this specific example")

Following my previous comment example:

query getAssets($orderBy: [Asset_order_by!]) {
  assets(order_by: $orderBy, offset: 0, limit: 10) {
    assetName
  }
}

{
  "orderBy": {
    "firstAppearedInBlock": {
      "number": "asc"
    }
  }
}

i've got the following query:

SELECT
  coalesce(
    json_agg(
      "root"
      ORDER BY
        "root.or.firstAppearedInBlock.pg.number" ASC NULLS LAST
    ),
    '[]'
  ) AS "root"
FROM
  (
    SELECT
      row_to_json(
        (
          SELECT
            "_e"
          FROM
            (
              SELECT
                "_root.base"."assetName" AS "assetName"
            ) AS "_e"
        )
      ) AS "root",
      "_root.or.firstAppearedInBlock"."root.or.firstAppearedInBlock.pg.number" AS "root.or.firstAppearedInBlock.pg.number"
    FROM
      (
        SELECT
          *
        FROM
          "public"."Asset"
        WHERE
          ('true')
      ) AS "_root.base"
      LEFT OUTER JOIN LATERAL (
        SELECT
          "_root.or.firstAppearedInBlock.base"."number" AS "root.or.firstAppearedInBlock.pg.number"
        FROM
          (
            SELECT
              *
            FROM
              "public"."Block"
            WHERE
              (
                ("_root.base"."firstAppearedInSlot") = ("slotNo")
              )
            LIMIT
              1
          ) AS "_root.or.firstAppearedInBlock.base"
      ) AS "_root.or.firstAppearedInBlock" ON ('true')
    ORDER BY
      "root.or.firstAppearedInBlock.pg.number" ASC NULLS LAST
    LIMIT
      10 OFFSET 0
  ) AS "_root"

The part that is causing massive time increments is the use of LEFT OUTER JOIN LATERAL for the sorting operation.

I understand the "ease of use" that LEFT OUTER JOIN LATERAL provides when extracting/selecting data to be displayed, but sorting should be done by a normal JOIN.

Example

SELECT
  coalesce(
    json_agg(
      "root"
      ORDER BY
        "root.or.firstAppearedInBlock.pg.number" ASC NULLS LAST
    ),
    '[]'
  ) AS "root"
FROM
  (
    SELECT
      row_to_json(
        (
          SELECT
            "_e"
          FROM
            (
              SELECT
                "_root.base"."assetName" AS "assetName"
            ) AS "_e"
        )
      ) AS "root",
      "root.or.firstAppearedInBlock".number AS "root.or.firstAppearedInBlock.pg.number"
    FROM
      (
        SELECT
          *
        FROM
          "public"."Asset"
        WHERE
          ('true')
      ) AS "_root.base"

--- use JOIN 
      JOIN "public"."Block" AS "root.or.firstAppearedInBlock" ON ("_root.base"."firstAppearedInSlot") = ("slotNo")
---

    ORDER BY
      "root.or.firstAppearedInBlock".number ASC NULLS LAST
    LIMIT
      10 OFFSET 0
  ) AS "_root"

I don't know if there's a specific reason to use LEFT OUTER JOIN LATERAL to sort the results. But even if there is, maybe you can provide a way to let the user decide which kind of join to use by some special value in the orderBy variable or something...:

{
  "orderBy": {
    $_inner: true, <-- or whatever
    "firstAppearedInBlock": {
      "number": "asc"
    }
  }
}

or even decide it in the object_relationship by "not allowing/showing nulls in relationships", meaning no LEFT JOIN

object_relationships:
  - name: firstAppearedInBlock
    using:
      manual_configuration:
        column_mapping:
          firstAppearedInSlot: slotNo
        sorting: inner <-- or whatever
        insertion_order: null
        remote_table:
          name: Block
          schema: public

Currently it's just not feasible to use hasura when you need nested sorting (which should be a really common case)

infnada avatar Jun 30 '23 05:06 infnada

Ran into similar issues. Sorting by nested object is not usable on large tables.

Today, you are better off creating a view with that includes the column from the left table you want to sort by.

wieseljonas avatar Sep 17 '23 09:09 wieseljonas