Investigate fuzzy search query performance discrepancy

Open He1DAr opened this issue 1 year ago • 1 comments

There’s a significant difference in performance when running the same query locally vs. in the dev environment, even though the query uses the same execution strategy in both environments.

Query:

SELECT 
	  *
      FROM stacks_blockchain_api.txs
      WHERE canonical = true AND microblock_canonical = true AND block_height <= 157521 AND burn_block_time <= 1730591999
	  AND (
        CASE 
          WHEN EXISTS (
            SELECT 1
            FROM pg_extension
            WHERE extname = 'pg_trgm'
          ) 
          THEN similarity(contract_call_function_name, 'swab') > 0.3
          ELSE contract_call_function_name ILIKE '%swab%'
          END
      )
      ORDER BY burn_block_time DESC, block_height DESC, microblock_sequence DESC, tx_index DESC
      LIMIT 30
      OFFSET 0;

Analysis output:

[
  {
    "Plan": {
      "Node Type": "Limit",
      "Parallel Aware": false,
      "Async Capable": false,
      "Startup Cost": 821.56,
      "Total Cost": 887.96,
      "Plan Rows": 30,
      "Plan Width": 2475,
      "Actual Startup Time": 730.829,
      "Actual Total Time": 730.837,
      "Actual Rows": 30,
      "Actual Loops": 1,
      "Plans": [
        {
          "Node Type": "Seq Scan",
          "Parent Relationship": "InitPlan",
          "Subplan Name": "InitPlan 1 (returns $0)",
          "Parallel Aware": false,
          "Async Capable": false,
          "Relation Name": "pg_extension",
          "Alias": "pg_extension",
          "Startup Cost": 0.00,
          "Total Cost": 1.01,
          "Plan Rows": 1,
          "Plan Width": 0,
          "Actual Startup Time": 0.019,
          "Actual Total Time": 0.020,
          "Actual Rows": 1,
          "Actual Loops": 1,
          "Filter": "(extname = 'pg_trgm'::name)",
          "Rows Removed by Filter": 1
        },
        {
          "Node Type": "Incremental Sort",
          "Parent Relationship": "Outer",
          "Parallel Aware": false,
          "Async Capable": false,
          "Startup Cost": 820.54,
          "Total Cost": 13030750.71,
          "Plan Rows": 5886325,
          "Plan Width": 2475,
          "Actual Startup Time": 730.827,
          "Actual Total Time": 730.831,
          "Actual Rows": 30,
          "Actual Loops": 1,
          "Sort Key": ["txs.burn_block_time DESC", "txs.block_height DESC", "txs.microblock_sequence DESC", "txs.tx_index DESC"],
          "Presorted Key": ["txs.burn_block_time"],
          "Full-sort Groups": {
            "Group Count": 1,
            "Sort Methods Used": ["quicksort"],
            "Sort Space Memory": {
              "Average Sort Space Used": 85,
              "Peak Sort Space Used": 85
            }
          },
          "Plans": [
            {
              "Node Type": "Index Scan",
              "Parent Relationship": "Outer",
              "Parallel Aware": false,
              "Async Capable": false,
              "Scan Direction": "Backward",
              "Index Name": "txs_burn_block_time_index",
              "Relation Name": "txs",
              "Alias": "txs",
              "Startup Cost": 0.43,
              "Total Cost": 12688723.86,
              "Plan Rows": 5886325,
              "Plan Width": 2475,
              "Actual Startup Time": 717.984,
              "Actual Total Time": 730.722,
              "Actual Rows": 31,
              "Actual Loops": 1,
              "Index Cond": "(burn_block_time <= 1730591999)",
              "Rows Removed by Index Recheck": 0,
              "Filter": "(canonical AND microblock_canonical AND (block_height <= 157521) AND CASE WHEN $0 THEN (similarity(contract_call_function_name, 'swab'::text) > '0.3'::double precision) ELSE (contract_call_function_name ~~* '%swab%'::text) END)",
              "Rows Removed by Filter": 981643
            }
          ]
        }
      ]
    },
    "Planning Time": 0.363,
    "Triggers": [
    ],
    "Execution Time": 730.896
  }
]

Nov 12 '24 15:11 He1DAr

We've had several issues similar to this: a postgres query is relatively fast on a local machine, but slow in dev/staging. We've also seen instances where the query seems faster in prod or local when those environments are not under any load. I.e. a query may be fast on a highly provisioned machine that is not under load, but once it's under load that query ends end becoming a significant bottleneck.

It's generally a good rule of thumb that if a query is not performant in the dev/stg environment, then it will not be performant when used at scale in the prod environment.

With this particular query, I don't think we can squeeze out about 1-2 orders of magnitude better performance unless we make a significant change to the approach. I think we'd need to do something like: create a new table that is responsible for tracking search terms, and create a search query against that table. That approach may work but keep in mind it's more involved than it might seem at face value. It involves:

Create a new table in a sql migration file
Optimize the indexes for very fast search queries
Update several areas in the data ingestion code so that we write to this new table
Write an optimize sql read query against that new table

Mar 24 '25 15:03 zone117x