oban icon indicating copy to clipboard operation
oban copied to clipboard

improve Oban Web full text search speed

Open dvic opened this issue 2 years ago • 2 comments

Environment

  • Oban.Pro v0.9.2
  • Oban.Web v2.7.4
  • PostgreSQL 13
  • Elixir 1.13 and OTP 24

Current Behavior

With a large number of jobs (in our case 400k) we encounter the following query to have 10+ seconds execution time:

SELECT
  o0."id",
  o0."state",
  o0."queue",
  o0."worker",
  o0."args",
  o0."meta",
  o0."tags",
  o0."errors",
  o0."attempt",
  o0."attempted_by",
  o0."max_attempts",
  o0."priority",
  o0."attempted_at",
  o0."cancelled_at",
  o0."completed_at",
  o0."discarded_at",
  o0."inserted_at",
  o0."scheduled_at"
FROM
  "public"."oban_jobs" AS o0
WHERE
  (o0."state" = $1)
  AND ($2
    AND (((($3
            OR jsonb_to_tsvector(o0."args",
              $9) @@ websearch_to_tsquery($4) )
          OR jsonb_to_tsvector(o0."meta",
            $10) @@ websearch_to_tsquery($5) )
        OR array_to_tsvector(o0."tags") @@ websearch_to_tsquery($6) )
      OR (o0."worker" ILIKE $7)))
ORDER BY
  o0."attempted_at" DESC
LIMIT
  $8

Expected Behavior

This query should not take this long with appropriate indices.

The above query (I assume, I don't have logs to what $9 an d$10 above actually are) boils down to:

SELECT *
FROM OBAN_JOBS O0
WHERE (
	JSONB_TO_TSVECTOR(O0."args", '["string", "numeric"]') @@ WEBSEARCH_TO_TSQUERY('16372')
 OR JSONB_TO_TSVECTOR(O0."meta", '["string", "numeric"]') @@ WEBSEARCH_TO_TSQUERY('16372')
 OR ARRAY_TO_TSVECTOR(O0."tags") @@ WEBSEARCH_TO_TSQUERY('16372')
);

In this example we are searching for '16372' and this simplified query currently takes about 8 seconds.

If I add

CREATE INDEX i0 ON OBAN_JOBS USING GIN (JSONB_TO_TSVECTOR('english', "args", '["string", "numeric"]')); 
CREATE INDEX i1 ON OBAN_JOBS USING GIN (JSONB_TO_TSVECTOR('english', "meta", '["string", "numeric"]'));
CREATE INDEX i2 ON OBAN_JOBS USING GIN (ARRAY_TO_TSVECTOR("tags"));

and run the same query but with 'english' as regconfig (you have to specify this otherwise the expression is not immutable, which is required for indices),

SELECT *
FROM OBAN_JOBS O0
WHERE (
	JSONB_TO_TSVECTOR('english', O0."args", '["string", "numeric"]') @@ WEBSEARCH_TO_TSQUERY('16372')
 OR JSONB_TO_TSVECTOR('english', O0."meta", '["string", "numeric"]') @@ WEBSEARCH_TO_TSQUERY('16372')
 OR ARRAY_TO_TSVECTOR(O0."tags") @@ WEBSEARCH_TO_TSQUERY('16372')
);

the query completes in 115ms, almost a 70x improvement 🙂

dvic avatar Jan 10 '22 13:01 dvic

@dvic Great suggestion! Improving Web search speed out of the box would be fantastic, but I can't force that many extra indexes on apps by default.

What do you think about a guide on when and how to add the indexes? Perhaps a hint of some kind in Web when it detects a slow full-text search?

sorentwo avatar Jan 10 '22 15:01 sorentwo

@dvic Great suggestion! Improving Web search speed out of the box would be fantastic, but I can't force that many extra indexes on apps by default.

What do you think about a guide on when and how to add the indexes?

A guide sounds like a good idea. The only problem is that you would still have to change all JSONB_TO_TSVECTOR expressions to JSONB_TO_TSVECTOR('english', ...), otherwise we can't make an index for it. Not sure if it makes sense here, but since this is already passed as a positional argument, you could make this parameter configurable.

Perhaps a hint of some kind in Web when it detects a slow full-text search?

This would definitely be a good feature, in the beginning it could just show the execution time of the search somewhere next to the search input. Or only show it if it gets above a certain threshold? With a link to the docs how to improve the speed?

dvic avatar Jan 10 '22 15:01 dvic