rum
rum copied to clipboard
Combining ts_query AND IN (BIGINT[]) is ~4x slower than GIN
Hi.
Having this index:
create index origo_email_delivery_fts_all_folder_idx ON origo_email_delivery using gin (fts_all, folder_id)
And this query:
EXPLAIN ANALYZE
SELECT
del.entity_id,
del.received_timestamp,
del.received_timestamp <=> '3000-01-01' :: TIMESTAMP,
del.folder_id
FROM origo_email_delivery del
WHERE del.fts_all @@ to_tsquery('simple', 'andreas&joseph')
AND del.folder_id IN (44961, 204483, 44965, 2470519)
ORDER BY del.received_timestamp <=> '3000-01-01' :: TIMESTAMP
LIMIT 10 offset 10000
Using GIN-index:
┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ QUERY PLAN │
├───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Limit (cost=32018.97..32018.97 rows=1 width=32) (actual time=56.675..56.676 rows=10 loops=1) │
│ -> Sort (cost=31994.41..32018.97 rows=9822 width=32) (actual time=56.077..56.469 rows=10010 loops=1) │
│ Sort Key: ((received_timestamp <=> '3000-01-01 00:00:00'::timestamp without time zone)) │
│ Sort Method: quicksort Memory: 1541kB │
│ -> Bitmap Heap Scan on origo_email_delivery del (cost=404.67..31343.13 rows=9822 width=32) (actual time=35.911..53.027 rows=14806 loops=1) │
│ Recheck Cond: ((fts_all @@ '''andreas'' & ''joseph'''::tsquery) AND (folder_id = ANY ('{44961,204483,44965,2470519}'::bigint[]))) │
│ Heap Blocks: exact=13043 │
│ -> Bitmap Index Scan on origo_email_delivery_fts_all_folder_idx (cost=0.00..402.22 rows=9822 width=0) (actual time=34.493..34.493 rows=14806 loops=1) │
│ Index Cond: ((fts_all @@ '''andreas'' & ''joseph'''::tsquery) AND (folder_id = ANY ('{44961,204483,44965,2470519}'::bigint[]))) │
│ Planning time: 0.614 ms │
│ Execution time: 56.726 ms │
└───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
(11 rows)
And with RUM-index:
create index rum_idx ON origo_email_delivery using rum (fts_all rum_tsvector_addon_ops, folder_id, received_timestamp) WITH (attach=received_timestamp, "to"=fts_all)
┌────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ QUERY PLAN │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Limit (cost=32066.97..32066.97 rows=1 width=32) (actual time=217.899..217.900 rows=10 loops=1) │
│ -> Sort (cost=32042.41..32066.97 rows=9822 width=32) (actual time=217.300..217.688 rows=10010 loops=1) │
│ Sort Key: ((received_timestamp <=> '3000-01-01 00:00:00'::timestamp without time zone)) │
│ Sort Method: quicksort Memory: 1541kB │
│ -> Bitmap Heap Scan on origo_email_delivery del (cost=452.67..31391.13 rows=9822 width=32) (actual time=196.797..214.216 rows=14806 loops=1) │
│ Recheck Cond: ((fts_all @@ '''andreas'' & ''joseph'''::tsquery) AND (folder_id = ANY ('{44961,204483,44965,2470519}'::bigint[]))) │
│ Heap Blocks: exact=13043 │
│ -> Bitmap Index Scan on rum_idx (cost=0.00..450.22 rows=9822 width=0) (actual time=195.369..195.369 rows=14806 loops=1) │
│ Index Cond: ((fts_all @@ '''andreas'' & ''joseph'''::tsquery) AND (folder_id = ANY ('{44961,204483,44965,2470519}'::bigint[]))) │
│ Planning time: 0.721 ms │
│ Execution time: 217.969 ms │
└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
(11 rows)
217 / 56 = 3,875 time slower. Can anything be done to speed this up?
Thanks.