rum icon indicating copy to clipboard operation
rum copied to clipboard

Combining ts_query AND IN (BIGINT[]) is ~4x slower than GIN

Open andreak opened this issue 7 years ago • 0 comments

Hi.

Having this index:

create index origo_email_delivery_fts_all_folder_idx ON origo_email_delivery using gin (fts_all, folder_id)

And this query:

EXPLAIN ANALYZE
SELECT
    del.entity_id,
    del.received_timestamp,
    del.received_timestamp <=> '3000-01-01' :: TIMESTAMP,
    del.folder_id
FROM origo_email_delivery del
WHERE del.fts_all @@ to_tsquery('simple', 'andreas&joseph')
      AND del.folder_id IN (44961, 204483, 44965, 2470519)
ORDER BY del.received_timestamp <=> '3000-01-01' :: TIMESTAMP
LIMIT 10 offset 10000

Using GIN-index:

┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                                                              QUERY PLAN                                                                               │
├───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Limit  (cost=32018.97..32018.97 rows=1 width=32) (actual time=56.675..56.676 rows=10 loops=1)                                                                         │
│   ->  Sort  (cost=31994.41..32018.97 rows=9822 width=32) (actual time=56.077..56.469 rows=10010 loops=1)                                                              │
│         Sort Key: ((received_timestamp <=> '3000-01-01 00:00:00'::timestamp without time zone))                                                                       │
│         Sort Method: quicksort  Memory: 1541kB                                                                                                                        │
│         ->  Bitmap Heap Scan on origo_email_delivery del  (cost=404.67..31343.13 rows=9822 width=32) (actual time=35.911..53.027 rows=14806 loops=1)                  │
│               Recheck Cond: ((fts_all @@ '''andreas'' & ''joseph'''::tsquery) AND (folder_id = ANY ('{44961,204483,44965,2470519}'::bigint[])))                       │
│               Heap Blocks: exact=13043                                                                                                                                │
│               ->  Bitmap Index Scan on origo_email_delivery_fts_all_folder_idx  (cost=0.00..402.22 rows=9822 width=0) (actual time=34.493..34.493 rows=14806 loops=1) │
│                     Index Cond: ((fts_all @@ '''andreas'' & ''joseph'''::tsquery) AND (folder_id = ANY ('{44961,204483,44965,2470519}'::bigint[])))                   │
│ Planning time: 0.614 ms                                                                                                                                               │
│ Execution time: 56.726 ms                                                                                                                                             │
└───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
(11 rows)

And with RUM-index:

create index rum_idx ON origo_email_delivery using rum (fts_all rum_tsvector_addon_ops, folder_id, received_timestamp) WITH (attach=received_timestamp, "to"=fts_all)
┌────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                                                       QUERY PLAN                                                                       │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Limit  (cost=32066.97..32066.97 rows=1 width=32) (actual time=217.899..217.900 rows=10 loops=1)                                                        │
│   ->  Sort  (cost=32042.41..32066.97 rows=9822 width=32) (actual time=217.300..217.688 rows=10010 loops=1)                                             │
│         Sort Key: ((received_timestamp <=> '3000-01-01 00:00:00'::timestamp without time zone))                                                        │
│         Sort Method: quicksort  Memory: 1541kB                                                                                                         │
│         ->  Bitmap Heap Scan on origo_email_delivery del  (cost=452.67..31391.13 rows=9822 width=32) (actual time=196.797..214.216 rows=14806 loops=1) │
│               Recheck Cond: ((fts_all @@ '''andreas'' & ''joseph'''::tsquery) AND (folder_id = ANY ('{44961,204483,44965,2470519}'::bigint[])))        │
│               Heap Blocks: exact=13043                                                                                                                 │
│               ->  Bitmap Index Scan on rum_idx  (cost=0.00..450.22 rows=9822 width=0) (actual time=195.369..195.369 rows=14806 loops=1)                │
│                     Index Cond: ((fts_all @@ '''andreas'' & ''joseph'''::tsquery) AND (folder_id = ANY ('{44961,204483,44965,2470519}'::bigint[])))    │
│ Planning time: 0.721 ms                                                                                                                                │
│ Execution time: 217.969 ms                                                                                                                             │
└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
(11 rows)

217 / 56 = 3,875 time slower. Can anything be done to speed this up?

Thanks.

andreak avatar May 24 '17 06:05 andreak