Is LIMIT applied only after scanning the whole posting list?

Open digoal opened this issue 5 years ago • 2 comments

postgres=> insert into test_rum_add select generate_series(1,10000000),  tsvector 'a b c', clock_timestamp();
INSERT 0 10000000
postgres=> create index idx_test_rum_add_1 on test_rum_add using rum (arr rum_tsvector_hash_addon_ops, ts) with (attach='ts', to='arr');
CREATE INDEX

postgres=> select * from test_rum_add where arr @@ 'a|b' order by ts <=> '2020-05-23' limit 10;
    id    |     arr     |             ts             
----------+-------------+----------------------------
 10000000 | 'a' 'b' 'c' | 2020-05-22 17:43:01.945628
  9999999 | 'a' 'b' 'c' | 2020-05-22 17:43:01.945628
  9999998 | 'a' 'b' 'c' | 2020-05-22 17:43:01.945627
  9999997 | 'a' 'b' 'c' | 2020-05-22 17:43:01.945627
  9999996 | 'a' 'b' 'c' | 2020-05-22 17:43:01.945626
  9999995 | 'a' 'b' 'c' | 2020-05-22 17:43:01.945625
  9999994 | 'a' 'b' 'c' | 2020-05-22 17:43:01.945624
  9999993 | 'a' 'b' 'c' | 2020-05-22 17:43:01.945624
  9999992 | 'a' 'b' 'c' | 2020-05-22 17:43:01.945623
  9999991 | 'a' 'b' 'c' | 2020-05-22 17:43:01.945623
(10 rows)

postgres=> explain (analyze,verbose,timing,costs,buffers) select * from test_rum_add where arr @@ 'a|b' order by ts <=> '2020-05-23' limit 10;
                                                                            QUERY PLAN                                                                             
-------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=13.20..13.51 rows=10 width=40) (actual time=6335.531..6335.539 rows=10 loops=1)
   Output: id, arr, ts, ((ts <=> '2020-05-23 00:00:00'::timestamp without time zone))
   Buffers: shared hit=28705, temp read=42536 written=67010
   ->  Index Scan using idx_test_rum_add_1 on public.test_rum_add  (cost=13.20..309926.60 rows=10000000 width=40) (actual time=6335.529..6335.534 rows=10 loops=1)
         Output: id, arr, ts, (ts <=> '2020-05-23 00:00:00'::timestamp without time zone)
         Index Cond: (test_rum_add.arr @@ '''a'' | ''b'''::tsquery)
         Order By: (test_rum_add.ts <=> '2020-05-23 00:00:00'::timestamp without time zone)
         Buffers: shared hit=28705, temp read=42536 written=67010
 Planning Time: 0.050 ms
 Execution Time: 6391.589 ms
(10 rows)


postgres=> explain (analyze,verbose,timing,costs,buffers) select * from test_rum_add where arr @@ 'a|b' limit 10;
                                                                            QUERY PLAN                                                                             
-------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=13.20..13.46 rows=10 width=32) (actual time=2380.119..2380.126 rows=10 loops=1)
   Output: id, arr, ts
   Buffers: shared hit=28706, temp read=1 written=14678
   ->  Index Scan using idx_test_rum_add_1 on public.test_rum_add  (cost=13.20..259926.60 rows=10000000 width=32) (actual time=2380.117..2380.122 rows=10 loops=1)
         Output: id, arr, ts
         Index Cond: (test_rum_add.arr @@ '''a'' | ''b'''::tsquery)
         Buffers: shared hit=28706, temp read=1 written=14678
 Planning Time: 0.072 ms
 Execution Time: 2414.058 ms
(9 rows)

I think this would improve if the LIMIT were pushed down into the posting-list scan phase.
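As a rough illustration of the difference, here is a toy Python model. It is not rum code: the function names and the in-memory "posting list" of `(id, ts)` pairs are made up for the example. The first function mirrors what the plans above suggest (order every match, then keep 10), and the second keeps only the best `k` entries while scanning.

```python
import heapq

def full_sort_then_limit(postings, target, k):
    """Model of the observed plan: order every match, then keep k."""
    ranked = sorted(postings, key=lambda p: abs(p[1] - target))
    return ranked[:k]

def limit_pushed_down(postings, target, k):
    """Model of the proposal: keep only the k closest entries during the scan."""
    return heapq.nsmallest(k, postings, key=lambda p: abs(p[1] - target))

# Toy posting list of (row id, timestamp as a number); hypothetical data.
postings = [(i, float(i)) for i in range(1, 100001)]
target = 100001.0

# Both strategies return the same rows; only the working set differs.
assert full_sort_then_limit(postings, target, 10) == limit_pushed_down(postings, target, 10)
```

In this model the second variant still visits every entry, but it never holds more than `k` candidates, so nothing would need to spill to temp files. Whether rum could also stop the scan early depends on how its posting lists are ordered, which I have not verified.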

best regards , digoal

digoal, May 22 '20 09:05

Currently the LIMIT value cannot be passed down to the scan logic.

yjhjstz, Nov 19 '20 08:11

On Fri, May 22, 2020, 12:52 Digoal.zhou wrote:

I think this would improve if the LIMIT were pushed down into the posting-list scan phase.

That would be really nice.

The limit here would have to be a parameter of the FTS operator, which is impossible. We could wrap the operator in a function that takes the limit as a parameter, but the problem is that the index would then not be used. However, we can use PG13 operator support functions to rewrite the function call internally into the operator. We have experience with this technique, but I'm wondering what the use cases are. Is it worth implementing?

obartunov, Nov 19 '20 10:11