rum icon indicating copy to clipboard operation
rum copied to clipboard

I'm not seeing a performance win. What am I doing wrong.

Open jimfulton opened this issue 7 years ago • 15 comments

I have a database:

220K documents. (31% 9 words or less)

Unique words: P30 9 Median 57 P90 656 p99 1837 p99.9 2928 max 6398 ~130K null records (so ~350 records total)

We have an application where common queries return ~50K documents. When the database is warm, these take ~400ms or more. When cold many seconds. We're hoping that RUM can help, but so far, I'm not seeing an improvement.

Gory (but not complicated) details are here: https://gist.github.com/jimfulton/4104b5387cf7dce3754bef342a9f84c5

Any idea what we're doing wrong?

jimfulton avatar Feb 28 '17 16:02 jimfulton

I suppose it is because of (text_vector <=> to_tsquery('english', :q)) / coefficient expression. Can you try the query:

SELECT docid, (text_vector <=> to_tsquery('english', :q)) / coefficient as rank, count(*) over()
FROM pgtextindex
WHERE text_vector @@ to_tsquery('english', :q)
ORDER BY text_vector <=> to_tsquery('english', :q)
limit 9;

Can you say is it has better time?

za-arthur avatar Mar 01 '17 10:03 za-arthur

Thanks!

Some results and a little commentary. I'll follow on with a summary comment.

So, the original query:

SELECT docid,
       (text_vector <=> to_tsquery('english', 'europe')) / coefficient as rank,
       count(*) over()
FROM pgtextindex
WHERE text_vector @@ to_tsquery('english', 'europe')
ORDER BY rank
limit 9;
    docid    |    rank     | count 
-------------+-------------+-------
 -1423876420 | 0.000642552 | 20452
   944749260 | 0.000642552 | 20452
 -1476054022 |   0.0136008 | 20452
  -844874753 |   0.0164493 | 20452
 -1704100805 |   0.0205617 | 20452
 -1704100809 |   0.0205617 | 20452
   944749266 |   0.0205617 | 20452
    98129191 |   0.0205617 | 20452
  1735228498 |    0.131595 | 20452
(9 rows)

Time: 267.762 ms
EXPLAIN ANALYZE
SELECT docid, (text_vector <=> to_tsquery('english', 'europe')) / coefficient as rank, count(*) over()
FROM pgtextindex
WHERE text_vector @@ to_tsquery('english', 'europe')
ORDER BY rank
limit 9;
                                                                             QUERY PLAN                                                                             
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=14041.58..14041.60 rows=9 width=16) (actual time=235.379..235.381 rows=9 loops=1)
   ->  Sort  (cost=14041.58..14092.81 rows=20495 width=16) (actual time=235.379..235.379 rows=9 loops=1)
         Sort Key: (((text_vector <=> '''europ'''::tsquery) / coefficient))
         Sort Method: top-N heapsort  Memory: 25kB
         ->  WindowAgg  (cost=214.83..13614.26 rows=20495 width=16) (actual time=29.523..232.059 rows=20452 loops=1)
               ->  Bitmap Heap Scan on pgtextindex  (cost=214.83..13255.60 rows=20495 width=257) (actual time=4.284..19.537 rows=20452 loops=1)
                     Recheck Cond: (text_vector @@ '''europ'''::tsquery)
                     Heap Blocks: exact=7963
                     ->  Bitmap Index Scan on pgtextindex_rum_text_vector_idx  (cost=0.00..209.71 rows=20495 width=0) (actual time=3.234..3.234 rows=21508 loops=1)
                           Index Cond: (text_vector @@ '''europ'''::tsquery)
 Planning time: 0.193 ms
 Execution time: 235.431 ms
(12 rows)

Time: 236.517 ms

The results have to be sorted.

SELECT docid,
       (text_vector <=> to_tsquery('english', 'europe')) / coefficient as rank,
       count(*) over()
FROM pgtextindex
WHERE text_vector @@ to_tsquery('english', 'europe')
ORDER BY text_vector <=> to_tsquery('english', 'europe')
limit 9;
    docid    |   rank   | count 
-------------+----------+-------
  1086226858 |  3.07624 | 20757
 -1135901991 |  3.07624 | 20757
  1100820023 |  3.14987 | 20757
   373536736 |  3.14987 | 20757
   373536734 |  3.28987 | 20757
  1735228498 | 0.131595 | 20757
  -914821670 | 0.146216 | 20757
  1534494550 |  3.65541 | 20757
  1632017539 |  3.65541 | 20757
(9 rows)

Time: 43.328 ms
EXPLAIN ANALYZE
SELECT docid,
       (text_vector <=> to_tsquery('english', 'europe')) / coefficient as rank,
       count(*) over()
FROM pgtextindex
WHERE text_vector @@ to_tsquery('english', 'europe')
ORDER BY text_vector <=> to_tsquery('english', 'europe')
limit 9;
                                                                                QUERY PLAN                                                                                
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=24.00..44.04 rows=9 width=20) (actual time=42.438..42.452 rows=9 loops=1)
   ->  WindowAgg  (cost=24.00..45655.80 rows=20495 width=20) (actual time=42.437..42.450 rows=9 loops=1)
         ->  Index Scan using pgtextindex_rum_text_vector_idx on pgtextindex  (cost=24.00..45245.90 rows=20495 width=257) (actual time=13.421..32.070 rows=20757 loops=1)
               Index Cond: (text_vector @@ '''europ'''::tsquery)
               Order By: (text_vector <=> '''europ'''::tsquery)
 Planning time: 0.235 ms
 Execution time: 44.654 ms
(7 rows)

Time: 45.309 ms

OK, with your suggestion, this is much faster, because it can use the index to sort. (Look ma, no sort.) Cool, but...

But the counts are off. That's odd. Is it using the index for counts? Is this a bug?

To double check counts, I got rid of order by:

SELECT docid,
       (text_vector <=> to_tsquery('english', 'europe')) / coefficient as rank,
       count(*) over()
FROM pgtextindex
WHERE text_vector @@ to_tsquery('english', 'europe')
limit 9;
    docid    |   rank   | count 
-------------+----------+-------
 -2147433172 | 0.657974 | 20452
 -2146979073 |  16.4493 | 20452
 -2146979072 |  12.0852 | 20452
 -2146857578 |  13.1595 | 20452
 -2146183061 | 0.657974 | 20452
 -2145606265 |  11.5547 | 20452
 -2145606227 | 0.483409 | 20452
 -2145154002 | 0.435226 | 20452
 -2137906516 |  11.2389 | 20452
(9 rows)

Time: 1006.311 ms
EXPLAIN ANALYZE
SELECT docid,
       (text_vector <=> to_tsquery('english', 'europe')) / coefficient as rank,
       count(*) over()
FROM pgtextindex
WHERE text_vector @@ to_tsquery('english', 'europe')
limit 9;
                                                           QUERY PLAN                                                            
---------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..7.47 rows=9 width=16) (actual time=923.019..923.149 rows=9 loops=1)
   ->  WindowAgg  (cost=0.00..17021.09 rows=20495 width=16) (actual time=923.017..923.144 rows=9 loops=1)
         ->  Seq Scan on pgtextindex  (cost=0.00..16662.42 rows=20495 width=257) (actual time=0.016..909.228 rows=20452 loops=1)
               Filter: (text_vector @@ '''europ'''::tsquery)
               Rows Removed by Filter: 325742
 Planning time: 0.202 ms
 Execution time: 925.337 ms
(7 rows)

Time: 926.080 ms

So the counts are the same as the original query, but different from when ORDER BY text_vector <=> to_tsquery('english', 'europe') is used.

Oh and look, Postgres decided not to use the index, so without ordering it takes 4x as long. (Presumably, I could fix this using a function with cost set very high.)

In the examples above, I avoided prefix searches, because:

Timing is on.
SELECT docid,
       (text_vector <=> to_tsquery('english', 'euro:*')) / coefficient as rank,
       count(*) over()
FROM pgtextindex
WHERE text_vector @@ to_tsquery('english', 'euro:*')
ORDER BY text_vector <=> to_tsquery('english', 'euro:*')
limit 9;
psql:crash.sql:9: server closed the connection unexpectedly
  This probably means the server terminated abnormally
  before or while processing the request.
psql:crash.sql:9: connection to server was lost

Should prefix queries work in ORDER BY?

jimfulton avatar Mar 01 '17 15:03 jimfulton

So some questions/comments:

  • Should prefix queries work in ORDER BY with RUM?

  • Should counts match? (I might be willing to trade off count accuracy for speed. :) )

  • IIUC, an index won't be used for sorting if other indexes are used. So we may lose the ORDER BY benefit when text search is combined with other indexes. We commonly want to do a text search broken down by category for 6 categories, with the top few results in each category. We do this now with some non-Postgresql indexing machinery. If I were to do this with Postgres, I'd try the rank window function using order by within categories, and I suspect the index won't be used for sorting in that case.

jimfulton avatar Mar 01 '17 16:03 jimfulton

Have you seen faceted search example http://akorotkov.github.io/blog/2016/06/17/faceted-search/

On Wed, Mar 1, 2017 at 7:15 PM, Jim Fulton [email protected] wrote:

So some questions/comments:

Should prefix queries work in ORDER BY with RUM?

Should counts match? (I might be willing to trade off count accuracy for speed. :) )

IIUC, an index won't be used for sorting if other indexes are used. So we may lose the ORDER BY benefit when text search is combined with other indexes. We commonly want to do a text search broken down by category for 6 categories, with the top few results in each category. We do this now with some non-Postgresql indexing machinery. If I were to do this with Postgres, I'd try the rank window function using order by within categories, and I suspect the index won't be used for sorting in that case.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/postgrespro/rum/issues/13#issuecomment-283386833, or mute the thread https://github.com/notifications/unsubscribe-auth/AGFI4n-OuOLbvbnlHHaxQgSiMQkxUZ0bks5rhZm7gaJpZM4MOndM .

obartunov avatar Mar 02 '17 00:03 obartunov

Should counts match? (I might be willing to trade off count accuracy for speed. :) )

Yes, they should match. It seems that it is the bug. I will work on it.

za-arthur avatar Mar 02 '17 09:03 za-arthur

On Wed, Mar 1, 2017 at 7:00 PM, Oleg Bartunov [email protected] wrote:

Have you seen faceted search example http://akorotkov.github.io/blog/2016/06/17/faceted-search/

No, but as I said, that (window functions) is how I would (and probably will) approach what we're doing in Postgres.

My guess is that there won't be any win from the RUM index from this, but why guess? I'll try it today. :)

Thanks

jimfulton avatar Mar 02 '17 13:03 jimfulton

On Thu, Mar 2, 2017 at 4:59 AM, Arthur [email protected] wrote:

Should counts match? (I might be willing to trade off count accuracy for speed. :) ) Yes, they should match. It seems that it is the bug. I will work on it.

Thank you!

jimfulton avatar Mar 02 '17 13:03 jimfulton

Should a prefix search (like text_vector <=> to_tsquery('english', 'euro:*')) work?

Currently, it errors (and poisons the session) for me.

jimfulton avatar Mar 02 '17 16:03 jimfulton

Yep. It should work. I work on fixing it. It is the bug too.

Should counts match? (I might be willing to trade off count accuracy for speed. :) )

Currently I fixed this part.

za-arthur avatar Mar 02 '17 16:03 za-arthur

On Thu, Mar 2, 2017 at 11:26 AM, Arthur [email protected] wrote:

Yep. It should work. I work on fixing it. It is the bug too.

Cool. Thanks!

Should counts match? (I might be willing to trade off count accuracy for

speed. :) )

Currently I fixed this part.

Great!

jimfulton avatar Mar 02 '17 16:03 jimfulton

I implemented faceted search using GIN/ts_rank_cd:

SELECT * from (
  SELECT
    content_type, docid,
    row_number() over (
       PARTITION BY content_type
       ORDER BY ts_rank_cd(text_vector, to_tsquery('english', 'open')) desc
       ) AS rank,
    count(*) over (partition by content_type) AS count
  FROM pgtextindex
  WHERE text_vector @@ to_tsquery('english', 'open')
        and content_type in (
            'IProfile', 'IWikiPage', 'IBlogEntry',
            'ICommunityFile', 'ICalendarEvent', 'ICommunity')
) _ where rank <= 9;
  content_type  |    docid    | rank | count 
----------------+-------------+------+-------
 IBlogEntry     |  -330268600 |    1 | 10892
 IBlogEntry     |  1152687712 |    2 | 10892
 IBlogEntry     |  -204982172 |    3 | 10892
 IBlogEntry     |  -682269095 |    4 | 10892
 IBlogEntry     | -1911826874 |    5 | 10892
 IBlogEntry     | -1601356406 |    6 | 10892
 IBlogEntry     |   209135101 |    7 | 10892
 IBlogEntry     |  -933259368 |    8 | 10892
 IBlogEntry     | -1601356623 |    9 | 10892
 ICalendarEvent |   624891960 |    1 |  1436
 ICalendarEvent | -1704100826 |    2 |  1436
 ICalendarEvent |  -527183204 |    3 |  1436
 ICalendarEvent |    70590147 |    4 |  1436
 ICalendarEvent |   935338379 |    5 |  1436
 ICalendarEvent | -1575005992 |    6 |  1436
 ICalendarEvent |  -402217878 |    7 |  1436
 ICalendarEvent | -1588859439 |    8 |  1436
 ICalendarEvent | -1507222389 |    9 |  1436
 ICommunity     |  -592387796 |    1 |   217
 ICommunity     |  -453618236 |    2 |   217
 ICommunity     |   312046691 |    3 |   217
 ICommunity     |    60374011 |    4 |   217
 ICommunity     | -1973547444 |    5 |   217
 ICommunity     |   -20380877 |    6 |   217
 ICommunity     | -1601367402 |    7 |   217
 ICommunity     |  1162486145 |    8 |   217
 ICommunity     |  1595297687 |    9 |   217
 ICommunityFile |  1457676763 |    1 | 33314
 ICommunityFile |  1655334035 |    2 | 33314
 ICommunityFile |   331884856 |    3 | 33314
 ICommunityFile | -1601380686 |    4 | 33314
 ICommunityFile | -1049132683 |    5 | 33314
 ICommunityFile |   357809427 |    6 | 33314
 ICommunityFile |   780731041 |    7 | 33314
 ICommunityFile | -1636795297 |    8 | 33314
 ICommunityFile | -1601369000 |    9 | 33314
 IProfile       | -1668560925 |    1 |  3720
 IProfile       |  -727121925 |    2 |  3720
 IProfile       | -1668560866 |    3 |  3720
 IProfile       |  2129650389 |    4 |  3720
 IProfile       | -1668561094 |    5 |  3720
 IProfile       |   140750919 |    6 |  3720
 IProfile       | -1668561080 |    7 |  3720
 IProfile       | -1668560296 |    8 |  3720
 IProfile       | -1668558553 |    9 |  3720
 IWikiPage      |  1362276784 |    1 |  1796
 IWikiPage      | -1790139985 |    2 |  1796
 IWikiPage      |  -893346721 |    3 |  1796
 IWikiPage      |   685082132 |    4 |  1796
 IWikiPage      | -1374614493 |    5 |  1796
 IWikiPage      | -1601372798 |    6 |  1796
 IWikiPage      |  -682970390 |    7 |  1796
 IWikiPage      | -1601373240 |    8 |  1796
 IWikiPage      | -1601354180 |    9 |  1796
(54 rows)

Time: 87525.169 ms
EXPLAIN ANALYZE
SELECT * from (
  SELECT
    content_type, docid,
    row_number() over (
       PARTITION BY content_type
       ORDER BY ts_rank_cd(text_vector, to_tsquery('english', 'open')) desc
       ) AS rank,
    count(*) over (partition by content_type) AS count
  FROM pgtextindex
  WHERE text_vector @@ to_tsquery('english', 'open')
        and content_type in (
            'IProfile', 'IWikiPage', 'IBlogEntry',
            'ICommunityFile', 'ICalendarEvent', 'ICommunity')
) _ where rank <= 9;
                                                                              QUERY PLAN                                                                              
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Subquery Scan on _  (cost=16293.21..17847.31 rows=9867 width=34) (actual time=682.571..732.635 rows=54 loops=1)
   Filter: (_.rank <= 9)
   Rows Removed by Filter: 51321
   ->  WindowAgg  (cost=16293.21..17477.29 rows=29602 width=38) (actual time=682.533..727.464 rows=51375 loops=1)
         ->  WindowAgg  (cost=16293.21..16959.25 rows=29602 width=30) (actual time=672.554..701.546 rows=51375 loops=1)
               ->  Sort  (cost=16293.21..16367.21 rows=29602 width=22) (actual time=672.546..680.882 rows=51375 loops=1)
                     Sort Key: pgtextindex.content_type, (ts_rank_cd(pgtextindex.text_vector, '''open'''::tsquery)) DESC
                     Sort Method: external merge  Disk: 1640kB
                     ->  Bitmap Heap Scan on pgtextindex  (cost=540.77..14094.75 rows=29602 width=22) (actual time=12.483..551.744 rows=51375 loops=1)
                           Recheck Cond: (text_vector @@ '''open'''::tsquery)
                           Filter: ((content_type)::text = ANY ('{IProfile,IWikiPage,IBlogEntry,ICommunityFile,ICalendarEvent,ICommunity}'::text[]))
                           Rows Removed by Filter: 4745
                           Heap Blocks: exact=11070
                           ->  Bitmap Index Scan on pgtextindex_text_vector_idx  (cost=0.00..533.37 rows=57249 width=0) (actual time=9.811..9.811 rows=56120 loops=1)
                                 Index Cond: (text_vector @@ '''open'''::tsquery)
 Planning time: 0.364 ms
 Execution time: 735.938 ms
(17 rows)

Time: 739.350 ms

And using a RUM/<=> index:

SELECT * from (
  SELECT
    content_type, docid,
    row_number() over (
       PARTITION BY content_type
       ORDER BY text_vector <=> to_tsquery('english', 'open')
       ) AS rank,
    count(*) over (partition by content_type) AS count
  FROM pgtextindex
  WHERE text_vector @@ to_tsquery('english', 'open')
        and content_type in (
            'IProfile', 'IWikiPage', 'IBlogEntry',
            'ICommunityFile', 'ICalendarEvent', 'ICommunity')
) _ where rank <= 9;
  content_type  |    docid    | rank | count 
----------------+-------------+------+-------
 IBlogEntry     | -1601357143 |    1 | 10892
 IBlogEntry     | -1601356339 |    2 | 10892
 IBlogEntry     | -1601357091 |    3 | 10892
 IBlogEntry     | -1601356296 |    4 | 10892
 IBlogEntry     | -1601355907 |    5 | 10892
 IBlogEntry     | -1601351712 |    6 | 10892
 IBlogEntry     |  1173726272 |    7 | 10892
 IBlogEntry     |  -712090370 |    8 | 10892
 IBlogEntry     |  1173726269 |    9 | 10892
 ICalendarEvent | -1363098629 |    1 |  1436
 ICalendarEvent |   126208660 |    2 |  1436
 ICalendarEvent |  1728174747 |    3 |  1436
 ICalendarEvent |   401783157 |    4 |  1436
 ICalendarEvent |  1570612111 |    5 |  1436
 ICalendarEvent | -1313617880 |    6 |  1436
 ICalendarEvent |  1485762557 |    7 |  1436
 ICalendarEvent |   278342644 |    8 |  1436
 ICalendarEvent |  1740564821 |    9 |  1436
 ICommunity     | -1601398070 |    1 |   217
 ICommunity     |  1595297687 |    2 |   217
 ICommunity     | -1011365159 |    3 |   217
 ICommunity     |  1649656556 |    4 |   217
 ICommunity     |   288094250 |    5 |   217
 ICommunity     |   685024545 |    6 |   217
 ICommunity     |  -541452374 |    7 |   217
 ICommunity     |  1286721489 |    8 |   217
 ICommunity     |  -116807627 |    9 |   217
 ICommunityFile |  1798728473 |    1 | 33314
 ICommunityFile | -2087588118 |    2 | 33314
 ICommunityFile | -1592319368 |    3 | 33314
 ICommunityFile |  1735228498 |    4 | 33314
 ICommunityFile | -1196503039 |    5 | 33314
 ICommunityFile | -1601368998 |    6 | 33314
 ICommunityFile | -1825927329 |    7 | 33314
 ICommunityFile |  1721868489 |    8 | 33314
 ICommunityFile |   312927591 |    9 | 33314
 IProfile       | -1668561094 |    1 |  3720
 IProfile       | -1668560925 |    2 |  3720
 IProfile       | -1668560866 |    3 |  3720
 IProfile       |  -727121925 |    4 |  3720
 IProfile       |  2129650389 |    5 |  3720
 IProfile       |   140750919 |    6 |  3720
 IProfile       | -1668561080 |    7 |  3720
 IProfile       | -1668560296 |    8 |  3720
 IProfile       | -1894566324 |    9 |  3720
 IWikiPage      |  -417397354 |    1 |  1796
 IWikiPage      |  1467745286 |    2 |  1796
 IWikiPage      | -1601348544 |    3 |  1796
 IWikiPage      |  -545076045 |    4 |  1796
 IWikiPage      |   605309915 |    5 |  1796
 IWikiPage      | -1601361557 |    6 |  1796
 IWikiPage      | -1601369485 |    7 |  1796
 IWikiPage      | -1601348535 |    8 |  1796
 IWikiPage      |  -636679312 |    9 |  1796
(54 rows)

Time: 87618.232 ms
EXPLAIN ANALYZE
SELECT * from (
  SELECT
    content_type, docid, text_vector <=> to_tsquery('english', 'open'),
    row_number() over (
       PARTITION BY content_type
       ORDER BY text_vector <=> to_tsquery('english', 'open')
       ) AS rank,
    count(*) over (partition by content_type) AS count
  FROM pgtextindex
  WHERE text_vector @@ to_tsquery('english', 'open')
        and content_type in (
            'IProfile', 'IWikiPage', 'IBlogEntry',
            'ICommunityFile', 'ICalendarEvent', 'ICommunity')
) _ where rank <= 9;
                                                                                 QUERY PLAN                                                                                 
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Subquery Scan on _  (cost=16194.17..17704.18 rows=9587 width=38) (actual time=630.790..673.851 rows=54 loops=1)
   Filter: (_.rank <= 9)
   Rows Removed by Filter: 51321
   ->  WindowAgg  (cost=16194.17..17344.65 rows=28762 width=38) (actual time=630.758..669.428 rows=51375 loops=1)
         ->  WindowAgg  (cost=16194.17..16841.32 rows=28762 width=30) (actual time=622.302..647.553 rows=51375 loops=1)
               ->  Sort  (cost=16194.17..16266.08 rows=28762 width=22) (actual time=622.293..629.306 rows=51375 loops=1)
                     Sort Key: pgtextindex.content_type, ((pgtextindex.text_vector <=> '''open'''::tsquery))
                     Sort Method: external merge  Disk: 1640kB
                     ->  Bitmap Heap Scan on pgtextindex  (cost=535.73..14064.07 rows=28762 width=22) (actual time=15.842..508.118 rows=51375 loops=1)
                           Recheck Cond: (text_vector @@ '''open'''::tsquery)
                           Filter: ((content_type)::text = ANY ('{IProfile,IWikiPage,IBlogEntry,ICommunityFile,ICalendarEvent,ICommunity}'::text[]))
                           Rows Removed by Filter: 4745
                           Heap Blocks: exact=11070
                           ->  Bitmap Index Scan on pgtextindex_rum_text_vector_idx  (cost=0.00..528.54 rows=56072 width=0) (actual time=13.679..13.679 rows=58793 loops=1)
                                 Index Cond: (text_vector @@ '''open'''::tsquery)
 Planning time: 0.492 ms
 Execution time: 676.957 ms
(17 rows)

Time: 680.469 ms

Before each run, I dropped disk cache and restart postgres:

do echo 3 | sudo tee /proc/sys/vm/drop_caches
sudo systemctl restart postgresql.service

To get cold and warm (reported by explain analyze) times. I ran 4 times for each and collected best-of times:

         Times in seconds:
Method   cold time  warm time
======   ========   =========
GIN            87        .640
RUM            86        .560
======   ========   =========

RUM beats GIN a little cold and a bit more, 12%, hot.

Note that the GIN and RUM query plans are largely the same. In particularm the RUM index isn't used to sort.

jimfulton avatar Mar 03 '17 18:03 jimfulton

Hi! Could you try the following query? I hope I didn't make any typo here.

EXPLAIN ANALYZE
with items as (
  SELECT
	content_type, docid,
	row_number() over (
	   PARTITION BY content_type
	   ORDER BY text_vector <=> to_tsquery('english', 'open')
	   ) AS rank
  FROM pgtextindex
  WHERE text_vector @@ to_tsquery('english', 'open')
		and content_type in (
			'IProfile', 'IWikiPage', 'IBlogEntry',
			'ICommunityFile', 'ICalendarEvent', 'ICommunity')
), counters as (
	select 
		content_type,
		count(1) as count
	from items
		group by content_type
),items_limited as (
	SELECT * from items where rank <= 9
)
SELECT 
	(SELECT json_agg(counters)) as counters,
	(SELECT json_agg(items_limited)) as items_limited

ArturFormella avatar Mar 03 '17 21:03 ArturFormella

Nope. That didn't compile. Unknown column counters. Maybe you meant:

EXPLAIN ANALYZE
with items as (
  SELECT
  content_type, docid,
  row_number() over (
     PARTITION BY content_type
     ORDER BY text_vector <=> to_tsquery('english', 'open')
     ) AS rank
  FROM pgtextindex
  WHERE text_vector @@ to_tsquery('english', 'open')
    and content_type in (
      'IProfile', 'IWikiPage', 'IBlogEntry',
      'ICommunityFile', 'ICalendarEvent', 'ICommunity')
), counters as (
  select 
    content_type,
    count(1) as count
  from items
    group by content_type
),items_limited as (
  SELECT * from items where rank <= 9
)
SELECT json_agg(array(select row(counters.*) from counters))
           as counters,
       json_agg(array(select row(items_limited.*) from items_limited))
           as items_limited;
                                                                            QUERY PLAN                                                                            
------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=18568.71..18568.72 rows=1 width=64) (actual time=683.830..683.830 rows=1 loops=1)
   CTE items
     ->  WindowAgg  (cost=16293.21..16959.25 rows=29602 width=30) (actual time=602.696..634.523 rows=51375 loops=1)
           ->  Sort  (cost=16293.21..16367.21 rows=29602 width=22) (actual time=602.688..610.890 rows=51375 loops=1)
                 Sort Key: pgtextindex.content_type, ((pgtextindex.text_vector <=> '''open'''::tsquery))
                 Sort Method: external merge  Disk: 1640kB
                 ->  Bitmap Heap Scan on pgtextindex  (cost=540.77..14094.75 rows=29602 width=22) (actual time=11.964..489.411 rows=51375 loops=1)
                       Recheck Cond: (text_vector @@ '''open'''::tsquery)
                       Filter: ((content_type)::text = ANY ('{IProfile,IWikiPage,IBlogEntry,ICommunityFile,ICalendarEvent,ICommunity}'::text[]))
                       Rows Removed by Filter: 4745
                       Heap Blocks: exact=11070
                       ->  Bitmap Index Scan on pgtextindex_text_vector_idx  (cost=0.00..533.37 rows=57249 width=0) (actual time=9.942..9.942 rows=56120 loops=1)
                             Index Cond: (text_vector @@ '''open'''::tsquery)
   CTE counters
     ->  HashAggregate  (cost=740.05..742.05 rows=200 width=86) (actual time=23.862..23.864 rows=6 loops=1)
           Group Key: items.content_type
           ->  CTE Scan on items  (cost=0.00..592.04 rows=29602 width=78) (actual time=0.012..10.125 rows=51375 loops=1)
   CTE items_limited
     ->  CTE Scan on items items_1  (cost=0.00..666.04 rows=9867 width=90) (actual time=602.712..659.732 rows=54 loops=1)
           Filter: (rank <= 9)
           Rows Removed by Filter: 51321
   InitPlan 4 (returns $3)
     ->  CTE Scan on counters  (cost=0.00..4.00 rows=200 width=32) (actual time=23.867..23.872 rows=6 loops=1)
   InitPlan 5 (returns $4)
     ->  CTE Scan on items_limited  (cost=0.00..197.34 rows=9867 width=32) (actual time=602.717..659.777 rows=54 loops=1)
   ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.000..0.001 rows=1 loops=1)
 Planning time: 0.576 ms
 Execution time: 689.271 ms
(28 rows)

Time: 694.514 ms

Sorry for the slow reply. Thanks.

jimfulton avatar Mar 06 '17 16:03 jimfulton

Hello, @jimfulton can you test the fix? I have fixed the bug related with prefix search:

Should a prefix search (like text_vector <=> to_tsquery('english', 'euro:*')) work?

za-arthur avatar Mar 20 '17 13:03 za-arthur

Sure. I'll test it soon.

jimfulton avatar Mar 20 '17 18:03 jimfulton