paradedb icon indicating copy to clipboard operation
paradedb copied to clipboard

Query joining search index and foreign table is very slow

Open pdpark opened this issue 6 months ago • 1 comments

What happens?

I'm getting wildly varying query times on the first query after creating my tables and search index. About 50% of the time it runs in 300-500ms. The other ~50% of the time it takes almost two minutes (1:45-1:50)! After the query runs slow once, the next query runs fast. In between runs I stop & remove the container, remove the images, and prune all volumes.

To Reproduce

Query:

with search_score as (
    select * from my_schema.search_idx.score_bm25(
    '
    (
             x:"green"
         OR  y:"green"
         OR  z:"green"
         OR  m:"green"
         OR  n:"green"
         OR  p:"green"
         OR  q:"green"
         OR  r:"green"
    )
    AND a:"456"
    '
    )
)
select
        x.cold,
        x.cole,
        x.colf,
        x.colg,
        x.colh,
        p.colx
from search_score as s
join my_schema.table1 as x
on s.id = x.id
join my_schema.table2 as p
on p.cola = x.cola
and p.colb = x.colb
and (
    p.colc = '123'
    or p.colc is null
)
order by score_bm25 desc
limit 250
;

OS:

Amazon Linux 2023 / PostgreSQL 16.4 (Debian 16.4-1.pgdg120+1) on aarch64-unknown-linux-gnu, compiled by gcc (Debian 12.2.0-14) 12.2.0, 64-bit

ParadeDB Version:

pg_search 0.9.1 pg_analytics 0.1.0

Are you using ParadeDB Docker, Helm, or the extension(s) standalone?

ParadeDB Docker Image

Full Name:

Patrick Park

Affiliation:

Payzer

Did you include all relevant data sets for reproducing the issue?

No - I cannot share the data sets because they are confidential

Did you include the code required to reproduce the issue?

  • [X] Yes, I have

Did you include all relevant configurations (e.g., CPU architecture, PostgreSQL version, Linux distribution) to reproduce the issue?

  • [X] Yes, I have

pdpark avatar Aug 24 '24 00:08 pdpark