shimmie2 Autocomplete with categories

Auto-complete with categories would be nice, but I had to disable it in 2935db9d because it was noticeably slow (overall p95 went from 80ms to 150ms) even though I don't have the Categories extension enabled.

Could we only do this extra search if Categories is enabled?

Could we add some kind of index that makes "%:{term}%" efficient?

Jul 06 '19 10:07 shish

The performance difference is much less drastic on my system, only about a 10ms difference. Do you have the tags_lower_tag_idx index that was committed awhile back?

Jul 06 '19 12:07 sanmadjack

I tried a couple of methods, like

   SELECT tag, count
    FROM tags
    WHERE lower(tag) ~ '^(.+:)?da.*'
      AND count > 0
    ORDER BY count DESC

and

SELECT tag, count
FROM tags
WHERE lower(tag) LIKE lower('da%')
   OR substring(lower(tag),position(':' IN tag)+1) LIKE lower('da%')
    AND count > 0
ORDER BY count DESC

with some indexes but wasn't able to improve on the performance of:

SELECT tag, count
FROM tags
WHERE lower(tag) LIKE lower('da%')
   OR lower(tag) LIKE lower('%:da%')
  AND count > 0
ORDER BY count DESC

The first two consistently added 20+ms vs the third query in my data.

Frankly, doing two LIKES is always going to be costlier than one, especially if one uses a wildcard at the beginning. An option within the category extension to turn it on and off would probably be ideal.

But this is if we have to stick to this "starts with" search pattern. This query finished in less than 5ms on my data:

SELECT tag, count
FROM tags
WHERE to_tsvector('english', tag)  @@ to_tsquery('english', 'da:*')
  AND count > 0
ORDER BY count DESC

It's using postgres full-text search, with this index:

create index tags_test ON tags using gin(to_tsvector('english', tag))

This returns these tags:

Series:Danger_Girl
set:ETERNAL_DARKNESS
darkstalkers
tomb_raider_angel_of_darkness
princess_daisy

It's not strictly "starts with" like current behavior, but it does limit it to just words that start with the search pattern.

Jul 06 '19 13:07 sanmadjack

Tried this:

SELECT tag, count
FROM tags
WHERE to_tsvector('english', tag)  @@ to_tsquery('english', 'da:*')
  AND (lower(tag) LIKE lower('da%')
    OR lower(tag) LIKE lower('%:da%'))
  AND count > 0
ORDER BY count DESC

Produces the starts-with results that is current behavior, but still finishes in less than 5 ms.

I think that it's likely this will be data-dependant, but still. That's fast.

Jul 06 '19 13:07 sanmadjack

Found a downside to the full text search, it doesn't return results for certain inputs, presumably because the search term isn't specific enough for some internal criteria it has. For instance, I can't search for s:* or a:, but d: works fine. Another letter seems to work fine. Might be a way around somehow, but it might also be a good idea to let it limit like that.

I did some tests and it does marginally improve post searching with wildcard tags, but by less than 100 ms. Not much in the grand scheme of a wildcard query. Let me know what you think about this.

Jul 08 '19 16:07 sanmadjack

shimmie2 shimmie2 copied to clipboard

Autocomplete with categories

shimmie2
shimmie2 copied to clipboard