shimmie2
shimmie2 copied to clipboard
Autocomplete with categories
Auto-complete with categories would be nice, but I had to disable it in 2935db9d because it was noticeably slow (overall p95 went from 80ms to 150ms) even though I don't have the Categories extension enabled.
Could we only do this extra search if Categories is enabled?
Could we add some kind of index that makes "%:{term}%" efficient?
The performance difference is much less drastic on my system, only about a 10ms difference. Do you have the tags_lower_tag_idx index that was committed awhile back?
I tried a couple of methods, like
SELECT tag, count
FROM tags
WHERE lower(tag) ~ '^(.+:)?da.*'
AND count > 0
ORDER BY count DESC
and
SELECT tag, count
FROM tags
WHERE lower(tag) LIKE lower('da%')
OR substring(lower(tag),position(':' IN tag)+1) LIKE lower('da%')
AND count > 0
ORDER BY count DESC
with some indexes but wasn't able to improve on the performance of:
SELECT tag, count
FROM tags
WHERE lower(tag) LIKE lower('da%')
OR lower(tag) LIKE lower('%:da%')
AND count > 0
ORDER BY count DESC
The first two consistently added 20+ms vs the third query in my data.
Frankly, doing two LIKES is always going to be costlier than one, especially if one uses a wildcard at the beginning. An option within the category extension to turn it on and off would probably be ideal.
But this is if we have to stick to this "starts with" search pattern. This query finished in less than 5ms on my data:
SELECT tag, count
FROM tags
WHERE to_tsvector('english', tag) @@ to_tsquery('english', 'da:*')
AND count > 0
ORDER BY count DESC
It's using postgres full-text search, with this index:
create index tags_test ON tags using gin(to_tsvector('english', tag))
This returns these tags:
- Series:Danger_Girl
- set:ETERNAL_DARKNESS
- darkstalkers
- tomb_raider_angel_of_darkness
- princess_daisy
It's not strictly "starts with" like current behavior, but it does limit it to just words that start with the search pattern.
Tried this:
SELECT tag, count
FROM tags
WHERE to_tsvector('english', tag) @@ to_tsquery('english', 'da:*')
AND (lower(tag) LIKE lower('da%')
OR lower(tag) LIKE lower('%:da%'))
AND count > 0
ORDER BY count DESC
Produces the starts-with results that is current behavior, but still finishes in less than 5 ms.
I think that it's likely this will be data-dependant, but still. That's fast.
Found a downside to the full text search, it doesn't return results for certain inputs, presumably because the search term isn't specific enough for some internal criteria it has. For instance, I can't search for s:* or a:, but d: works fine. Another letter seems to work fine. Might be a way around somehow, but it might also be a good idea to let it limit like that.
I did some tests and it does marginally improve post searching with wildcard tags, but by less than 100 ms. Not much in the grand scheme of a wildcard query. Let me know what you think about this.