Inverted indices should be used to speed up string filters.
If a string column has an FTS index, then we should have enough information to speed up a variety of string-based filters. Here is a listing (currently very partial, as I don't know what's possible):
- [ ] Equality queries
- [ ] Range queries?
- [ ] #3416
Is there any worry that tokenization could mess with this? I think in general it only makes it a wider net by:
- Lower casing
- Stemming (`running`, `run` -> same token)
- ASCII folding (`café`, `cafe` -> same token)
- Stop word removal -> fewer words to match on.
It should be fine, but worth being aware of these transformations.
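To make the "wider net" concrete, here is a rough sketch of a toy analyzer in Python. It is not the actual FTS tokenizer: `ascii_fold`, `toy_stem`, `STOP_WORDS`, and `analyze` are made-up names for illustration, the stemmer only handles a doubled consonant before `-ing`, and the folding only strips combining accents.

```python
import unicodedata

# Hypothetical stop word list, for illustration only.
STOP_WORDS = {"a", "an", "the", "to", "is", "of"}


def ascii_fold(token):
    """Strip combining accents after NFKD decomposition (café -> cafe)."""
    decomposed = unicodedata.normalize("NFKD", token)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def toy_stem(token):
    """Very crude stemmer: running -> runn -> run. Not a real algorithm."""
    if token.endswith("ing") and len(token) > 5:
        token = token[:-3]
        if len(token) >= 2 and token[-1] == token[-2]:
            token = token[:-1]
    return token


def analyze(text):
    """Lowercase, fold accents, stem, and drop stop words."""
    tokens = []
    for raw in text.lower().split():
        tok = toy_stem(ascii_fold(raw.strip(".,!?")))
        if tok and tok not in STOP_WORDS:
            tokens.append(tok)
    return tokens


# Two different strings collapse to the same token list.
print(analyze("Running to the Café"))
print(analyze("run cafe"))
```

Both calls produce the same tokens, so an index lookup alone cannot tell the two strings apart. It can only over-match, never under-match, as long as the query goes through the same analyzer.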
Hmm, yeah, it would be a problem if `contains('run', 'running')` returned true. Maybe a specialized index then. A GIN index like label list could work.
I was thinking you could still use the FTS index, but would have a "refine" step where you take the results and do the exact contains test after. Not optimal in all cases, but could potentially work.
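Something like the following sketch, in Python for brevity. Everything here is hypothetical (`toy_stem`, `analyze`, `build_index`, `contains_filter`) and stands in for the real FTS index. It also assumes the needle lines up with word boundaries: a needle like `unn` that sits inside a token would never show up in the token index, so that case would need a different approach.

```python
import re
from collections import defaultdict


def toy_stem(token):
    """Very crude stemmer: running -> run. Stand-in for a real one."""
    if token.endswith("ing") and len(token) > 5:
        token = token[:-3]
        if len(token) >= 2 and token[-1] == token[-2]:
            token = token[:-1]
    return token


def analyze(text):
    """Lowercase, split on non-alphanumerics, stem."""
    return [toy_stem(t) for t in re.findall(r"[a-z0-9]+", text.lower())]


def build_index(rows):
    """Map each token to the set of row ids that contain it."""
    index = defaultdict(set)
    for row_id, text in enumerate(rows):
        for tok in analyze(text):
            index[tok].add(row_id)
    return index


def contains_filter(rows, index, needle):
    """Use the index for candidates, then refine with an exact test."""
    tokens = analyze(needle)
    if not tokens:
        # Nothing to look up: fall back to scanning every row.
        candidates = set(range(len(rows)))
    else:
        candidates = set.intersection(*(index.get(t, set()) for t in tokens))
    # Refine step: drop the false positives the analyzer let through.
    return sorted(r for r in candidates if needle in rows[r])


rows = ["a quick run", "I went running today", "Running late", "walk the dog"]
index = build_index(rows)
print(sorted(index["run"]))                    # candidates from the index
print(contains_filter(rows, index, "running"))  # after the exact check
```

The index returns rows 0, 1, and 2 as candidates because of stemming and lowercasing, and the refine step keeps only the row that really contains `running`. The cost is that the refine step has to read the candidate rows, which is where the "not optimal in all cases" part comes from.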