use text prefix in regex to speed up query

Open rnewson opened this issue 2 years ago • 3 comments

Overview

For the selector:

{"selector":{"_id":{"$regex":"doc.+"}}}

Before:

{
  "include_docs": true,
  "view_type": "map",
  "reduce": false,
  "partition": null,
  "start_key": [],
  "end_key": [
    "<MAX>"
  ],
  "direction": "fwd",
  "stable": false,
  "update": true,
  "conflicts": "undefined"
}

After:

{
  "include_docs": true,
  "view_type": "map",
  "reduce": false,
  "partition": null,
  "start_key": [
    "doc"
  ],
  "end_key": [
    "doc�",
    "<MAX>"
  ],
  "direction": "fwd",
  "stable": false,
  "update": true,
  "conflicts": "undefined"
}
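The change in key range above can be sketched roughly as follows. This is a minimal illustration in Python, not the Erlang code in this PR: the helper names (`literal_prefix`, `key_range`, `find_ids`) are hypothetical, keys are plain strings rather than the arrays shown above, and `"\ufff0"` is assumed as the high sentinel. It also assumes the pattern is matched from the start of the `_id` (Python's `re.match`); with unanchored matching, a literal prefix would not bound the key range. The regex is still applied to every candidate, so the range only narrows the scan.

```python
import re

# Characters that end a literal run (a deliberately conservative set).
META = set(".^$*+?{}[]()|\\")
# Quantifiers that may allow zero repetitions of the preceding character.
QUANTIFIERS = set("*?{")
# Assumed high sentinel; ids containing code points above U+FFF0 right
# after the prefix would fall outside the range in this sketch.
HIGH = "\ufff0"


def literal_prefix(pattern: str) -> str:
    """Return leading literal characters that every match must start with."""
    if "|" in pattern:
        return ""  # alternation: give up rather than reason about branches
    i = 0
    while i < len(pattern) and pattern[i] not in META:
        i += 1
    # "docs?" only guarantees "doc": the quantified character is optional.
    if 0 < i < len(pattern) and pattern[i] in QUANTIFIERS:
        i -= 1
    return pattern[:i]


def key_range(pattern: str):
    """Map a pattern to (start_key, end_key), or (None, None) for a full scan."""
    prefix = literal_prefix(pattern)
    if not prefix:
        return None, None
    return prefix, prefix + HIGH


def find_ids(ids, pattern: str):
    """Narrow by key range first, then apply the regex to each candidate."""
    rx = re.compile(pattern)
    start, end = key_range(pattern)
    # Plain code point comparison here; CouchDB views use their own collation.
    candidates = ids if start is None else [i for i in ids if start <= i <= end]
    return [i for i in candidates if rx.match(i)]
```

For example, `key_range("doc.+")` gives `("doc", "doc\ufff0")`, while a pattern with no literal prefix such as `.*` falls back to the full range.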

Testing recommendations

TBD

Related Issues or Pull Requests

https://github.com/apache/couchdb/issues/4775

Checklist

  • [x] Code is written and works correctly
  • [ ] Changes are covered by tests
  • [ ] Any new configurable parameters are documented in rel/overlay/etc/default.ini
  • [ ] Documentation changes were made in the src/docs folder
  • [ ] Documentation changes were backported (separated PR) to affected branches

rnewson, Sep 26 '23 11:09

I've not done the text side yet, or any tests. Just sounding out the idea.

For text indexes, I'd prefer to pass the regex through to Lucene (clouseau or nouveau) and document the variation in regex flavour (there's huge overlap). Omitting the optimizations Lucene makes for the sake of purity was a mistake in the original implementation, imo.

rnewson, Sep 26 '23 11:09

I wonder whether a $startsWith operator would be cleaner, as we could then optimize it for text indexes specifically? The $regex operator originally did behave differently for text indexes, iirc, but users experienced weirdness when they added an index and suddenly got different results.

A general principle in Mango over the last ~5 years is that adding an index shouldn't implicitly change the result of a query, so I'd be wary of reintroducing that behaviour.

willholley, Sep 26 '23 11:09

I like that. We could then convert the prefix of the regex to a $startsWith for views.

rnewson, Sep 26 '23 12:09