use text prefix in regex to speed up query
Overview
For the selector:
{"selector":{"_id":{"$regex":"doc.+"}}}
Before:
```json
{
  "include_docs": true,
  "view_type": "map",
  "reduce": false,
  "partition": null,
  "start_key": [],
  "end_key": ["<MAX>"],
  "direction": "fwd",
  "stable": false,
  "update": true,
  "conflicts": "undefined"
}
```
After:
```json
{
  "include_docs": true,
  "view_type": "map",
  "reduce": false,
  "partition": null,
  "start_key": ["doc"],
  "end_key": ["doc�", "<MAX>"],
  "direction": "fwd",
  "stable": false,
  "update": true,
  "conflicts": "undefined"
}
```
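The "after" arguments show the idea. The literal text at the start of the pattern becomes start_key, and that text followed by a high sentinel character becomes end_key, so only the matching slice of the index is read. Below is a minimal Python sketch of deriving such a range. It illustrates the technique and is not the PR's Erlang code. It makes several assumptions. The sentinel is assumed to be U+10FFFF, because the character shown as � in the end_key above can't be recovered from this text. Keys compare in plain code-point order rather than CouchDB's ICU collation. `literal_prefix` and `key_range` are hypothetical names. An unanchored search (as with Python's `re.search` and PCRE's default matching) lets `doc.+` also match `mydoc1`, so the sketch only extracts a prefix from `^`-anchored patterns and gives up on anything it cannot prove safe.

```python
from typing import Optional

# Characters that end a run of literal text in a PCRE/Python-style pattern.
_META = set(".^$*+?{}[]()|\\")
# Tokens that make the preceding character optional or repeatable.
_QUANTIFIERS = set("*+?{")
# Assumed upper sentinel: the highest Unicode code point.
HIGH = "\U0010FFFF"


def literal_prefix(pattern: str) -> Optional[str]:
    """Return the literal text every match of `pattern` must start with,
    or None when no safe prefix can be derived (conservative)."""
    # Only a ^-anchored pattern pins the match to the start of the key, and
    # any | (conservatively, even inside a group) could add a branch with a
    # different prefix.
    if not pattern.startswith("^") or "|" in pattern:
        return None
    chars = []
    i = 1
    while i < len(pattern):
        c = pattern[i]
        if c == "\\" and i + 1 < len(pattern) and not pattern[i + 1].isalnum():
            lit, step = pattern[i + 1], 2  # escaped punctuation, e.g. \. is a literal "."
        elif c in _META:
            break  # wildcard, class, group, anchor or other escape: literal run ends
        else:
            lit, step = c, 1
        nxt = pattern[i + step] if i + step < len(pattern) else ""
        if nxt in _QUANTIFIERS:
            break  # this character may be absent or repeated, so leave it out
        chars.append(lit)
        i += step
    return "".join(chars) or None


def key_range(pattern: str) -> Optional[tuple]:
    """Return (start_key, end_key) strings bounding every possible match."""
    prefix = literal_prefix(pattern)
    if prefix is None:
        return None  # no usable prefix: scan the whole index as before
    return prefix, prefix + HIGH


print(literal_prefix("^doc.+"))  # doc
print(literal_prefix("doc.+"))   # None
```

Returning None is always safe, since the query just falls back to the full-range scan in the "before" arguments. The inclusive sentinel bound has one caveat under code-point order. If the character right after the prefix is U+10FFFF itself and more text follows, that key sorts after the end key and would be skipped.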
Testing recommendations
TBD
Related Issues or Pull Requests
https://github.com/apache/couchdb/issues/4775
Checklist
- [x] Code is written and works correctly
- [ ] Changes are covered by tests
- [ ] Any new configurable parameters are documented in rel/overlay/etc/default.ini
- [ ] Documentation changes were made in the src/docs folder
- [ ] Documentation changes were backported (separate PR) to affected branches
I haven't done the text-index side yet, or written any tests; I'm just sounding out the idea.
For text indexes, I'd prefer to pass the regex through to Lucene (Clouseau or Nouveau) and document where its regex flavour differs (there's huge overlap). IMO, omitting the optimizations Lucene makes, for the sake of purity, was a mistake in the original implementation.
I wonder whether a $startsWith operator would be cleaner, since we could then optimize it specifically for text indexes. IIRC, the $regex operator originally did behave differently for text indexes, but users ran into confusing behaviour: adding an index would suddenly change the results of their queries.
A general principle in Mango over the last ~5 years has been that adding an index shouldn't implicitly change the result of a query, so I'd be wary of reintroducing that behaviour.
I like that. We could then convert the literal prefix of a regex into a $startsWith for views.
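Whichever operator carries the prefix, the range optimization can honour the principle that adding an index shouldn't change results. This holds as long as the key range only narrows the candidate set and the full pattern is still applied to every candidate. Below is a minimal Python sketch of that check. A sorted list stands in for the view index. Plain code-point order stands in for CouchDB's collation. `full_scan` and `range_scan` are hypothetical names, and the U+10FFFF sentinel is an assumption.

```python
import bisect
import re

# Assumed upper sentinel for the end key: the highest code point.
HIGH = "\U0010FFFF"


def full_scan(keys, pattern):
    """No index: test every key against the regex."""
    return [k for k in keys if re.search(pattern, k)]


def range_scan(sorted_keys, pattern, prefix):
    """With an index: read only keys in [prefix, prefix + HIGH], then
    re-apply the full regex so the range can only narrow the candidates."""
    lo = bisect.bisect_left(sorted_keys, prefix)
    hi = bisect.bisect_right(sorted_keys, prefix + HIGH)
    return [k for k in sorted_keys[lo:hi] if re.search(pattern, k)]


keys = sorted(["apple", "doc", "doc1", "doc42", "docs/x", "dog", "mydoc1", "zdoc"])
pattern = "^doc.+"
# "doc" itself falls inside the range but is rejected by the regex re-check.
print(range_scan(keys, pattern, "doc"))  # ['doc1', 'doc42', 'docs/x']
print(range_scan(keys, pattern, "doc") == full_scan(keys, pattern))  # True
```

The re-check is what keeps the two paths equivalent. The range only decides which keys are read, and the pattern alone decides which keys are returned.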