Add `alwaysStopwords` option to `edismax` so its "all stopwords" behaviour can be controlled
Work in progress! If anyone sees this and has thoughts on it, please comment below :-)
Description
We were surprised by edismax's behaviour for pure-stopword queries: we had expected them to return zero results. The documented behaviour ('If a query consists of all stopwords, such as "to be or not to be", then all words are required.') is the opposite of what we want, because we use query-time stopwords to prevent particular query terms from matching. The only way to disable it at the moment is to use dismax instead, which may have other impacts.
(Using index-time stopwords instead breaks with `mm=100%`: users can't include the stop words in queries.)
Solution
This PR adds an `alwaysStopwords` option that disables the default behaviour. Its name is TBD.
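To make the intended semantics concrete, here is a toy model in Python. It is not Solr code and does not reflect how the parser is implemented: the function name `effective_terms` and the `always_stopwords` flag are made up for illustration, and the flag only stands in for the proposed option (assuming that setting it means stopwords are always removed).

```python
def effective_terms(query_terms, stopwords, always_stopwords=False):
    """Toy model of which query terms survive stopword handling.

    Hypothetical helper for illustration only; not part of Solr.
    """
    # Drop every term that is a stopword (case-insensitive comparison).
    kept = [t for t in query_terms if t.lower() not in stopwords]

    # Mixed queries: stopwords are simply removed.
    # With the flag set, stopwords are removed even if nothing is left.
    if kept or always_stopwords:
        return kept

    # Default behaviour for an all-stopword query: keep every word,
    # mirroring "then all words are required".
    return list(query_terms)


stop = {"to", "be", "or", "not", "the"}
query = "to be or not to be".split()

default_result = effective_terms(query, stop)
flagged_result = effective_terms(query, stop, always_stopwords=True)
mixed_result = effective_terms(["the", "quick", "fox"], stop)
```

In this sketch the default call keeps all six words of the pure-stopword query, the flagged call leaves no terms at all (which is the zero-results outcome we expected), and the mixed query loses only its stopword.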
I've noticed that the query plan becomes `+()` rather than `MatchNoDocsQuery("")` for pure-stopword queries, and also when the query contains only tokenising characters (e.g. punctuation). Does this make any difference?
Tests
TO DO! - I'll also make documentation updates for edismax and the stop word filter
Checklist
Please review the following and check all that apply:
- [ ] I have reviewed the guidelines for How to Contribute and my code conforms to the standards described there to the best of my ability.
- [ ] I have created a Jira issue and added the issue ID to my pull request title.
- [ ] I have given Solr maintainers access to contribute to my PR branch. (optional but recommended)
- [ ] I have developed this patch against the `main` branch.
- [ ] I have run `./gradlew check`.
- [ ] I have added tests for my changes.
- [ ] I have added documentation for the Reference Guide