lucene-addons
Memory leak with wildcard inside double quotes
Running the query {!span}("*") causes a memory leak and a stop-the-world GC that Solr can't recover from. The same query works fine with Solr's standard query parser.
I'll work on it today and follow up.
Exciting! Ugh. We should probably create a MatchAllDocsQuery for that, like we do for *:*, if we're not already?
It'll do it with an asterisk anywhere in quotes, so, "foo *" would crash it too. I know it's not really a good query, but it could still bring down a whole cluster. Do you think it should throw an exception?
Update: "foo*" is OK, but a wildcard standing by itself is not.
Yes, this is Solr's behavior:
  // called from parser
  protected Query getWildcardQuery(String field, String termStr) throws SyntaxError {
    checkNullField(field);
    // *:* -> MatchAllDocsQuery
    if ("*".equals(termStr)) {
      if ("*".equals(field) || getExplicitField() == null) {
        return newMatchAllDocsQuery();
      }
    }
Can you do me a favor and see if the ComplexPhraseQueryParser dies on "foo *"?
I'm happy enough converting * to a MatchAllDocsQuery when it is outside of a SpanQuery, but what should we do when inside a span? If in a SpanNear, would we just ignore it ("find foo within 2 words of anything" is the same thing as "find foo"). If in a SpanOr, should we convert that to a MatchAllDocsQuery?
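To make the two options concrete, here is a toy sketch of one possible reading of them. It models clauses as plain strings instead of real Lucene SpanQuery objects, and the class and method names (WildcardSpanRules, simplifyNear, orMatchesAll) are hypothetical, not part of lucene-addons or Lucene.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model only: clauses are strings, and "*" stands for a bare wildcard. */
public final class WildcardSpanRules {
    public static final String BARE_WILDCARD = "*";

    private WildcardSpanRules() {}

    /** SpanNear option: drop bare "*" clauses, so "foo near anything" reduces to "foo". */
    public static List<String> simplifyNear(List<String> clauses) {
        List<String> kept = new ArrayList<>();
        for (String clause : clauses) {
            if (!BARE_WILDCARD.equals(clause)) {
                kept.add(clause);
            }
        }
        return kept;
    }

    /** SpanOr option: a single bare "*" makes the whole disjunction match everything. */
    public static boolean orMatchesAll(List<String> clauses) {
        return clauses.contains(BARE_WILDCARD);
    }
}
```

If every clause in a near query is a bare wildcard, simplifyNear returns an empty list, and the caller would still have to decide what that means (match all, match nothing, or throw).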
The other question is do we want to do this at the Lucene level or at the Solr level? My pref would be to do this at the Lucene level, but that goes against the decision that was made in the actual Lucene/Solr project.
@sjwoodard, I may have some time to work on this soon. Let me know if you still care.
I think it's a good idea to fix it because Solr can't recover from it. I still guard against it on my side, but I didn't know how to fix it in the code.
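For anyone who needs a stopgap until this is fixed, a client-side guard could look roughly like the sketch below. This is not the guard actually used here, just a minimal illustration: BareWildcardGuard and hasBareWildcardInPhrase are hypothetical names, and the regex assumes phrases contain no escaped quotes.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public final class BareWildcardGuard {
    // Matches a double-quoted phrase. Escaped quotes are not handled (simplification).
    private static final Pattern PHRASE = Pattern.compile("\"([^\"]*)\"");

    private BareWildcardGuard() {}

    /** Returns true if any quoted phrase contains a whitespace-delimited token that is exactly "*". */
    public static boolean hasBareWildcardInPhrase(String query) {
        Matcher m = PHRASE.matcher(query);
        while (m.find()) {
            for (String token : m.group(1).trim().split("\\s+")) {
                if (token.equals("*")) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

A caller could reject or rewrite the query before sending it to Solr when this returns true. It flags "*" and "foo *" but lets "foo*" through, matching the behavior reported above.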
If we fix it in Solr, how do these tests look:
  public void testMatchAllDocs() throws Exception {
    assertJQ(req("defType", "span", "q", "*"), "/response/numFound==4");
    assertJQ(req("defType", "span", "q", "*:*"), "/response/numFound==4");
    assertJQ(req("df", "text0", "defType", "span", "q", "*:*"), "/response/numFound==4");
    assertJQ(req("df", "text0", "defType", "span", "q", "*"), "/response/numFound==3");
    assertJQ(req("df", "text0", "defType", "span", "q", "NOT *"), "/response/numFound==1");
    assertJQ(req("defType", "span", "q", "NOT *"), "/response/numFound==0");
    assertQEx("need to have a field specified in schema",
        req("defType", "span", "q", "nofield:*"),
        SolrException.ErrorCode.BAD_REQUEST);
    assertQEx("need to have a field specified in schema",
        req("df", "nofield", "defType", "span", "q", "*"),
        SolrException.ErrorCode.BAD_REQUEST);
  }
The documents in the test index are specified here: https://github.com/tballison/lucene-addons/blob/master/solr-5410/src/test/java/org/tallison/solr/search/TestSpanQParserPlugin.java#L47
My one concern is:
assertJQ(req("df", "text0", "defType", "span", "q", "*"), "/response/numFound==3");
This does return the correct documents, but it produces the wildcard query text0:*, which could still blow out your index... unless you turn off allowLeadingWildcards.
Hi Tim,
Does this issue also apply to wildcard queries like the following? I am using lucene-5205 on Solr 6.5.1. For example:
- fl:"mem* leak"
- fl:"[mem* leak] prob*"~3
Our Solr instance shows a constant rise in memory usage while the GC reclaims very little, and it ends up bringing down the shards.
Best, Modassar