Tim Allison
Any updates on this? This is a blocker on https://issues.apache.org/jira/browse/NUTCH-2994. Let me know if I can help.
We're adding WACZ detection (maybe parsing?) over on Apache Tika now. As a temporary placeholder at least, is `application/wacz` appropriate? https://issues.apache.org/jira/browse/TIKA-3696
Exciting! Ugh. Should probably create a MatchAllDocsQuery for that like we do for `*:*` if we're not?
Yes, this is Solr's behavior:
```
// called from parser
protected Query getWildcardQuery(String field, String termStr) throws SyntaxError {
  checkNullField(field);
  // *:* -> MatchAllDocsQuery
  if ("*".equals(termStr)) {
    if ("*".equals(field) ||...
```
Can you do me a favor and see if the ComplexPhraseQueryParser dies on "foo *"? I'm happy enough converting * to a MatchAllDocsQuery when it is outside of a SpanQuery,...
The other question is whether we want to do this at the Lucene level or at the Solr level. My preference would be to do this at the Lucene level,...
@sjwoodard, I may have some time to work on this soon. Let me know if you still care.
If we fix it in Solr, how do these tests look:
```
public void testMatchAllDocs() throws Exception {
  assertJQ(req("defType", "span", "q", "*"), "/response/numFound==4");
  assertJQ(req("defType", "span", "q", "*:*"), "/response/numFound==4");
  assertJQ(req("df", "text0",...
```
My one concern is:
```
assertJQ(req("df", "text0", "defType", "span", "q", "*"), "/response/numFound==3");
```
This does return the correct documents, but it returns the wildcard query: `text0:*`, which could still blow...
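To make the distinction concrete, here is a minimal, dependency-free sketch of the two outcomes discussed above: `*:*` becomes a match-all query, while a field-scoped `*` such as `text0:*` stays a wildcard query. The class `StarRewriteSketch`, the enum `Kind`, and the method `classify` are hypothetical names used only for illustration; this is not Solr's or Lucene's actual code, and it models only the `*:*` condition that is visible in the quoted parser snippet.

```java
// Hypothetical illustration only; none of these names exist in Solr or Lucene.
final class StarRewriteSketch {

    /** The two outcomes the sketch distinguishes (illustrative, not real query classes). */
    enum Kind { MATCH_ALL_DOCS, WILDCARD }

    /**
     * Decide how a (field, term) pair would be treated in this sketch.
     * Assumption: only the "*:*" case is rewritten to match-all; any other
     * condition the real parser may check is deliberately not modeled here.
     */
    static Kind classify(String field, String termStr) {
        // Stand-in for a null-field check; the real check may behave differently.
        if (field == null) {
            throw new IllegalArgumentException("field must not be null");
        }
        // "*:*" -> match all documents.
        if ("*".equals(termStr) && "*".equals(field)) {
            return Kind.MATCH_ALL_DOCS;
        }
        // Anything else, including "text0:*", remains a wildcard query.
        return Kind.WILDCARD;
    }

    public static void main(String[] args) {
        System.out.println("*:*      -> " + classify("*", "*"));
        System.out.println("text0:*  -> " + classify("text0", "*"));
        System.out.println("text0:f* -> " + classify("text0", "f*"));
    }
}
```

In this sketch `classify("text0", "*")` still yields `WILDCARD`, which is the case the concern above is about: the result set can be correct while the query that produced it is still a wildcard over the field.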
Sorry for my delay. The cooccurrence code, as you pointed out, is not optimized for performance. It does perform re-analysis. Even on corpora of a few million documents, Lucene is...