Tim Allison

Results: 93 comments by Tim Allison

Any updates on this? This is a blocker on https://issues.apache.org/jira/browse/NUTCH-2994. Let me know if I can help.

We're adding WACZ detection (and maybe parsing?) over on Apache Tika now. As a temporary placeholder at least, is `application/wacz` appropriate? https://issues.apache.org/jira/browse/TIKA-3696

Exciting! Ugh. We should probably create a MatchAllDocsQuery for that, like we do for `*:*`, if we're not already?

Y, this is Solr's behavior:
```
// called from parser
protected Query getWildcardQuery(String field, String termStr) throws SyntaxError {
  checkNullField(field);
  // *:* -> MatchAllDocsQuery
  if ("*".equals(termStr)) {
    if ("*".equals(field) ||...
```

Can you do me a favor and see if the ComplexPhraseQueryParser dies on "foo *"? I'm happy enough converting * to a MatchAllDocsQuery when it is outside of a SpanQuery,...

The other question is whether we want to do this at the Lucene level or at the Solr level. My preference would be to do this at the Lucene level,...

@sjwoodard, I may have some time to work on this soon. Let me know if you still care.

If we fix it in Solr, how do these tests look:
```
public void testMatchAllDocs() throws Exception {
  assertJQ(req("defType", "span", "q", "*"), "/response/numFound==4");
  assertJQ(req("defType", "span", "q", "*:*"), "/response/numFound==4");
  assertJQ(req("df", "text0",...
```

My one concern is:
```
assertJQ(req("df", "text0", "defType", "span", "q", "*"), "/response/numFound==3");
```
This does return the correct documents, but it returns the wildcard query: `text0:*`, which could still blow...

Sorry for my delay. The cooccurrence code, as you pointed out, is not optimized for performance. It does perform re-analysis. Even on corpora of a few million documents, Lucene is...