Michael McCandless comments

Results: 350 comments of Michael McCandless

Add tasks for multiple negated keywords and its optimized version

> > I'm still not happy with the 2-3 K QPS :) Something seems amiss
> > I suspect it's because the only scoring query produces a constant score. So...

Add tasks for multiple negated keywords and its optimized version

> > The benchmark writes detailed "results" files for each iteration -- can you peek at those and confirm your two forms of the same query are in fact getting...

Add tasks for multiple negated keywords and its optimized version

These gains indeed look correct (identical hit counts from X and XOpt tasks) and significant, especially for the N-term OR cases! Thanks @shubhamvishu. I thought we had a Lucene issue...

Add tasks for multiple negated keywords and its optimized version

Actually, I'm not sure how the `OrNegatedNTerms` rewrite case would work? It's a strange query e.g. `q=(-body:eric *:*) (-body:kansas *:*)`. I'm not sure this really happens in practice very often?...
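For readers trying to picture what a disjunction of negated clauses matches, here is a minimal set-logic sketch in plain Java. It models each term's postings as a `java.util.BitSet` over doc IDs and checks that "docs without A, OR docs without B" is the same set as "all docs minus docs containing both A and B" (De Morgan). The class and method names are hypothetical, and nothing here is Lucene's API or the benchmark's actual `XOpt` rewrite, which the thread does not spell out.

```java
import java.util.BitSet;

public class NegatedTerms {
    /** (-a *:*) OR (-b *:*): docs missing a, unioned with docs missing b. */
    static BitSet orOfNegations(BitSet a, BitSet b, int maxDoc) {
        BitSet notA = (BitSet) a.clone();
        notA.flip(0, maxDoc);          // all docs that do not contain a
        BitSet notB = (BitSet) b.clone();
        notB.flip(0, maxDoc);          // all docs that do not contain b
        notA.or(notB);
        return notA;
    }

    /** -(+a +b) *:*: every doc except those containing both a and b. */
    static BitSet negationOfAnd(BitSet a, BitSet b, int maxDoc) {
        BitSet both = (BitSet) a.clone();
        both.and(b);                   // docs containing both terms
        both.flip(0, maxDoc);          // complement within the index
        return both;
    }

    public static void main(String[] args) {
        int maxDoc = 8;                // hypothetical tiny index
        BitSet eric = new BitSet(maxDoc);
        eric.set(1); eric.set(3); eric.set(5);
        BitSet kansas = new BitSet(maxDoc);
        kansas.set(3); kansas.set(4);
        System.out.println(orOfNegations(eric, kansas, maxDoc)
            .equals(negationOfAnd(eric, kansas, maxDoc)));
    }
}
```

In this toy index only doc 3 contains both terms, so both forms match every doc except 3.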

Add tasks for multiple negated keywords and its optimized version

> Note : I tried writing count(-most -september *:*) task as count(-(most september) *:*) but that seem to do what we want and results in 0 results. so I had...

Maybe we should also search Lucene's sources?

I like that idea @arafalov -- that would give us a nice initial tokenization, and the deep metadata (class name, method name, a class being subclassed, etc.) could enable awesome...

Maybe we should also search Lucene's sources?

Looks like [the `com.sun.source.tree` package](https://docs.oracle.com/en/java/javase/21/docs/api/jdk.compiler/com/sun/source/tree/package-summary.html) has all the juicy stuff.
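As a rough illustration of the kind of metadata that package exposes, the sketch below parses a Java source string with the JDK's compiler tree API and collects class names, superclasses, and method names. It assumes a full JDK (so that `ToolProvider.getSystemJavaCompiler()` is non-null); the `SourceOutline` class and its `outline` method are hypothetical names, not anything from luceneutil.

```java
import com.sun.source.tree.ClassTree;
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.tree.MethodTree;
import com.sun.source.util.JavacTask;
import com.sun.source.util.TreeScanner;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

public class SourceOutline {
    /** Parses (without compiling) one source string and lists its classes and methods. */
    static List<String> outline(String className, String source) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        // Wrap the in-memory source so the compiler can read it like a file.
        JavaFileObject file = new SimpleJavaFileObject(
                URI.create("string:///" + className + ".java"), JavaFileObject.Kind.SOURCE) {
            @Override
            public CharSequence getCharContent(boolean ignoreEncodingErrors) {
                return source;
            }
        };
        JavacTask task = (JavacTask) compiler.getTask(null, null, null, null, null, List.of(file));
        List<String> out = new ArrayList<>();
        try {
            for (CompilationUnitTree unit : task.parse()) {
                new TreeScanner<Void, Void>() {
                    @Override
                    public Void visitClass(ClassTree node, Void p) {
                        String ext = node.getExtendsClause() == null
                                ? "" : " extends " + node.getExtendsClause();
                        out.add("class " + node.getSimpleName() + ext);
                        return super.visitClass(node, p);  // keep descending into members
                    }

                    @Override
                    public Void visitMethod(MethodTree node, Void p) {
                        out.add("method " + node.getName());
                        return super.visitMethod(node, p);
                    }
                }.scan(unit, null);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(outline("Foo", "class Foo extends Bar { void baz() {} }"));
    }
}
```

Each collected string could then become a field value in an index document; how those fields would be named and analyzed is left open here.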

Close traps when testing concurrent search

It's the last `boolean` argument to `LocalTaskSource` ctor which groups all tasks by category together (running them all sequentially within each thread) when concurrency is enabled: https://github.com/mikemccand/luceneutil/commit/87a806341b008e959376ab0f1c8cfc0997a07d7a

Close traps when testing concurrent search

> Grouping was added here: #56 because without it the QPS measurement per task was meaningless

OH! Thanks for digging @msokolov ... hmm I think this is why we have...

Close traps when testing concurrent search

> I guess we could switch from measuring wall clock time to measuring CPU time using [Java's JMX API](https://docs.oracle.com/cd/E17802_01/j2se/j2se/1.5.0/jcp/beta1/apidiffs/java/lang/management/ThreadMBean.html#getThreadCpuTime). If we did that we wouldn't need to group tasks this...
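For reference, the per-thread CPU time idea in that quote might look something like the sketch below. In current JDKs the interface is `java.lang.management.ThreadMXBean`; the `CpuTimer` class and `cpuNanos` method are hypothetical names. One caveat: this only counts CPU used by the calling thread, so work handed to other threads (for example an executor used for concurrent search) would need to be measured on those threads separately.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

public class CpuTimer {
    /**
     * Runs the task on the calling thread and returns the CPU nanoseconds
     * that thread consumed, or -1 if the JVM cannot measure it.
     */
    static long cpuNanos(Runnable task) {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        if (!bean.isCurrentThreadCpuTimeSupported() || !bean.isThreadCpuTimeEnabled()) {
            return -1;
        }
        long start = bean.getCurrentThreadCpuTime();
        task.run();
        return bean.getCurrentThreadCpuTime() - start;
    }

    public static void main(String[] args) {
        long nanos = cpuNanos(() -> {
            long sum = 0;
            for (int i = 0; i < 5_000_000; i++) {
                sum += i % 7;          // some CPU-bound busy work
            }
            if (sum < 0) {
                System.out.println(sum);  // never true; keeps the loop from being dropped
            }
        });
        System.out.println("CPU ns on this thread: " + nanos);
    }
}
```

Unlike wall-clock time, this figure should not grow while the thread is descheduled or waiting, which is the property the quoted suggestion is after.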