Michael McCandless
The thing that caused me to dig into this particular issue was watching the `java` process running `SearchPerfTest` with six worker threads spin at 100% CPU for quite a while...
Hmm OK I tried reverting that change, so we no longer group by task category:

```diff
diff --git a/src/main/perf/SearchPerfTest.java b/src/main/perf/SearchPerfTest.java
index e77ceec..9983092 100755
--- a/src/main/perf/SearchPerfTest.java
+++ b/src/main/perf/SearchPerfTest.java
@@ -560,7 +560,7...
```
> So this grouping change ensures that we run only tasks of the same type at the same time, enabling us to attribute the wall clock times in the logs...
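The grouping idea quoted above can be sketched roughly like this (hypothetical task names and a made-up `category:name` encoding, not the actual `SearchPerfTest` code):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class GroupByCategorySketch {
  // Bucket tasks by category so each category runs as one contiguous batch,
  // making per-batch wall-clock time attributable to a single task type.
  static Map<String, List<String>> groupByCategory(String[] tasks) {
    Map<String, List<String>> byCategory = new LinkedHashMap<>();
    for (String task : tasks) {
      String category = task.split(":")[0];
      byCategory.computeIfAbsent(category, k -> new ArrayList<>()).add(task);
    }
    return byCategory;
  }

  public static void main(String[] args) {
    String[] tasks = {"Term:foo", "Phrase:bar", "Term:baz", "Phrase:qux"};
    for (Map.Entry<String, List<String>> e : groupByCategory(tasks).entrySet()) {
      long start = System.nanoTime();
      // ... run every task in e.getValue() here ...
      long elapsedNs = System.nanoTime() - start;
      // elapsedNs now covers only one task category, not a concurrent mix
      System.out.println(e.getKey() + " -> " + e.getValue());
    }
  }
}
```

Without the grouping, concurrently running tasks of different types make the per-type wall-clock attribution ambiguous.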
OK I had wondered why the nightly benchmarks didn't show the (trappy) "effective QPS" slowdown when we enabled `searchConcurrency=1` and it turns out ... it did! (It just had not...
@rmuir also suggested using `fincore --output-all /path/to/index/*` to monitor how many hot/cold pages we see in the index while/after benchmarking.
> _parent_ ;) !!
+1, that's a great idea. The nightly benchy currently does not run `knnPerfTest.py` but rather the `VectorSearch` tasks (`KnnFloatVectorQuery`). So we could either try to add recall to `SearchTask.java` where...
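The recall computation itself is simple, wherever it ends up living; a minimal sketch (hypothetical `RecallSketch` helper, not the actual `SearchTask.java` or `knnPerfTest.py` code), assuming the exact top-k from a brute-force run is available as ground truth:

```java
import java.util.HashSet;
import java.util.Set;

public class RecallSketch {
  // recall@k = fraction of the exact (ground-truth) top-k doc ids that the
  // approximate KNN search also returned
  static double recall(int[] exactTopK, int[] approxTopK) {
    Set<Integer> exact = new HashSet<>();
    for (int docId : exactTopK) {
      exact.add(docId);
    }
    int hits = 0;
    for (int docId : approxTopK) {
      if (exact.contains(docId)) {
        hits++;
      }
    }
    return (double) hits / exactTopK.length;
  }

  public static void main(String[] args) {
    int[] exact = {1, 2, 3, 4};
    int[] approx = {2, 3, 5, 1};
    // the approximate search found 3 of the 4 ground-truth neighbors
    System.out.println("recall@4 = " + recall(exact, approx));
  }
}
```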
> Could you use tasks where dynamic pruning doesn't apply instead of disabling it? E.g. use counting tasks?

+1, that's a nice approach. Though even Lucene's `count()` API has some...
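A toy illustration of why counting tasks defeat dynamic pruning: a top-k search maintains a minimum competitive score, so whole blocks whose max (impact) score falls below it can be skipped, while a count must visit every matching doc. (This is a simplified simulation, not Lucene's actual WAND/impacts implementation.)

```java
public class PruningSketch {
  // Returns {docsVisitedForTopK, docsVisitedForCount}: top-k with a minimum
  // competitive score skips blocks whose max score is too low; counting
  // has no threshold and must visit every block.
  static int[] visited(float[] blockMaxScores, int blockSize, float minCompetitiveScore) {
    int topK = 0;
    int count = 0;
    for (float maxScore : blockMaxScores) {
      count += blockSize;               // counting visits every matching doc
      if (maxScore >= minCompetitiveScore) {
        topK += blockSize;              // top-k skips non-competitive blocks
      }
    }
    return new int[] {topK, count};
  }

  public static void main(String[] args) {
    float[] blockMaxScores = {0.2f, 0.9f, 0.1f, 0.8f};
    int[] v = visited(blockMaxScores, 128, 0.5f);
    System.out.println("top-k visited " + v[0] + " docs, count visited " + v[1]);
  }
}
```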
> > Indeed IndexSearcher#count has some optimizations to bypass postings. But it was mostly an example, some cheap faceting should work too?
>
> I'm not sure what you mean...
Thanks @rmuir and @ChrisHegarty. I've downloaded all my content from `home.apache.org` (Lucene benchmark source corpora, line file docs, large vector file, etc.), so we won't lose any benchy stuff once...