Michael McCandless

350 comments by Michael McCandless

The thing that caused me to dig into this particular issue was watching the `java` process running `SearchPerfTest` with six worker threads spin at 100% CPU for quite a while...

Hmm OK I tried reverting that change, so we no longer group by task category:

```
diff --git a/src/main/perf/SearchPerfTest.java b/src/main/perf/SearchPerfTest.java
index e77ceec..9983092 100755
--- a/src/main/perf/SearchPerfTest.java
+++ b/src/main/perf/SearchPerfTest.java
@@ -560,7 +560,7...
```

> So this grouping change ensures that we run only tasks of the same type at the same time, enabling us to attribute the wall clock times in the logs...
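
Roughly, that grouping boils down to ordering the (already shuffled) task list so one category finishes before the worker threads move on to the next, so any wall-clock window in the log maps to a single task type. A tiny sketch of the idea, using assumed names (`Task`, `category()`) rather than the actual luceneutil code:

```java
import java.util.Comparator;
import java.util.List;

class TaskGrouping {
  // Hypothetical stand-in for luceneutil's task objects:
  record Task(String category, String queryText) {}

  // Stable sort by category: tasks keep their shuffled order within a category,
  // but categories no longer interleave, so the worker threads are (almost)
  // always running the same category concurrently.
  static void groupByCategory(List<Task> tasks) {
    tasks.sort(Comparator.comparing(Task::category));
  }
}
```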

OK I had wondered why the nightly benchmarks didn't show the (trappy) "effective QPS" slowdown when we enabled `searchConcurrency=1` and it turns out ... it did! (It just had not...

@rmuir also suggested using `fincore --output-all /path/to/index/*` to monitor how many hot/cold pages we see in the index while/after benchmarking.
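
If we wanted the harness itself to dump that after each run, something like this sketch could work (the `FincoreReport` class is hypothetical, not part of SearchPerfTest; it assumes util-linux `fincore` is on the PATH, and expands the `/path/to/index/*` glob by hand since `ProcessBuilder` does not go through a shell):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

public class FincoreReport {
  public static void main(String[] args) throws IOException, InterruptedException {
    Path indexDir = Path.of(args[0]);
    List<String> cmd = new ArrayList<>(List.of("fincore", "--output-all"));
    try (Stream<Path> files = Files.list(indexDir)) {
      files.map(Path::toString).sorted().forEach(cmd::add);
    }
    // Print fincore's per-file resident-pages table to our stdout:
    new ProcessBuilder(cmd).inheritIO().start().waitFor();
  }
}
```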

+1, that's a great idea. The nightly benchy currently does not run `knnPerfTest.py` but rather the `VectorSearch` tasks (`KnnFloatVectorQuery`). So we could either try to add recall to `SearchTask.java` where...
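
If we go the `SearchTask.java` route, the per-query recall math itself is simple. Here's a rough sketch; the `KnnRecall` helper and the `exactTopK` ground-truth set (which would have to come from an exact/brute-force pass over the same reader) are assumptions, not existing luceneutil code:

```java
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.KnnFloatVectorQuery;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;

class KnnRecall {
  // Fraction of the true top-k neighbors that the approximate (HNSW) search found.
  static double recall(IndexSearcher searcher, String field, float[] queryVector,
                       int k, Set<Integer> exactTopK) throws IOException {
    TopDocs approx = searcher.search(new KnnFloatVectorQuery(field, queryVector, k), k);
    Set<Integer> found = new HashSet<>();
    for (ScoreDoc sd : approx.scoreDocs) {
      found.add(sd.doc);
    }
    found.retainAll(exactTopK);  // keep only the true neighbors we actually got back
    return found.size() / (double) k;
  }
}
```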

> Could you use tasks where dynamic pruning doesn't apply instead of disabling it? E.g. use counting tasks?

+1, that's a nice approach. Though even Lucene's `count()` API has some...

> > Indeed IndexSearcher#count has some optimizations to bypass postings. But it was mostly an example, some cheap faceting should work too?
>
> I'm not sure what you mean...
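
To be sure a counting task really visits the postings (no `Weight#count` shortcut of the kind `IndexSearcher#count` can take, and no score-based pruning), one option is a bare-bones collector like this sketch (hypothetical classes, not luceneutil code):

```java
import java.io.IOException;
import java.util.Collection;
import org.apache.lucene.search.CollectorManager;
import org.apache.lucene.search.ScoreMode;
import org.apache.lucene.search.SimpleCollector;

// Counts every matching doc by actually iterating matches: it never consults
// Weight#count, and COMPLETE_NO_SCORES means WAND/MAXSCORE pruning stays off.
final class ExhaustiveCountCollector extends SimpleCollector {
  long count;

  @Override
  public void collect(int doc) {
    count++;  // visit every hit; no early termination
  }

  @Override
  public ScoreMode scoreMode() {
    return ScoreMode.COMPLETE_NO_SCORES;  // scores not needed => no dynamic pruning
  }
}

// Manager so the count also works with IndexSearcher's per-slice concurrent search.
final class ExhaustiveCountManager
    implements CollectorManager<ExhaustiveCountCollector, Long> {
  @Override
  public ExhaustiveCountCollector newCollector() {
    return new ExhaustiveCountCollector();
  }

  @Override
  public Long reduce(Collection<ExhaustiveCountCollector> collectors) throws IOException {
    long total = 0;
    for (ExhaustiveCountCollector c : collectors) {
      total += c.count;
    }
    return total;
  }
}

// Usage: long hits = searcher.search(query, new ExhaustiveCountManager());
```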

Thanks @rmuir and @ChrisHegarty. I've downloaded all my content from `home.apache.org` (Lucene benchmark source corpora, line file docs, large vector file, etc.), so we won't lose any benchy stuff once...