Michael McCandless issues

Results: 44 issues of Michael McCandless

Confirm new JFR profiling is adding minimal overhead

I have largely trusted that enabling JFR does not harm performance much, as long as you don't configure overly aggressive sampling. But I've seen internal evidence (Amazon product search, closed...

nightlyBench.py should not print profiler results to console

Since the nightly benchmarks print detailed profiler results to the daily log file, we should not also print them to console output.

Cannot zoom nightly charts out to full history?

Nightly charts now zoom in by default to the past 1 or 2 years, but it is not clear how to then zoom out to see the full history.

Upgrade to latest en-wiki export

The `enwiki` files we use for `luceneutil` benchmarks, including nightly benchmarks, are very old by now, almost a decade: `/l/data/enwiki-20110115-lines-1k-fixed.bin`. Also, how these files were created is not exactly...

Why didn't Lucene nightly benchmarks catch new DWPT memory leak?

https://issues.apache.org/jira/browse/LUCENE-9478 accidentally introduced a memory leak into 8.6.0 ... The leak affects full flush, so our nightly NRT benchmark should've seen a massive jump in heap usage.

Add benchmark covering BINARY doc values query-time performance

When we (Amazon product search) upgraded to Lucene 8.5.1, which includes [newly added block compression for BINARY doc values](https://issues.apache.org/jira/browse/LUCENE-9211), we saw a sizable (~30%) reduction in our red-line QPS (throughput)....

Also chart JIT/GC metrics during searching

We use this `StatisticsHelper.java` class to watch JIT/GC metrics and print a summary to console output at the end of each Indexing and Searching run, but today we [only produce a...

Maybe add "stale" as a facet field?

Thanks to @stefanvodita's upstream Lucene improvement, we now have the notion of a stale PR. Maybe we should make this facetable in https://githubsearch.mikemccandless.com?

GitHubSearch

Test performance impact of `addDocuments` vs `addDocument`

[Spinoff from https://github.com/apache/lucene/pull/12829#issuecomment-1855755782] I'm curious what overhead we pay calling `addDocument` for N documents, versus indexing all N docs in a single `addDocuments` call. IW has non-trivial entry / exit...

Add `KnnByteVectorQuery` nightly benchy tasks?

We currently test float KNN vector search in the nightly benchy, but not byte vectors.