Michael McCandless

Results 44 issues of Michael McCandless

I have largely trusted that enabling JFR does not harm performance much, as long as you don't configure overly aggressive sampling. But I've seen internal evidence (Amazon product search, closed...

Since the nightly benchmarks print detailed profiler results to the daily log file, we should not also print them to console output.

Nightly charts now zoom in by default to the past 1 or 2 years, but it is not clear how to then zoom out to see the full history.

The `enwiki` files we use for `luceneutil` benchmarks, including nightly benchmarks, are very old by now, almost a decade: `/l/data/enwiki-20110115-lines-1k-fixed.bin`. Also, how these files were created is not exactly...

https://issues.apache.org/jira/browse/LUCENE-9478 introduced an accidental memory leak into 8.6.0 ... The leak affects full flush, so our nightly NRT benchmark should've seen a massive jump in heap usage.

When we (Amazon product search) upgraded to Lucene 8.5.1, which includes [newly added block compression for BINARY doc values](https://issues.apache.org/jira/browse/LUCENE-9211), we saw a sizable (~30%) reduction in our red-line QPS (throughput)....

We use this `StatisticsHelper.java` class to watch JIT/GC metrics and print a summary to console output at the end of each indexing and searching run, but today we [only produce a...

Thanks to @stefanvodita's upstream Lucene improvement, we now have the notion of a stale PR. Maybe we should make this facetable in https://githubsearch.mikemccandless.com?

GitHubSearch

[Spinoff from https://github.com/apache/lucene/pull/12829#issuecomment-1855755782] I'm curious what overhead we pay when calling `addDocument` once per document for N documents, versus indexing all N docs in a single `addDocuments` call. IW has non-trivial entry / exit...

We currently test float KNN vector search in the nightly benchy, but not byte vectors.