Michael McCandless comments

Results: 350 comments of Michael McCandless

Concurrency bug in UnparsedTask?

Egads `.toString()` through an `AtomicReference`!

Use Rob's cool vmstat/gnuplot tooling to plot hardware utilization

OK! I did the first step, which is to run `vmstat` during each indexing run and save away the resulting log. Yesterday's run produced these exciting vmstat logs: ``` beast3:tmp$...
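A rough sketch of what "run `vmstat` during each indexing run and save away the resulting log" could look like in Python. This is not the actual nightly benchy code: the helper name `record_vmstat`, its parameters, and the log file name in the usage note are all assumptions made for illustration.

```python
import subprocess
from contextlib import contextmanager


@contextmanager
def record_vmstat(log_path, interval_sec=1, command=None):
    """Run a sampler in the background for the duration of the with-block.

    Hypothetical helper: by default it runs `vmstat <interval_sec>`, which
    prints one sample line every interval_sec seconds until it is stopped.
    The `command` override exists only so another sampler can be swapped in.
    """
    if command is None:
        command = ["vmstat", str(interval_sec)]
    with open(log_path, "w") as log_file:
        # The child process writes its samples straight into the log file.
        proc = subprocess.Popen(command, stdout=log_file, stderr=subprocess.STDOUT)
        try:
            yield proc
        finally:
            # Stop sampling once the wrapped work is done (or has failed).
            proc.terminate()
            try:
                proc.wait(timeout=5)
            except subprocess.TimeoutExpired:
                proc.kill()
                proc.wait()
```

Usage would be along the lines of `with record_vmstat("fastIndexMediumDocs-vmstat.log"): build_index()`, where `build_index` stands in for whatever launches the indexing run.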

Use Rob's cool vmstat/gnuplot tooling to plot hardware utilization

Does this work? [vmstat-logs.zip](https://github.com/user-attachments/files/17311809/vmstat-logs.zip) This is all the vmstat logs, each from building the different N indices that nightly benchy builds ...

Use Rob's cool vmstat/gnuplot tooling to plot hardware utilization

Whoa, thanks @rmuir! Nice charts! Yeah, this is beast3 with 128 cores, and one of those indexing runs (the `fixedIndex` one) uses only one core (mostly) so that it generates...

Use Rob's cool vmstat/gnuplot tooling to plot hardware utilization

Last night's benchy made the nice gnuplot vmstat charts! It's linked off the [nightly details page](https://benchmarks.mikemccandless.com/2024.10.10.18.04.33.html), to [here](https://benchmarks.mikemccandless.com/2024.10.10.18.04.33/index.html). Each sub-directory has the charts for building that one index, e.g. [`fastIndexMediumDocs`](https://benchmarks.mikemccandless.com/2024.10.10.18.04.33/fastIndexMediumDocs/)....

Instrument IndexOrDocValuesQuery to report on its decisions

+1 to keeping `Query` classes lean. A general framework on `IndexSearcher` sounds nice, but it's hard to generalize with just this one use case? Can we think of other queries/collectors...

Make dynamic range facets value collection and sorting faster

Learned Sort looks amazing -- @josefschiefer27 maybe open a dedicated spinoff issue to see if there are other places where it could help Lucene? Lucene does a lot of sorting...

JFR profiling does not report heap usage

Hmm that's odd -- this is where we ask `java` to enable JFR recording: https://github.com/mikemccand/luceneutil/blob/main/src/python/benchUtil.py#L1185 And we use these settings: https://github.com/mikemccand/luceneutil/blob/main/src/python/profiling.jfc which seems to be enabling object allocation tracking. Maybe...

Introduce new encoding of BPV 21 for DocIdsWriter used in BKD Tree

> @jpountz IMO We should use `Bit21With3StepsEncoder` in DocIdsWriter as using `Bit21With2StepsEncoder` might lead to performance regression for workloads in aarch64 platforms. +1 -- this seems the safer choice, today...

Introduce new encoding of BPV 21 for DocIdsWriter used in BKD Tree

> We can replace it with Bit21With2StepsEncoder in future when the performance is comparable to x86. I wonder what mechanism we could use to remind ourselves when performance of `aarch64`...
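For context on what "BPV 21" (21 bits per value) means in the two comments above: three 21-bit doc IDs fit in one 64-bit word, since 3 × 21 = 63. The sketch below only illustrates that packing arithmetic. It is not the code of `Bit21With2StepsEncoder` or `Bit21With3StepsEncoder`, whose internals are not shown here, and the function names are made up for the example.

```python
MASK_21 = (1 << 21) - 1  # largest value representable in 21 bits


def pack3(a, b, c):
    """Pack three 21-bit non-negative integers into one 63-bit integer."""
    for value in (a, b, c):
        if not 0 <= value <= MASK_21:
            raise ValueError(f"{value} does not fit in 21 bits")
    return (a << 42) | (b << 21) | c


def unpack3(word):
    """Recover the three 21-bit values written by pack3."""
    return ((word >> 42) & MASK_21, (word >> 21) & MASK_21, word & MASK_21)
```

A round trip such as `unpack3(pack3(5, 1000, 2_000_000))` gives back the original three values, and the packed word always stays below 2^63, so it fits in a signed 64-bit `long`.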