Michael McCandless

336 comments by Michael McCandless

This sounds super promising! There is [some discussion here about PQ and PCA](https://github.com/apache/lucene/issues/13403) as well.

> I would like to be able to remove/hide the index-time stats sometimes. We get so many columns the current output can't be displayed in most windows. I wonder if...

This was a spinoff from https://github.com/apache/lucene/pull/14078#issuecomment-2710460602

I like this idea! I hope we can find a simple enough API exposed through IWC to enable the optional grouping. This also has nice mechanical sympathy / symmetry with...

I like @jpountz's idea of just using separate `IndexWriter`s for this use-case, instead of adding custom routing logic to the separate DWPTs inside a single `IndexWriter` and then also needing...

> Each IndexWriter now generates its own sequence number.

This would indeed get somewhat tricky. But is OpenSearch really using Lucene's returned sequence numbers? I had thought Elasticsearch's sequence number...

> I started adding support for ParentJoin benchmarks ([issue](https://github.com/mikemccand/luceneutil/issues/284)). Will raise it in multiple small PRs, here's the [first one](https://github.com/mikemccand/luceneutil/pull/283).

Thank you for improving our benchy tooling @vigyasharma!

I think it's awesome to invest in our benchmarking tooling to be able to test different approaches for multi-valued vectors, but, I don't think that should be a blocker to...

We already have a baby amoeba step here (`mikes_tiny_vector_tool.py`) -- let's rename it to something better (`inspect_vectors.py`?), and add some more stats:

* Probably user has to specify dimensionality since...
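
To make the dimensionality point concrete, here is a minimal sketch of what such a stats pass might look like. It is not the actual tool: the function name `vector_norm_stats`, the choice of norm statistics, and the assumed input layout (a headerless blob of little-endian float32 values, which is why the caller must supply `dim`) are all assumptions made for illustration.

```python
import math
import struct


def vector_norm_stats(raw: bytes, dim: int) -> dict:
    """Summarize L2 norms for a headerless blob of little-endian float32 vectors.

    Hypothetical helper: the layout and the chosen stats are assumptions.
    The blob carries no header, so `dim` must come from the caller.
    """
    if dim <= 0:
        raise ValueError("dim must be positive")
    bytes_per_vector = 4 * dim  # float32 is 4 bytes
    if len(raw) % bytes_per_vector != 0:
        raise ValueError(
            f"{len(raw)} bytes is not a whole number of {dim}-dimensional float32 vectors"
        )

    count = len(raw) // bytes_per_vector
    norms = []
    for i in range(count):
        # Decode one vector at its byte offset, then take its L2 norm.
        vec = struct.unpack_from(f"<{dim}f", raw, i * bytes_per_vector)
        norms.append(math.sqrt(sum(x * x for x in vec)))

    return {
        "count": count,
        "dim": dim,
        "min_norm": min(norms) if norms else 0.0,
        "max_norm": max(norms) if norms else 0.0,
        "mean_norm": sum(norms) / count if count else 0.0,
    }
```

A caller would read the file's bytes and pass the dimensionality explicitly, for example `vector_norm_stats(open(path, "rb").read(), dim)`, where `path` and `dim` stand in for the user's own values. The size check rejects a `dim` that does not divide the file evenly, but it cannot catch a wrong `dim` that happens to divide it.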

@shubhamvishu -- I think the `PKLookup` gains here are compelling, and there is consensus to make this improvement only to the `BloomFilterPostingsFormat`? But understanding why supposedly equivalent expressions yield such...