Instrument IndexOrDocValuesQuery to report on its decisions

Open stefanvodita opened this issue 1 year ago • 3 comments

Description

For Amazon Product Search, we use IndexOrDocValuesQuery and have changed it to take a listener-type object that records and reports the decision the query has made. We track how often it picks the index vs. doc values, so we can see when the frequency with which we take each branch changes. Would this be a welcome addition?
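
To make the idea concrete, here is a minimal sketch of what such a listener might look like. It does not use Lucene's classes: `Branch`, `DecisionListener`, `CountingListener`, and `choose` are hypothetical names, and the cost comparison is a placeholder, not the heuristic IndexOrDocValuesQuery actually applies.

```java
import java.util.concurrent.atomic.LongAdder;

// Hypothetical sketch; none of these types exist in Lucene.
class DecisionListenerSketch {

  /** The two execution strategies the query can pick between. */
  enum Branch { INDEX, DOC_VALUES }

  /** Callback notified each time a branch is chosen. */
  interface DecisionListener {
    void onDecision(Branch branch);
  }

  /** Thread-safe listener that counts how often each branch is taken. */
  static final class CountingListener implements DecisionListener {
    private final LongAdder index = new LongAdder();
    private final LongAdder docValues = new LongAdder();

    @Override
    public void onDecision(Branch branch) {
      (branch == Branch.INDEX ? index : docValues).increment();
    }

    long count(Branch branch) {
      return (branch == Branch.INDEX ? index : docValues).sum();
    }
  }

  /**
   * Placeholder decision (an assumption for illustration): use the index when
   * it costs no more than the lead iterator, otherwise use doc values.
   */
  static Branch choose(long indexCost, long leadCost, DecisionListener listener) {
    Branch branch = indexCost <= leadCost ? Branch.INDEX : Branch.DOC_VALUES;
    listener.onDecision(branch);
    return branch;
  }

  public static void main(String[] args) {
    CountingListener listener = new CountingListener();
    choose(10, 1_000, listener);  // cheap index query, expensive lead
    choose(5_000, 20, listener);  // selective lead clause
    choose(5_000, 50, listener);
    System.out.println("index=" + listener.count(Branch.INDEX)
        + " docValues=" + listener.count(Branch.DOC_VALUES));
  }
}
```

Comparing the two counters over time is what would surface a shift in how often each branch is taken.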

stefanvodita • May 30 '24 21:05

Thinking out loud: I'd like queries to remain as close to value classes as possible, just describing an information need. I believe that the change you're suggesting would require storing a listener on IndexOrDocValuesQuery, which would go against this. Maybe we should introduce a more general framework to allow queries and collectors to report about some interesting decisions they make, and keep the state on e.g. IndexSearcher rather than Query?
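
As a rough illustration of that separation, the sketch below keeps the query as a plain value and puts the reporting state on a searcher-like object. Everything here is hypothetical (`RangeSpec`, `DecisionReporter`, `Searcher`, and the cost comparison are made up for the example); it is not how IndexSearcher works today.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical sketch; these are not Lucene classes.
class SearcherSideReportingSketch {

  /** A query as a pure value: it only describes the information need. */
  static final class RangeSpec {
    final String field;
    final long min;
    final long max;

    RangeSpec(String field, long min, long max) {
      this.field = field;
      this.min = min;
      this.max = max;
    }

    @Override
    public boolean equals(Object o) {
      if (!(o instanceof RangeSpec)) {
        return false;
      }
      RangeSpec other = (RangeSpec) o;
      return field.equals(other.field) && min == other.min && max == other.max;
    }

    @Override
    public int hashCode() {
      return Objects.hash(field, min, max);
    }
  }

  /** Receives decisions made while a query executes. */
  interface DecisionReporter {
    void report(String component, String decision);
  }

  /** Stand-in for a searcher: it owns the reporter, the query does not. */
  static final class Searcher {
    private final DecisionReporter reporter;

    Searcher(DecisionReporter reporter) {
      this.reporter = reporter;
    }

    void search(RangeSpec query, long leadCost, long indexCost) {
      // Placeholder decision, only here so there is something to report.
      String decision = indexCost <= leadCost ? "index" : "doc_values";
      reporter.report("IndexOrDocValues[" + query.field + "]", decision);
    }
  }

  public static void main(String[] args) {
    List<String> log = new ArrayList<>();
    Searcher searcher = new Searcher((c, d) -> log.add(c + " -> " + d));
    RangeSpec a = new RangeSpec("price", 10, 20);
    RangeSpec b = new RangeSpec("price", 10, 20);
    searcher.search(a, 1_000, 10);
    System.out.println(log);
    System.out.println("queries still equal: " + a.equals(b));
  }
}
```

Because the reporter never touches the query object, equality and hashing of the query (and so anything keyed on it, such as a cache) are unaffected by whether reporting is enabled.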

jpountz • May 31 '24 12:05

+1 to keeping Query classes lean.

A general framework on IndexSearcher sounds nice, but it's hard to generalize with just this one use case? Can we think of other queries/collectors that might also benefit from this? Maybe the exotic rewrite choices that MultiTermQuery subclasses make (rewrite as filter, rewrite to boolean disjunction of TermQuery, ...)?

mikemccand • Jun 01 '24 10:06

> A general framework on IndexSearcher sounds nice, but it's hard to generalize with just this one use case?

Can it be something like IndexWriter's InfoStream, but for search? Or do we need/want something more structured?
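
A sketch of the unstructured option, assuming a search-side interface that mirrors the two methods InfoStream exposes (`isEnabled(component)` and `message(component, message)`). `SearchInfoStream`, `CollectingInfoStream`, `reportDecision`, and the component names are hypothetical and do not exist in Lucene.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch modeled on the shape of IndexWriter's InfoStream.
class SearchInfoStreamSketch {

  /** Free-form, per-component messages emitted during search. */
  interface SearchInfoStream {
    boolean isEnabled(String component);

    void message(String component, String message);
  }

  /** Keeps messages in memory; a real sink might log or aggregate them. */
  static final class CollectingInfoStream implements SearchInfoStream {
    private final Set<String> enabled;
    private final List<String> messages = new ArrayList<>();

    CollectingInfoStream(Set<String> enabled) {
      this.enabled = enabled;
    }

    @Override
    public boolean isEnabled(String component) {
      return enabled.contains(component);
    }

    @Override
    public synchronized void message(String component, String message) {
      messages.add(component + ": " + message);
    }

    synchronized List<String> messages() {
      return new ArrayList<>(messages);
    }
  }

  /** Callers check isEnabled first so disabled components cost almost nothing. */
  static void reportDecision(SearchInfoStream stream, String component, String decision) {
    if (stream.isEnabled(component)) {
      stream.message(component, decision);
    }
  }

  public static void main(String[] args) {
    CollectingInfoStream stream = new CollectingInfoStream(Set.of("IODVQ"));
    reportDecision(stream, "IODVQ", "picked doc_values");
    reportDecision(stream, "MTQ", "rewrite=filter"); // component not enabled, dropped
    System.out.println(stream.messages());
  }
}
```

Free-form strings are cheap to add but awkward to aggregate, which is the trade-off against something more structured.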

> Can we think of other queries/collectors that might also benefit from this?

Some examples that come to mind:

  • Some users may want to know what fields are used for searching or filtering in general, e.g. what percentage of queries are filtered by category? by brand? by price?
  • Which clause(s) lead query evaluation.
  • Number of hits that get evaluated vs. cost (i.e. how much dynamic pruning is helping). Or maybe this one is too heavy and belongs in the query profiler.
  • Which scorer is used, e.g. BS1 vs. BS2.

jpountz • Jun 04 '24 14:06