search-benchmark-game
Add filtering commands.
These new commands allow running queries against a filter that matches 1% or 10% of documents.
Filters are interesting because some optimizations that are easy/obvious for exhaustive evaluation become more complicated when a filter is applied. Yet filters are common; think of an e-commerce search filtered by category, for instance.
Only the Lucene 10.0 engine supports filtering for now, because I'm not too familiar with Rust, but I assume it should be easy to add support to the Tantivy engine.
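As a side note on the "1% or 10% of documents" densities mentioned above: one simple way such filters can be produced is by tagging documents deterministically from their id. This is only an illustrative sketch (the `filter_tag` helper and the modulo scheme are my own, not taken from the PR):

```python
def filter_tag(doc_id: int, one_in: int) -> bool:
    # Deterministic tagging: exactly one doc in every `one_in` matches,
    # e.g. one_in=100 -> 1% density, one_in=10 -> 10% density.
    return doc_id % one_in == 0

# Over any corpus whose ids are dense integers, the match rate is exact.
density_1pct = sum(filter_tag(i, 100) for i in range(10_000)) / 10_000
density_10pct = sum(filter_tag(i, 10) for i in range(10_000)) / 10_000
```

A deterministic scheme like this keeps the benchmark reproducible across engines, since every engine indexes the exact same filter field.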
Thanks for the PR, filtering is a great addition.
I think the query side should be handled via the query list, with an added tag. The commands are more like different collectors.
This is how I started, but I would really like to see how all queries perform when a filter is applied, not just a few of them, and duplicating all queries didn't feel right (especially if duplicated twice, once for the 1% filter and again for the 10% filter). It would also make it annoying to add more queries to the benchmark.
So my second idea was to add it as another dimension to the benchmark (one dimension being the command (=collector), another the query, and another the filter density), but that felt a bit over-engineered. So I arrived at this third approach of coupling it with the command, which didn't feel great at first, but now feels to me like the least bad approach?
I think duplicating should be fine, but we could have it in code when loading the queries. This has the advantage that you can easily get an overview and compare the different results with a single run.
We may add a FILTER_TAG option or similar to filter queries with certain tags.
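To make the tag idea concrete, here is a small sketch of what tag-based query selection could look like. The query format (comma-separated tags, then the query text), the tag names, and the `FILTER_TAG` environment variable are all hypothetical, following the suggestion above rather than any existing code in the repo:

```python
import os

# Hypothetical tagged query list: (tags, query text) pairs.
RAW_QUERIES = [
    ("TOP_10,FILTER_1PCT", "the who"),
    ("TOP_10", "barack obama"),
    ("TOP_10,FILTER_10PCT", "search engine"),
]

def select_queries(queries, filter_tag):
    """Keep only queries carrying `filter_tag`; an empty tag keeps all,
    so a single run can still cover every variant."""
    if not filter_tag:
        return [text for _, text in queries]
    return [text for tags, text in queries if filter_tag in tags.split(",")]

# FILTER_TAG is the (proposed, not yet existing) env var from the thread.
selected = select_queries(RAW_QUERIES, os.environ.get("FILTER_TAG", ""))
```

Duplicating queries in code at load time, as suggested above, would amount to expanding each base query into one tagged variant per filter density before this selection step.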
Another thing that's currently missing is searching on multiple fields, which is probably the much more common use case.
@PSeitz We'd also need to make sure the query language handles it though (filters should not impact scoring). It might be a pain.
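For reference, Lucene already distinguishes these cases: a `BooleanClause.Occur.FILTER` clause must match but contributes nothing to the score. Here is a toy Python sketch of that semantic (the `score` function and its term-overlap scoring are my own illustration, not the benchmark's actual scorer):

```python
def score(query_terms, doc_terms, filter_match):
    """Toy scorer where the filter gates matching but contributes
    nothing to the score, mirroring Lucene's Occur.FILTER semantics."""
    if not filter_match:
        return None  # document excluded by the filter
    # The score comes only from the scoring clause (here: term overlap),
    # so adding a filter never changes a matching document's score.
    return len(set(query_terms) & set(doc_terms))
```

Whatever syntax the query language ends up using for filters, the key invariant is the one above: a document's score with the filter applied equals its score without it, for every document the filter lets through.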
I'd go with @jpountz solution for simplicity.
FWIW I have started looking at applying @PSeitz 's approach: https://github.com/quickwit-oss/search-benchmark-game/compare/master...jpountz:search-benchmark-game:filtered_queries?expand=1 in case you want to take a look (the query parsing bits are still missing).
Good point. I think the complexity should be the same for both: either way we need special query handling to pass in the filter. But the queries approach should be easier to use and would require just one run instead of 3 or 4 to get the full picture.