Adrien Grand

Search results: 41 issues by Adrien Grand

Currently, different engines use different parameters for BM25, e.g. Tantivy and Lucene use (k1=1.2, b=0.75) while PISA uses (k1=0.9, b=0.4). Robertson et al. had initially suggested that 1.2/0.75 would make good defaults...
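To make the difference concrete, here is a minimal sketch of a BM25 per-term score (using the common Lucene-style smoothed IDF) evaluated with both parameter sets; all of the statistics below are made-up illustration values, not numbers from any benchmark:

```python
import math

def bm25_term_score(tf, df, N, dl, avgdl, k1, b):
    # Smoothed IDF (kept non-negative by the +1 inside the log)
    idf = math.log(1 + (N - df + 0.5) / (df + 0.5))
    # b controls how strongly the score is normalized by document length;
    # k1 controls how quickly term-frequency gains saturate.
    norm = k1 * (1 - b + b * dl / avgdl)
    return idf * tf * (k1 + 1) / (tf + norm)

# Identical term/document statistics, two parameter sets
stats = dict(tf=3, df=100, N=10_000, dl=120, avgdl=100)
lucene = bm25_term_score(**stats, k1=1.2, b=0.75)  # Tantivy/Lucene defaults
pisa = bm25_term_score(**stats, k1=0.9, b=0.4)     # PISA defaults
```

With a document slightly longer than average, as here, the (0.9, 0.4) setting penalizes length less and saturates term frequency sooner, so the two engines rank the same hit differently.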

It seems to me that the pisa-0.8.2 engine forces evaluation of all hits with the TOP_10_COUNT task, but it doesn't collect them into a priority queue as I would expect....

The JNI code writes the uncompressed length at the beginning of the stream as a uint32_t, so the on-disk bytes depend on the byte order of the machine that wrote them.
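The fix is to serialize the length in an explicitly specified byte order instead of the host's native one. A small Python illustration of the difference (not the JNI code itself):

```python
import struct

uncompressed_len = 0x0000ABCD

little = struct.pack("<I", uncompressed_len)  # always b'\xcd\xab\x00\x00'
big = struct.pack(">I", uncompressed_len)     # always b'\x00\x00\xab\xcd'
native = struct.pack("=I", uncompressed_len)  # host-dependent: the portability bug
```

Only the `<`/`>` forms produce a stream that a machine of the opposite endianness can read back correctly.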

Most of our query benchmarks operate on read-only data. This enables Elasticsearch to use a few optimizations, such as caching requests at the shard level the first time it...

Labels: enhancement

Elasticsearch is about to stop reporting memory usage of segments via https://github.com/elastic/elasticsearch/pull/75274, including per-segment data structures like terms, points or doc values memory usage. Rally currently uses these stats in...

Labels: help wanted, :Telemetry, :Reporting, cleanup, good first issue

### Description

I have been looking at many ingestion flame charts recently, in the context of TSDB and merging changes. They highlighted a few things we could do to speed...

Labels: enhancement, Meta, :Distributed/Engine, Team:Distributed

### Description

It would be nice to enable recursive graph bisection out of the box, so that users don't even have to know that it exists or what it is...

This adds the same `targetSearchConcurrency` parameter to `LogMergePolicy` that #13430 is adding to `TieredMergePolicy`. The implementation is simpler here since `LogMergePolicy` is constrained to only merging adjacent segments. From simulating...

This introduces `TermsEnum#prepareSeekExact`, which essentially calls `IndexInput#prefetch` at the right offset for the given term. Then it takes advantage of the fact that `BooleanQuery` already calls `Weight#scorerSupplier` on all clauses,...

When searching across multiple segments, one doesn't need to wait until the first segment is done collecting to start doing the I/O for terms dictionary lookups in the next segment....
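The idea of overlapping the next segment's terms-dictionary I/O with the current segment's collection can be sketched with a thread pool; `seek_term` and the segment names below are hypothetical stand-ins, with a sleep modeling the disk read:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def seek_term(segment, term):
    """Stand-in for a terms-dictionary lookup; the sleep models disk I/O."""
    time.sleep(0.05)
    return f"{segment}:{term}"

segments = ["seg0", "seg1", "seg2"]

with ThreadPoolExecutor() as pool:
    # Kick off the lookup I/O for every segment up front...
    futures = [pool.submit(seek_term, seg, "lucene") for seg in segments]
    # ...then consume each segment in order; later segments' I/O has
    # already been overlapping with the earlier segments' work.
    results = [f.result() for f in futures]
```

With three 50 ms lookups, issuing them eagerly brings total I/O wait close to 50 ms instead of 150 ms, which is the same win `IndexInput#prefetch` targets at the file level.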