Adrien Grand

Search results: 41 issues by Adrien Grand

Currently, different engines use different parameters for BM25, e.g. Tantivy and Lucene use (k1=1.2, b=0.75) while PISA uses (k1=0.9, b=0.4). Robertson et al. had initially suggested that 1.2/0.75 would make good defaults...
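To make the difference concrete, here is a minimal sketch of a BM25 per-term score (using the common Lucene-style smoothed IDF) evaluated with both parameter sets; all of the statistics below are made-up illustration values, not numbers from any benchmark:

```python
import math

def bm25_term_score(tf, df, N, dl, avgdl, k1, b):
    # Smoothed IDF (kept non-negative by the +1 inside the log)
    idf = math.log(1 + (N - df + 0.5) / (df + 0.5))
    # b controls how strongly the score is normalized by document length;
    # k1 controls how quickly term-frequency gains saturate.
    norm = k1 * (1 - b + b * dl / avgdl)
    return idf * tf * (k1 + 1) / (tf + norm)

# Identical term/document statistics, two parameter sets
stats = dict(tf=3, df=100, N=10_000, dl=120, avgdl=100)
lucene = bm25_term_score(**stats, k1=1.2, b=0.75)  # Tantivy/Lucene defaults
pisa = bm25_term_score(**stats, k1=0.9, b=0.4)     # PISA defaults
```

With a document slightly longer than average, as here, the (0.9, 0.4) setting penalizes length less and saturates term frequency sooner, so the two engines rank the same hit differently.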

It seems to me that the pisa-0.8.2 engine forces evaluation of all hits with the TOP_10_COUNT task, but it doesn't collect them into a priority queue as I would expect....

The JNI code writes the uncompressed length at the beginning of the stream as a uint32_t, so the on-disk bytes depend on the byte order of the machine that wrote them.
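The fix is to serialize the length in an explicitly specified byte order instead of the host's native one. A small Python illustration of the difference (not the JNI code itself):

```python
import struct

uncompressed_len = 0x0000ABCD

little = struct.pack("<I", uncompressed_len)  # always b'\xcd\xab\x00\x00'
big = struct.pack(">I", uncompressed_len)     # always b'\x00\x00\xab\xcd'
native = struct.pack("=I", uncompressed_len)  # host-dependent: the portability bug
```

Only the `<`/`>` forms produce a stream that a machine of the opposite endianness can read back correctly.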

Most of our query benchmarks operate on read-only data. This enables Elasticsearch to use a few optimizations, such as caching requests at the shard level the first time it...

Labels: enhancement

Elasticsearch is about to stop reporting memory usage of segments via https://github.com/elastic/elasticsearch/pull/75274, including per-segment data structures like terms, points or doc values memory usage. Rally currently uses these stats in...

Labels: help wanted, :Telemetry, :Reporting, cleanup, good first issue

### Description

I have been looking at many ingestion flame charts recently, in the context of TSDB and merging changes. They highlighted a few things we could do to speed...

Labels: enhancement, Meta, :Distributed/Engine, Team:Distributed

### Description

It would be nice to enable recursive graph bisection out of the box, so that users don't even have to know that it exists or what it is...

This adds the same `targetSearchConcurrency` parameter to `LogMergePolicy` that #13430 is adding to `TieredMergePolicy`. The implementation is simpler here since `LogMergePolicy` is constrained to only merging adjacent segments. From simulating...

This introduces `TermsEnum#prepareSeekExact`, which essentially calls `IndexInput#prefetch` at the right offset for the given term. Then it takes advantage of the fact that `BooleanQuery` already calls `Weight#scorerSupplier` on all clauses,...

When searching across multiple segments, one doesn't need to wait until the first segment is done collecting to start doing the I/O for terms dictionary lookups in the next segment....
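The idea of overlapping the next segment's terms-dictionary I/O with the current segment's collection can be sketched with a thread pool; `seek_term` and the segment names below are hypothetical stand-ins, with a sleep modeling the disk read:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def seek_term(segment, term):
    """Stand-in for a terms-dictionary lookup; the sleep models disk I/O."""
    time.sleep(0.05)
    return f"{segment}:{term}"

segments = ["seg0", "seg1", "seg2"]

with ThreadPoolExecutor() as pool:
    # Kick off the lookup I/O for every segment up front...
    futures = [pool.submit(seek_term, seg, "lucene") for seg in segments]
    # ...then consume each segment in order; later segments' I/O has
    # already been overlapping with the earlier segments' work.
    results = [f.result() for f in futures]
```

With three 50 ms lookups, issuing them eagerly brings total I/O wait close to 50 ms instead of 150 ms, which is the same win `IndexInput#prefetch` targets at the file level.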