François Massot comments

Results: 105 comments of François Massot

Add piecewise linear codec, deprecate linear and multilinear, complete benchmark with real-world datasets.

@PSeitz @fulmicoton: here is my take on what to do for the next tantivy release and what to do after. These remarks are valid for integers only (for floats, our...

Phrase query: scale score on used slop

You may have already read it but, just in case, here is one of the Lucene implementations: https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/search/SloppyPhraseMatcher.java

Add new field type for semi-structured data indexing and efficient querying

I get it, you have implemented what Elasticsearch calls "dynamic field mapping". You can have more control over data types by defining some [mapping templates](https://www.elastic.co/guide/en/elasticsearch/reference/current/dynamic-templates.html). We definitely have to consider...

Investigate the bottleneck in the search benchmark using toplev

@fulmicoton can you add a link for Toplev? @scampi: the PISA project is here: https://github.com/pisa-engine/pisa Concerning bottlenecks, they can be IO or CPU or both. Let's take the example of union...

Investigate the bottleneck in the search benchmark using toplev

@scampi for toplev, see the manual here: https://github.com/andikleen/pmu-tools/wiki/toplev-manual

Document index file formats - Issue/981

> I like the lucene way of describing the file formats (it is not proper to lucene actually I have seen it elsewhere). > e.g. > > https://lucene.apache.org/core/3_0_3/fileformats.html#Segments%20File > >...

Add shingle token filter or token n-grams

@fulmicoton ah yes, that was not clear. My idea was to be able to process article contents directly, not the ngram dataset, which is there because of legal constraints.

Protect against infinite loop in skip reader seek

Yes, the first assert would be good; there is no infinite loop on `target = TERMINATED`. Let me make a nice PR :)

Can Tantivy generate snippets from JSON field?

Good question @lavrd. I had a look at the snippet code and, currently, we don't handle snippeting with text coming from the JSON field. This is visible here: https://github.com/quickwit-oss/tantivy/blob/main/src/snippet/mod.rs#L281-L285 You...

please add a demo to the readme

Hi @bionicles, I think the tantivy spirit is to keep it as a library and put stuff like what you suggest in another repository. @fulmicoton did a really good job...