François Massot
@PSeitz @fulmicoton: here is my take on what to do for the next tantivy release and what to do afterwards: These remarks are valid for integers only (for floats, our...
You may have already read it, but just in case, here is one of the Lucene implementations: https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/search/SloppyPhraseMatcher.java
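For context on the link: in a sloppy phrase match, each term position is shifted by that term's offset in the query, and a document matches when the spread between the largest and smallest shifted positions (the "match length") is at most the slop. Below is a minimal Rust sketch of that check. It assumes one sorted position list per query term and no repeated terms in the query. `min_match_length` and `sloppy_match` are hypothetical helpers. This is not Lucene's actual algorithm, which also handles repeated terms and scores each match.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Hypothetical helper: `term_positions[i]` holds the sorted positions of the
/// i-th query term in one document. Returns the smallest possible
/// max(pos - offset) - min(pos - offset) over one position per term,
/// or None if some term is absent.
fn min_match_length(term_positions: &[Vec<u32>]) -> Option<i64> {
    if term_positions.is_empty() || term_positions.iter().any(|p| p.is_empty()) {
        return None;
    }
    // Min-heap of (shifted position, term index, index in that term's list).
    let mut heap = BinaryHeap::new();
    let mut current_max = i64::MIN;
    for (offset, positions) in term_positions.iter().enumerate() {
        let shifted = positions[0] as i64 - offset as i64;
        current_max = current_max.max(shifted);
        heap.push(Reverse((shifted, offset, 0usize)));
    }
    let mut best = i64::MAX;
    // Classic "smallest range covering one element of each sorted list".
    while let Some(Reverse((min_shifted, term, idx))) = heap.pop() {
        best = best.min(current_max - min_shifted);
        let next = idx + 1;
        if next == term_positions[term].len() {
            break; // this term has no more positions: no smaller window exists
        }
        let shifted = term_positions[term][next] as i64 - term as i64;
        current_max = current_max.max(shifted);
        heap.push(Reverse((shifted, term, next)));
    }
    Some(best)
}

/// Hypothetical helper: does the document match the phrase within `slop`?
fn sloppy_match(term_positions: &[Vec<u32>], slop: i64) -> bool {
    min_match_length(term_positions).map_or(false, |len| len <= slop)
}
```

With this definition an exact phrase has match length 0, and swapping two adjacent terms (e.g. "brown quick" for the query "quick brown") costs a slop of 2.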
I get it: you have implemented what Elasticsearch calls "dynamic field mapping". You can get more control over data types by defining some [mapping templates](https://www.elastic.co/guide/en/elasticsearch/reference/current/dynamic-templates.html). We definitely have to consider...
@fulmicoton, can you add a link for Toplev? @scampi: the PISA project is here: https://github.com/pisa-engine/pisa Concerning bottlenecks, they can be IO, CPU, or both. Let's take the example of union...
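To make the CPU side of a union concrete, here is a minimal Rust sketch of a union of sorted posting lists using a k-way merge over a binary heap. The function name `union_postings` is hypothetical and this is not tantivy's union implementation; it only illustrates that, once postings are in memory, every emitted doc id costs heap operations, so the union can become CPU-bound even when IO is not the limit.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Hypothetical helper: union of several sorted, deduplicated posting lists
/// (doc ids), returned as a single sorted, deduplicated list.
fn union_postings(postings: &[Vec<u32>]) -> Vec<u32> {
    // Min-heap of (doc id, list index, position in that list).
    let mut heap = BinaryHeap::new();
    for (list, docs) in postings.iter().enumerate() {
        if let Some(&first) = docs.first() {
            heap.push(Reverse((first, list, 0usize)));
        }
    }
    let mut out = Vec::new();
    while let Some(Reverse((doc, list, idx))) = heap.pop() {
        // The same doc id may appear in several lists: emit it once.
        if out.last() != Some(&doc) {
            out.push(doc);
        }
        // Refill the heap with the next doc id from the list we just consumed.
        if let Some(&next) = postings[list].get(idx + 1) {
            heap.push(Reverse((next, list, idx + 1)));
        }
    }
    out
}
```

Each doc id costs O(log k) for k lists, which is why real engines often replace the heap with block- or bitset-based strategies when many terms are unioned.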
@scampi for toplev, see the manual here: https://github.com/andikleen/pmu-tools/wiki/toplev-manual
> I like the Lucene way of describing the file formats (it is not specific to Lucene actually, I have seen it elsewhere). > e.g. > > https://lucene.apache.org/core/3_0_3/fileformats.html#Segments%20File > >...
@fulmicoton ah yes, that was not clear. My idea was to be able to process the article contents directly, not the ngram dataset, which exists only because of legal constraints.
Yes, the first assert would be good; there is no infinite loop when `target = TERMINATED`. Let me make a nice PR :)
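A minimal Rust sketch of why a default-style `seek` cannot loop forever on `target = TERMINATED`: `advance` keeps returning the sentinel once the doc set is exhausted, and the sentinel equals the target, so the loop exits. `VecDocSet` is a hypothetical in-memory doc set loosely modeled on tantivy's `DocSet` trait, and the sentinel value and the exact assert are assumptions of this sketch, not tantivy's code.

```rust
/// Sentinel returned once the doc set is exhausted (value assumed for this sketch).
const TERMINATED: u32 = i32::MAX as u32;

/// Hypothetical in-memory doc set over sorted doc ids (all below TERMINATED).
struct VecDocSet {
    docs: Vec<u32>,
    cursor: usize,
}

impl VecDocSet {
    fn new(docs: Vec<u32>) -> Self {
        VecDocSet { docs, cursor: 0 }
    }

    /// Current doc id, or TERMINATED once every doc has been consumed.
    fn doc(&self) -> u32 {
        self.docs.get(self.cursor).copied().unwrap_or(TERMINATED)
    }

    /// Moves to the next doc; stays on TERMINATED once exhausted.
    fn advance(&mut self) -> u32 {
        if self.cursor < self.docs.len() {
            self.cursor += 1;
        }
        self.doc()
    }

    /// Default-style seek: advance until the current doc is >= target.
    fn seek(&mut self, target: u32) -> u32 {
        let mut doc = self.doc();
        // Assumed "first assert": callers must never seek backwards.
        debug_assert!(doc <= target, "seek target is behind the current doc");
        // Terminates for target == TERMINATED: advance() eventually returns it.
        while doc < target {
            doc = self.advance();
        }
        doc
    }
}
```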
Good question @lavrd. I had a look at the snippet code, and currently we don't handle snippet generation for text coming from the JSON field. This is visible here: https://github.com/quickwit-oss/tantivy/blob/main/src/snippet/mod.rs#L281-L285 You...
Hi @bionicles, I think the tantivy spirit is to keep it as a library and put stuff like what you suggest in another repository. @fulmicoton did a really good job...