Multifields and arrays
How can tantivy currently support: a) multifields (i.e. multiple analyzers for one field; ES equivalent: https://www.elastic.co/guide/en/elasticsearch/reference/current/multi-fields.html), and b) arrays (multiple values in one field; ES equivalent: https://www.elastic.co/guide/en/elasticsearch/reference/current/array.html)?
Are there plans for this, or should it be handled in a layer above?
My original use case is that a document might have multiple authors (thus an array of strings). I want to analyze each author name in two ways: split into words (for easy search by given name or family name), and kept whole (for exact search and faceting on full names).
a) At some point we will need to introduce the concept of a mapping. Right now, the "input" schema is the same as the index schema, so you cannot index a given field several times, e.g. with different analyzers. This is a wanted feature.
b) Arrays: I would have to read the Elasticsearch documentation a little more to be certain that it is not hiding something complicated, but in tantivy, arrays of ints and arrays of strings are called multivalued fields. They work out of the box for int and string.
Objects do not work at all, but introducing a "schemaless" JSON-like field is a wanted feature. It will come with the same pitfalls as the disclaimer in the doc you shared: search will not work as expected.
Lucene and ES also have another feature called nested documents, I think. This is more complicated and I don't think it will happen any time soon.
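Until a mapping layer exists, the fan-out described in a) can be done by the caller. The sketch below is plain Rust with no tantivy calls, so it only models the idea: `Analyzer`, `analyze`, `map_author`, and the field names `author_words` / `author_exact` are all hypothetical. It expands one input value into terms for two separately declared index fields, one split into words and one kept whole.

```rust
/// Hypothetical analyzer kinds for this sketch (these are not tantivy types).
#[derive(Clone, Copy)]
enum Analyzer {
    /// Lowercase and split on whitespace: search by given or family name.
    Words,
    /// Keep the whole value as a single term: exact match and faceting.
    Raw,
}

/// Turn one input value into the terms the given analyzer would emit.
fn analyze(analyzer: Analyzer, value: &str) -> Vec<String> {
    match analyzer {
        Analyzer::Words => value
            .split_whitespace()
            .map(|word| word.to_lowercase())
            .collect(),
        Analyzer::Raw => vec![value.to_string()],
    }
}

/// A toy "mapping": one input field fans out to several (index field, term) pairs.
fn map_author(value: &str) -> Vec<(&'static str, String)> {
    // Assumed field names; a real schema would declare both fields up front.
    let targets = [
        ("author_words", Analyzer::Words),
        ("author_exact", Analyzer::Raw),
    ];
    let mut out = Vec::new();
    for (field, analyzer) in targets {
        for term in analyze(analyzer, value) {
            out.push((field, term));
        }
    }
    out
}

fn main() {
    for (field, term) in map_author("Ursula Le Guin") {
        println!("{}: {}", field, term);
    }
}
```

With real schema fields, each pair would correspond to adding the same author string to one of the two fields. A document with several authors would simply repeat this for each name.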
Hi @fulmicoton,
> They work out of the box for int and string
Could you point me to where/how I can use arrays/multivalued fields?
Looking at the Value kinds, I don't see it.
I would very much like to attach a Vec of u64s or Strings to a document, and tell the Searcher to return only the documents where that Vec field contains 'x'.
You can just add the field multiple times in the Document.
// Repeating the same field makes it multivalued for this document.
index_writer.add_document(doc!(
    date_field => DateTime::from_timestamp_secs(1000),
    date_field => DateTime::from_timestamp_secs(1001),
))?;
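To make the "contains 'x'" semantics concrete, here is a toy model in plain Rust. It is a sketch of how a multivalued field behaves, not tantivy's actual data structure or API: `ToyIndex`, `add_document`, and `docs_containing` are hypothetical names. Each value of the repeated field gets its own posting entry, so a document matches if any one of its values equals the queried value.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Toy inverted index over a single multivalued u64 field.
#[derive(Default)]
struct ToyIndex {
    /// value -> ids of the documents that carry this value at least once
    postings: BTreeMap<u64, BTreeSet<u32>>,
    next_doc_id: u32,
}

impl ToyIndex {
    /// Index one document; every value in `values` gets its own posting entry.
    fn add_document(&mut self, values: &[u64]) -> u32 {
        let doc_id = self.next_doc_id;
        self.next_doc_id += 1;
        for &value in values {
            self.postings.entry(value).or_default().insert(doc_id);
        }
        doc_id
    }

    /// Ids of the documents whose field contains `value`, in ascending order.
    fn docs_containing(&self, value: u64) -> Vec<u32> {
        self.postings
            .get(&value)
            .map(|ids| ids.iter().copied().collect())
            .unwrap_or_default()
    }
}

fn main() {
    let mut index = ToyIndex::default();
    index.add_document(&[1000, 1001]); // doc 0: two values in the same field
    index.add_document(&[1001]);       // doc 1: a single value
    index.add_document(&[]);           // doc 2: no value at all

    println!("{:?}", index.docs_containing(1001)); // prints [0, 1]
    println!("{:?}", index.docs_containing(1000)); // prints [0]
    println!("{:?}", index.docs_containing(42));   // prints []
}
```

The same reasoning applies to strings: an untokenized string field with several values would match on any one of them.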