tantivy
About different indices and schemas
My use case is an indexing agent that indexes websites specified by the user. The websites are grouped into families that share a technology stack.
For example, all *.stackoverflow.com websites are the same, all websites running MediaWiki behave the same, etc. Each of these families might have different features and properties that I would like to index. Depending on the family, it might be possible to extract more knowledge, or more structured knowledge, than from a plain website.
Other families could be audio files containing a lot of metadata I'd like to be able to query: lyrics, year, artist. Maybe images too, with their size, timestamps, primary colors, aspect ratio, etc.
This question is a follow-up to #2221. Performance-wise, is it better to have one single, big index with sparse entries, or a separate index for each of the families mentioned above, with multiple readers accessing the index files simultaneously?
It is hard to test this without building the index first, and prototyping it takes a lot of time. So I was hoping that people with more insight could give me some help and advice.
My gut feeling is that the second approach could get out of hand quickly if there are many families.
Indexing or search performance? What type of query?
@mainrs, did you figure out an optimal solution for your problem? I am dealing with something quite similar and would appreciate any insight you have to offer!