tantivy
Reverse queries?
Based on this description in ES for percolator queries: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-percolate-query.html#how-it-works
Can you envision a way this library could be used for such "reverse querying" without modification? I'm really interested in this kind of query and this library.
It could be interesting to port Lucene's monitor module. https://lucene.apache.org/core/8_11_0/monitor/org/apache/lucene/monitor/package-summary.html I've never tried it myself; I just wanted to leave a pointer here.
The feature is usually called "reverse search", as noted by @justinmchase. Luwak is another interesting project that communicates a lot about how to do it.
My understanding is that you end up creating one index in which the documents are queries. Handling only disjunction queries is super easy. Handling actual boolean queries is much trickier. One way to do it is to extract the disjunction query implied by each query, index that, and then post-filter the candidates against the original query.
e.g. ((A AND B) OR (C AND D) AND NOT E) => (A OR B OR C OR D)
The post-filtering can be relatively fast. But I think smarter implementations try to be smart about which tokens should actually be indexed.
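To make the idea concrete, here is a minimal sketch in plain Rust (standard library only, no tantivy API involved). The names `Query`, `QueryIndex`, `positive_terms`, `matching`, and `tokens` are all hypothetical and exist only for this illustration. Each stored query is indexed under the terms of its implied disjunction; an incoming document looks up candidate queries by its tokens, then each candidate is re-checked against the full boolean query. The sketch assumes every query has at least one non-negated term and cannot be satisfied by a negation alone (a query like `A OR NOT E` would be missed for documents containing neither term).

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical query AST for this sketch (not a tantivy type).
#[derive(Debug, Clone)]
enum Query {
    Term(String),
    And(Vec<Query>),
    Or(Vec<Query>),
    Not(Box<Query>),
}

impl Query {
    /// Collect the terms of the implied disjunction: every term that
    /// appears outside a negation.
    fn positive_terms(&self, out: &mut HashSet<String>) {
        match self {
            Query::Term(t) => {
                out.insert(t.clone());
            }
            Query::And(qs) | Query::Or(qs) => {
                for q in qs {
                    q.positive_terms(out);
                }
            }
            // Negated clauses contribute nothing to the candidate lookup.
            Query::Not(_) => {}
        }
    }

    /// Post-filter: evaluate the full boolean query against a document's tokens.
    fn matches(&self, doc: &HashSet<String>) -> bool {
        match self {
            Query::Term(t) => doc.contains(t),
            Query::And(qs) => qs.iter().all(|q| q.matches(doc)),
            Query::Or(qs) => qs.iter().any(|q| q.matches(doc)),
            Query::Not(q) => !q.matches(doc),
        }
    }
}

/// An index whose "documents" are queries.
struct QueryIndex {
    queries: Vec<Query>,
    /// term -> ids of the queries whose implied disjunction contains it
    postings: HashMap<String, Vec<usize>>,
}

impl QueryIndex {
    fn new() -> Self {
        QueryIndex { queries: Vec::new(), postings: HashMap::new() }
    }

    /// Register a query and return its id.
    fn add(&mut self, query: Query) -> usize {
        let id = self.queries.len();
        let mut terms = HashSet::new();
        query.positive_terms(&mut terms);
        for term in terms {
            self.postings.entry(term).or_default().push(id);
        }
        self.queries.push(query);
        id
    }

    /// Return the sorted ids of all stored queries that match the document.
    fn matching(&self, doc: &HashSet<String>) -> Vec<usize> {
        // Step 1: candidates are queries sharing at least one term with the document.
        let mut candidates: HashSet<usize> = HashSet::new();
        for token in doc {
            if let Some(ids) = self.postings.get(token) {
                candidates.extend(ids.iter().copied());
            }
        }
        // Step 2: post-filter candidates with the original boolean query.
        let mut hits: Vec<usize> = candidates
            .into_iter()
            .filter(|&id| self.queries[id].matches(doc))
            .collect();
        hits.sort_unstable();
        hits
    }
}

/// Naive tokenizer assumed for the sketch: lowercase, split on whitespace.
fn tokens(text: &str) -> HashSet<String> {
    text.split_whitespace().map(|w| w.to_lowercase()).collect()
}
```

Reading the example above as ((A AND B) OR (C AND D)) AND NOT E (one possible grouping), a document containing "a b" becomes a candidate via either term and passes the post-filter, while "a c" is also a candidate but is rejected by it. That second case is why indexing every positive term is wasteful, and presumably where the smarter token selection comes in.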
It would be awesome to have something like that, but this is a big feature and I would like to have an actual user to help us drive the decision before launching an effort on that.
@justinmchase Do you have a use case you can discuss with us?
> Luwak is another interesting project that communicates a lot about how to do it.
Yeah, Luwak was contributed back to Lucene, and it's now the monitor module I mentioned :)
Add Luwak as a lucene module https://issues.apache.org/jira/browse/LUCENE-8766
@mocobeta Oh sweet! I did not know! Thank you for the info!
I'm trying to build an app which does local indexing of files. Nothing super fancy, but essentially I want to be able to extract keywords and phrases and build up an index of those phrases. From there I can just query the documents like normal, but I'd like to do the reverse as well: detect the tags in the documents without explicitly defining the references between them. That way I could add a new document and, just by updating the index, have it become available as I dynamically query for tags.
Also, of course, one major use case for "reverse queries" is eventing, where you simply run the query and iterate over the results to trigger events for the matched queries.
I actually don't want to do anything super fancy but just simple phrase based queries and then logical AND / OR to get unions and intersections of phrases.
Upd.