tantivy
Support for term alternatives in phrase.
Hello! First of all, I want to thank you for your awesome work on tantivy, it's great!
Motivation
I'm trying to use quickwit/tantivy for unstructured log search and am stuck on the migration from Sphinx/Manticore.
Using Sphinx query syntax I can write something like:
level NEAR/0 (info | warn | error)
But with tantivy I need to expand this expression by hand:
"level info" OR "level warn" OR "level error"
If there are more than two elements (in the alternative or in the phrase), the query becomes much harder to write and read :(
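To make the manual expansion concrete, here is a minimal sketch in plain Rust (no tantivy dependency). The helper names `expand_phrase` and `to_or_query` are hypothetical and are not part of tantivy or quickwit; the sketch only builds the expanded query string, assuming each phrase position is given as a list of alternative terms.

```rust
/// Expands a phrase whose positions may hold several alternative terms
/// into the flat list of plain phrases it stands for.
/// (Hypothetical helper for illustration, not a tantivy API.)
fn expand_phrase(positions: &[Vec<&str>]) -> Vec<String> {
    // Start with a single empty phrase and extend it position by position.
    let mut phrases: Vec<Vec<&str>> = vec![Vec::new()];
    for alternatives in positions {
        let mut next = Vec::with_capacity(phrases.len() * alternatives.len());
        for prefix in &phrases {
            for term in alternatives {
                let mut phrase = prefix.clone();
                phrase.push(*term);
                next.push(phrase);
            }
        }
        phrases = next;
    }
    phrases.iter().map(|p| p.join(" ")).collect()
}

/// Renders the expansion as quoted phrases joined by OR.
fn to_or_query(positions: &[Vec<&str>]) -> String {
    expand_phrase(positions)
        .iter()
        .map(|p| format!("\"{}\"", p))
        .collect::<Vec<_>>()
        .join(" OR ")
}
```

Calling `to_or_query(&[vec!["level"], vec!["info", "warn", "error"]])` yields the hand-written query above. Every additional position with alternatives multiplies the number of generated phrases.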
Solution
It would be great to have some syntactic sugar or a low-level Rust API (I can parse the query myself) to avoid such boilerplate. I would also enjoy implementing this feature myself, but I will need some help with query evaluation.
We support the SQL-like IN parameter:
level: IN [info warn error]
You may also have a look at quickwit, which is built on tantivy for log search.
Hello, @PSeitz! Yes, but I work with unstructured logs (text logs in different formats from about a hundred sources): the level is part of the body :(
That use case and syntax seem rather niche, so I'm not sure we would want to add that to the query parser. Maybe @fulmicoton has an opinion on this.
Not a targeted solution, but wouldn't a RegexQuery be able to handle this? As an aside, I would actually be really interested in us exposing regex queries via the parser to make it easier for people to experiment with them, but I suspect the quoting/escaping will be somewhat messy.
> That use case and syntax seems rather niche
This use case is one of the main reasons why the Elasticsearch Span API exists :)
Maybe I oversimplified the example, but analytical queries (e.g. incident detection) can consist of dozens of terms/alternatives.
> we would want to add that in the query parser
Personally, I'm not sure about this either (because of the complexity for a generic user). But it would be great to have the ability to encode this information through some IR (like Elastic did with the Span API).
I should also mention that parsing is not the main problem here: if we expand a non-trivial phrase into a disjunction of trivial phrases, we get $\prod_{i=1}^{n} |p_i|$ sub-queries, where $p = [p_1, \dots, p_n]$ is the phrase and $|p_i|$ is the number of alternatives at position $i$. That's why this issue requires changes to query execution as well :(
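To give a feel for how quickly that product grows, here is a small sketch in plain Rust. The function name `expansion_size` is hypothetical; it takes the number of alternatives at each phrase position and multiplies them, returning `None` on overflow.

```rust
/// Number of plain phrases produced by expanding a phrase with alternatives:
/// the product of the alternative counts at each position.
/// Returns None if the product overflows usize.
/// (Hypothetical helper for illustration.)
fn expansion_size(alternatives_per_position: &[usize]) -> Option<usize> {
    alternatives_per_position
        .iter()
        .try_fold(1usize, |acc, &n| acc.checked_mul(n))
}
```

For the `level` example, `expansion_size(&[1, 3])` is `Some(3)`. A five-position phrase with four alternatives at each position, `expansion_size(&[4; 5])`, already gives `Some(1024)` sub-queries.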
> be able to handle this
Hello, @adamreichold! Yes, it works.
So if I understand correctly, you would want a PhraseQuery that supports multiple alternative terms per position. That's not supported currently and would probably make sense to add.
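As a rough illustration of what matching such a phrase without expansion could look like, here is a sketch in plain Rust. It is not how tantivy evaluates phrase queries (which work on posting lists with term positions); it is a linear scan over an already tokenized text, and the function name `find_matches` is hypothetical. Each phrase position is a list of accepted terms, and adjacency is strict (no slop).

```rust
/// Returns the start offsets at which `phrase` matches `tokens`, where each
/// phrase position accepts any one of its alternative terms.
/// (Hypothetical sketch: a linear scan, not tantivy's index-based evaluation.)
fn find_matches(tokens: &[&str], phrase: &[Vec<&str>]) -> Vec<usize> {
    // `windows(0)` would panic, so an empty phrase matches nothing.
    if phrase.is_empty() || phrase.len() > tokens.len() {
        return Vec::new();
    }
    tokens
        .windows(phrase.len())
        .enumerate()
        .filter(|(_, window)| {
            // Every token in the window must be one of the alternatives
            // allowed at the corresponding phrase position.
            window
                .iter()
                .zip(phrase)
                .all(|(token, alternatives)| alternatives.contains(token))
        })
        .map(|(start, _)| start)
        .collect()
}
```

With tokens `["ts", "level", "warn", "disk", "level", "debug", "level", "error"]` and the phrase `[["level"], ["info", "warn", "error"]]`, the matches start at offsets 1 and 6. The work per candidate position is proportional to the sum of the alternative counts rather than their product.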
@npatsakula is it for quickwit or tantivy?
@fulmicoton, the ideal option would be to support the ES Span API in quickwit, but it is quite tricky and requires a lot of work (e.g. slop evaluation for non-trivial sub-queries). I assumed (maybe mistakenly) that a change of that size and complexity would be unwanted in quickwit mainline, so I opened a smaller issue for tantivy.