tantivy

Support for term alternatives in phrase.

Open npatsakula opened this issue 1 year ago • 8 comments

Hello! First of all, I want to thank you for your awesome work on tantivy, it's great!

Motivation

I'm trying to use quickwit/tantivy for unstructured log search and am stuck on the migration from Sphinx/Manticore.

Using Sphinx query syntax I can write something like:

level NEAR/0 (info | warn | error)

But with tantivy I need to expand this expression by hand:

"level info" OR "level warn" OR "level error"

If there are more than two elements (in an alternative or in the phrase), writing and reading the query becomes much harder :(
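To make the boilerplate concrete, here is a minimal sketch of the manual expansion in plain Rust. The function name `expand_phrase` and the representation of a phrase as a list of alternatives per position are assumptions made for illustration; nothing here is part of tantivy's API.

```rust
/// Expand a phrase whose positions may each hold several alternative terms
/// into a disjunction of plain phrase queries (hypothetical helper).
fn expand_phrase(positions: &[Vec<&str>]) -> String {
    // Start with a single empty phrase and extend it position by position.
    let mut phrases: Vec<Vec<&str>> = vec![Vec::new()];
    for alternatives in positions {
        let mut next = Vec::with_capacity(phrases.len() * alternatives.len());
        for prefix in &phrases {
            for term in alternatives {
                let mut phrase = prefix.clone();
                phrase.push(*term);
                next.push(phrase);
            }
        }
        phrases = next;
    }
    // Quote every expanded phrase and join them with OR.
    phrases
        .iter()
        .map(|p| format!("\"{}\"", p.join(" ")))
        .collect::<Vec<_>>()
        .join(" OR ")
}
```

Calling it with `[["level"], ["info", "warn", "error"]]` yields the hand-written query above; every extra position with alternatives multiplies the number of generated phrases.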

Solution

It would be great to have some syntactic sugar or a low-level Rust API (I can parse the query myself) to avoid such boilerplate. I would also be happy to implement this feature myself, but I will need some help with query evaluation.

npatsakula, Dec 16 '23 21:12

We support an SQL-like IN operator: level: IN [info warn error]

You may also have a look at quickwit, which is built on tantivy for log search.

PSeitz, Dec 17 '23 03:12

> We support an SQL-like IN operator: level: IN [info warn error]
>
> You may also have a look at quickwit, which is built on tantivy for log search.

Hello, @PSeitz! Yes, but I work with unstructured logs (text logs in different formats from about a hundred sources): the level is part of the body :(

npatsakula, Dec 17 '23 06:12

That use case and syntax seem rather niche, so I'm not sure we would want to add that in the query parser. Maybe @fulmicoton has an opinion on this.

PSeitz, Dec 17 '23 10:12

Not a targeted solution, but wouldn't a RegexQuery be able to handle this? As an aside, I would actually be really interested in us exposing regex queries via the parser to make it easier for people to experiment with them, but I suspect the quoting/escaping will be somewhat messy.

adamreichold, Dec 17 '23 10:12

> That use case and syntax seem rather niche

This use case is one of the main reasons why the Elasticsearch Span API exists :)

Maybe I oversimplified the example, but analytical queries (e.g. incident detection) can consist of dozens of terms/alternatives.

> we would want to add that in the query parser

Personally, I'm not sure about this either (because of the complexity for a typical user). But it would be great to be able to encode this information through some IR (like Elastic did with the Span API).

I should also mention that parsing is not the main problem here: if we expand a non-trivial phrase into a disjunction of trivial phrases, we get $\prod_{i=1}^{n} |p_i|$ sub-queries, where the phrase is $p = [p_1, \dots, p_n]$ and $|p_i|$ is the number of alternatives at position $i$. That's why this issue requires modifying query execution as well :(
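A quick way to see how fast the expansion grows is to multiply the number of alternatives at each position. The helper below is only an illustrative sketch (the name `expanded_query_count` is hypothetical); it uses checked multiplication so that an overflowing count is reported as `None` instead of wrapping.

```rust
/// Number of plain phrase queries produced by fully expanding a phrase,
/// given how many alternatives each position has (hypothetical helper).
/// Returns `None` if the product does not fit in `usize`.
fn expanded_query_count(alternatives_per_position: &[usize]) -> Option<usize> {
    alternatives_per_position
        .iter()
        .try_fold(1usize, |acc, &n| acc.checked_mul(n))
}
```

For `level NEAR/0 (info | warn | error)` the counts are `[1, 3]`, giving 3 sub-queries; a four-position phrase with 3, 4, 5 and 2 alternatives already expands to 120.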

> be able to handle this

Hello, @adamreichold! Yes, it works.

npatsakula, Dec 17 '23 15:12

So if I understand correctly, you would want a PhraseQuery that supports multiple alternative terms per position. That's not supported currently and would probably make sense to add.
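As a rough illustration of the semantics such a query would have, the sketch below checks a tokenized document against a phrase where each position carries a set of allowed terms. It is a naive sliding-window scan over tokens, written only to show the matching rule; it is not how tantivy evaluates phrase queries (which work on postings with positions), and the function name is hypothetical.

```rust
use std::collections::HashSet;

/// Returns true if some run of consecutive tokens matches the phrase, where
/// position `i` of the phrase accepts any term in `positions[i]`.
/// Illustrative only; not tantivy's implementation.
fn matches_phrase_with_alternatives(tokens: &[&str], positions: &[HashSet<&str>]) -> bool {
    // `windows(0)` would panic, and a document shorter than the phrase cannot match.
    if positions.is_empty() || tokens.len() < positions.len() {
        return false;
    }
    tokens.windows(positions.len()).any(|window| {
        window
            .iter()
            .zip(positions)
            .all(|(token, allowed)| allowed.contains(token))
    })
}
```

The point of the design is that the alternatives are checked in place, one set lookup per position, so the work per candidate window grows with the phrase length rather than with the product of the alternative counts.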

PSeitz, Dec 17 '23 16:12

@npatsakula is it for quickwit or tantivy?

fulmicoton, Jan 08 '24 04:01

@fulmicoton, the ideal option would be to support the ES Span API in quickwit, but it is quite tricky and requires a lot of work (e.g. slop evaluation for non-trivial sub-queries). I assumed (maybe mistakenly) that a change of that size and complexity would be unwanted in quickwit mainline, so I opened a smaller issue for tantivy.

npatsakula, Jan 08 '24 11:01