tantivy
Range queries on JSON fields
Is your feature request related to a problem? Please describe.
I cannot perform range queries on JSON fields. For example, examples/json_field.rs contains a search like this:
{
let query = query_parser.parse_query("cart.product_id:103")?;
let count_docs = searcher.search(&*query, &Count)?;
assert_eq!(count_docs, 1);
}
But I cannot rewrite the query like this:
{
let query = query_parser.parse_query("cart.product_id < 110")?;
let count_docs = searcher.search(&*query, &Count)?;
assert_eq!(count_docs, 1);
}
The count comes back as 0.
Describe the solution you'd like
We should be able to search nested JSON documents with the usual <, >, etc.
Isn't the range query syntax supported by the built-in QueryParser different? It would be something like
cart.product_id:[0 TO 110}
(assuming that 0 is the smallest possible ID).
The parser doesn't handle this currently, but either of these should work: cart.product_id:<110 or cart.product_id:{* TO 110}
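To make the intended meaning of these spellings concrete, here is a minimal sketch that uses only the Rust standard library and does not call tantivy. It assumes that `[` and `]` denote inclusive bounds, `{` and `}` exclusive bounds, and `*` an open end; the helper name `matches_range` is hypothetical.

```rust
use std::ops::{Bound, RangeBounds};

/// Hypothetical helper: does `value` fall inside the given bounds?
fn matches_range(bounds: (Bound<i64>, Bound<i64>), value: i64) -> bool {
    bounds.contains(&value)
}

fn main() {
    // cart.product_id:[0 TO 110}  means  0 <= id < 110
    let closed_open = (Bound::Included(0), Bound::Excluded(110));
    // cart.product_id:{* TO 110} or cart.product_id:<110  means  id < 110
    let open_below = (Bound::Unbounded, Bound::Excluded(110));

    // The product id 103 from the example satisfies both ranges.
    assert!(matches_range(closed_open, 103));
    assert!(matches_range(open_below, 103));

    // The upper bound is exclusive, so 110 itself does not match.
    assert!(!matches_range(open_below, 110));

    // Only the open-ended range accepts values below 0.
    assert!(matches_range(open_below, -5));
    assert!(!matches_range(closed_open, -5));
}
```

The two forms differ only for values below the lower bound, which is why `[0 TO 110}` relies on 0 being the smallest possible ID.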
Related quickwit issue: https://github.com/quickwit-oss/quickwit/issues/2431
@PSeitz Attempting the :< syntax or :{{* TO 110}} (Rust complains about the unescaped {}, hence the doubled braces) returns the error "Unsupported query: Range query are not supported on json field" for me.
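On the brace escaping: in Rust, doubling the braces is only needed inside formatting macros such as `format!`, where `{}` is a placeholder. The sketch below (standard library only; the helper name `upper_bound_query` is hypothetical) shows that the doubled braces collapse to single ones in the resulting string, which is what would then be handed to the query parser.

```rust
/// Hypothetical helper that builds the query string for "field < upper".
fn upper_bound_query(field: &str, upper: i64) -> String {
    // `{{` and `}}` are escapes for literal braces; each `{}` is a placeholder.
    format!("{}:{{* TO {}}}", field, upper)
}

fn main() {
    let query = upper_bound_query("cart.product_id", 110);

    // The formatted string contains single braces.
    assert_eq!(query, "cart.product_id:{* TO 110}");

    // A plain string literal is not a format string and needs no escaping.
    let literal = "cart.product_id:{* TO 110}";
    assert_eq!(query, literal);

    println!("{}", query);
}
```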
Indeed, it's disabled. I don't think there's an inherent reason, except that some code to handle it is missing. @fulmicoton?
Would anyone like to contribute?
Hi @fulmicoton, I would like to contribute. Could you give me an entry point to start working on this? Thanks.
@yollotltamayo
https://github.com/quickwit-oss/tantivy/tree/main/src/query/range_query
The code is sometimes bound to a fixed field in the schema. Those places would need to be replaced with something that can handle a JSON path, e.g. "myjson.fielda": https://github.com/quickwit-oss/tantivy/blob/main/src/query/range_query/range_query.rs#L335
A range query can run on the columnar storage and on the inverted index. I would implement it first for the inverted index, as it should be simpler.
Let me know if you are still interested.
I've implemented a similar ExistsQuery for JSON fields; feel free to port it back: https://github.com/izihawa/summa/blob/master/summa-core/src/components/queries/exists_query.rs#L90
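As a rough illustration of why the inverted-index route can be the simpler starting point, here is a toy model using only the standard library. It is not tantivy's implementation: it assumes that terms are kept sorted by (JSON path, value), so a range query becomes a scan over a contiguous run of terms whose posting lists are merged. `ToyIndex`, `add`, and `range` are hypothetical names.

```rust
use std::collections::{BTreeMap, BTreeSet};
use std::ops::Bound;

type DocId = u32;

/// Toy inverted index: sorted (json_path, value) terms mapped to doc ids.
#[derive(Default)]
struct ToyIndex {
    postings: BTreeMap<(String, i64), Vec<DocId>>,
}

impl ToyIndex {
    fn add(&mut self, doc: DocId, path: &str, value: i64) {
        self.postings.entry((path.to_string(), value)).or_default().push(doc);
    }

    /// Collect the docs that have a term under `path` with a value in the bounds.
    fn range(&self, path: &str, lower: Bound<i64>, upper: Bound<i64>) -> BTreeSet<DocId> {
        // Return early on empty ranges; BTreeMap::range panics on inverted bounds.
        let empty = match (&lower, &upper) {
            (Bound::Included(l), Bound::Included(u)) => l > u,
            (Bound::Included(l), Bound::Excluded(u))
            | (Bound::Excluded(l), Bound::Included(u))
            | (Bound::Excluded(l), Bound::Excluded(u)) => l >= u,
            _ => false,
        };
        if empty {
            return BTreeSet::new();
        }
        // Translate value bounds into bounds on the composite (path, value) key.
        let lo = match lower {
            Bound::Included(v) => Bound::Included((path.to_string(), v)),
            Bound::Excluded(v) => Bound::Excluded((path.to_string(), v)),
            Bound::Unbounded => Bound::Included((path.to_string(), i64::MIN)),
        };
        let hi = match upper {
            Bound::Included(v) => Bound::Included((path.to_string(), v)),
            Bound::Excluded(v) => Bound::Excluded((path.to_string(), v)),
            Bound::Unbounded => Bound::Included((path.to_string(), i64::MAX)),
        };
        // Scan the contiguous run of terms and merge their posting lists.
        self.postings
            .range((lo, hi))
            .flat_map(|(_, docs)| docs.iter().copied())
            .collect()
    }
}

fn main() {
    let mut index = ToyIndex::default();
    index.add(0, "cart.product_id", 103);
    index.add(1, "cart.product_id", 133);
    index.add(2, "cart.quantity", 5);

    // Equivalent in spirit to cart.product_id:{* TO 110}
    let hits = index.range("cart.product_id", Bound::Unbounded, Bound::Excluded(110));
    assert_eq!(hits.into_iter().collect::<Vec<_>>(), vec![0]);
}
```

Because the path is part of the sorted key, terms from other paths (here `cart.quantity`) never enter the scan, which mirrors the point that the query must address a JSON path rather than a fixed schema field.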
Here's my custom implementation of a range query for JSON fields for anyone interested: https://github.com/georocket/georocket/blob/bac0325889d43f93389a54327b90338527ef03c2/rust/core/src/index/tantivy/json_range_query.rs
It's basically the same code as that of Tantivy's range query. I just changed the type of field from String to Field.