tantivy
expand_dots and object reconstruction
expand_dots Function
When expand_dots_enabled is true, a . in a key is treated as a JSON path separator rather than a literal period. For instance, with the option enabled, {"root": {"child.with.dot": "hello"}} is handled as the nested object {"root": {"child": {"with": {"dot": "hello"}}}}, which makes the query root.child.with.dot:hello valid. When it is false, keys retain their . and the dots must be escaped in queries: root.child\.with\.dot:hello.
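To make the two addressing modes concrete, here is a minimal sketch of how a query path could be split into segments. The function names `split_json_path` and `expand_dots` are hypothetical and are not tantivy's actual internals; the sketch only assumes that `\.` denotes a literal dot and that an unescaped `.` separates path segments.

```rust
/// Splits a query path on unescaped dots. An escaped dot (`\.`) is kept
/// as a literal `.` inside the current segment.
/// Hypothetical helper for illustration, not tantivy's implementation.
fn split_json_path(path: &str) -> Vec<String> {
    let mut segments = Vec::new();
    let mut current = String::new();
    let mut chars = path.chars().peekable();
    while let Some(c) = chars.next() {
        match c {
            // `\.` becomes a literal dot in the current segment.
            '\\' if chars.peek() == Some(&'.') => {
                chars.next();
                current.push('.');
            }
            // An unescaped dot closes the current segment.
            '.' => segments.push(std::mem::take(&mut current)),
            other => current.push(other),
        }
    }
    segments.push(current);
    segments
}

/// Mimics the effect of expand_dots_enabled on a list of keys:
/// every dot inside a key becomes an additional path separator.
fn expand_dots(segments: &[String]) -> Vec<String> {
    segments
        .iter()
        .flat_map(|s| s.split('.').map(str::to_string))
        .collect()
}
```

Under these assumptions, `split_json_path("root.child\\.with\\.dot")` yields the two segments `root` and `child.with.dot`, and passing that result through `expand_dots` yields the four segments `root`, `child`, `with`, `dot`.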
Issue
expand_dots_enabled is currently implemented by changing the path stored in the inverted index and fast fields.
This will cause issues with object reconstruction from the path (used in https://github.com/quickwit-oss/tantivy/pull/2198).
Instead, this could be a query-time option that tries to resolve root.child.with.dot:hello to root.child\.with\.dot:hello when querying (maybe with metadata created during indexing).
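A rough sketch of what such query-time resolution might look like, assuming the index keeps a list of the original key paths as metadata. The `stored_paths` input and both function names are hypothetical and do not correspond to anything that exists in tantivy today.

```rust
/// Escapes literal dots in a single key so it can be joined into a query path.
fn escape_segment(segment: &str) -> String {
    segment.replace('.', "\\.")
}

/// Hypothetical query-time resolver: given the path the user typed and the
/// key paths recorded at indexing time, returns every stored path (in escaped
/// form) whose segments, joined with dots, equal the user's path.
fn resolve_dotted_path(query: &str, stored_paths: &[Vec<String>]) -> Vec<String> {
    stored_paths
        .iter()
        // A stored path matches if it reads the same once dots are ambiguous.
        .filter(|segments| segments.join(".") == query)
        // Re-escape the literal dots so the result addresses the stored path.
        .map(|segments| {
            segments
                .iter()
                .map(|s| escape_segment(s))
                .collect::<Vec<_>>()
                .join(".")
        })
        .collect()
}
```

The function returns a list rather than a single path because the mapping can be ambiguous: an index may contain both a key `child.with.dot` under `root` and a truly nested `root` / `child` / `with` / `dot`, and both match the unescaped query. Keys that themselves contain backslashes are not handled in this sketch.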
Is #2198 really the issue you wanted to link? I don't understand how it is related.
I thought that the docvalue_fields option in top_hits reconstructs the object from those fields, but I just found out that's actually not the case and it just returns a flat list instead.
{
  "aggs": {
    "top_tags": {
      "terms": {
        "field": "user.id.keyword",
        "size": 3
      },
      "aggs": {
        "top_sales_hits": {
          "top_hits": {
            "docvalue_fields": [
              "user.id.keyword",
              "user.real.name.keyword"
            ],
            "size": 1
          }
        }
      }
    }
  }
}
Is there still a use-case for this if not in #2198?
Yes, I think we should change it. Currently we change the way paths are stored internally, depending on the expand_dots option.
Internally, we should have a canonical way to store paths, independent of how they are addressed.