expand_dots and object reconstruction

Open • PSeitz opened this issue 2 years ago • 4 comments

expand_dots Function

When expand_dots_enabled is true, dots in keys are treated as JSON path separators rather than as literal periods. For instance, with expand_dots_enabled, {"root": {"child.with.dot": "hello"}} is indexed as if it were the nested object {"root": {"child": {"with": {"dot": "hello"}}}}, so the query root.child.with.dot:hello matches it. If it is false, keys keep their dots, and those dots must be escaped in queries: root.child\.with\.dot:hello.
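A minimal Rust sketch of the matching rule described above, assuming a document path can be modelled as the list of raw JSON keys leading to a value. The helper names (effective_segments, split_query_path, path_matches) are hypothetical and are not tantivy's API; the sketch says nothing about how tantivy actually encodes paths in its index.

```rust
/// Hypothetical helper: the segments a field is addressed by.
/// With `expand_dots` on, every `.` inside a key acts as an extra separator;
/// with it off, each raw key is a single segment and keeps its dots.
fn effective_segments(keys: &[&str], expand_dots: bool) -> Vec<String> {
    if expand_dots {
        keys.iter()
            .flat_map(|key| key.split('.'))
            .map(|segment| segment.to_string())
            .collect()
    } else {
        keys.iter().map(|key| key.to_string()).collect()
    }
}

/// Split a query path such as `root.child\.with\.dot` on unescaped dots.
/// A backslash makes the following character literal, so `\.` becomes `.`.
fn split_query_path(path: &str) -> Vec<String> {
    let mut segments = Vec::new();
    let mut current = String::new();
    let mut chars = path.chars();
    while let Some(c) = chars.next() {
        match c {
            '\\' => {
                if let Some(escaped) = chars.next() {
                    current.push(escaped);
                }
            }
            '.' => segments.push(std::mem::take(&mut current)),
            _ => current.push(c),
        }
    }
    segments.push(current);
    segments
}

/// A query path matches when it resolves to exactly the same segments.
fn path_matches(keys: &[&str], expand_dots: bool, query_path: &str) -> bool {
    effective_segments(keys, expand_dots) == split_query_path(query_path)
}
```

With expand_dots on, the keys root and child.with.dot become the four segments root, child, with, dot, so the unescaped query matches; with it off, only the escaped form root.child\.with\.dot yields the two segments root and child.with.dot.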

Issue

expand_dots_enabled is currently implemented by changing the path stored in the inverted index and the fast fields. This will cause issues with object reconstruction from the path (used in https://github.com/quickwit-oss/tantivy/pull/2198), because the stored path no longer records whether a dot was a separator or part of a key.

Instead, this could be a query-time option that tries to resolve root.child.with.dot:hello to root.child\.with\.dot:hello when querying (possibly using metadata created during indexing).
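The query-time alternative could look roughly like the Rust sketch below. It assumes, hypothetically, that indexing records every raw key path it has seen (the "metadata" mentioned above) as a list of unsplit segments; resolve_expanded and escape_key are illustrative names, not existing tantivy functions. Each known path whose keys, once split on dots, equal the query's segments is rewritten into its escaped form.

```rust
/// Escape a raw key so it can be written as one segment of a dotted path:
/// backslashes first, then dots.
fn escape_key(key: &str) -> String {
    key.replace('\\', "\\\\").replace('.', "\\.")
}

/// Hypothetical query-time resolution for an expand_dots-style query.
/// `known_paths` stands in for metadata collected at indexing time: every
/// path seen, as its list of raw (unsplit) JSON keys.
/// The query is assumed to contain no escapes, so it is split on every dot.
fn resolve_expanded(query_path: &str, known_paths: &[Vec<String>]) -> Vec<String> {
    let wanted: Vec<&str> = query_path.split('.').collect();
    known_paths
        .iter()
        .filter(|keys| {
            // Treat dots inside keys as separators, as expand_dots would.
            let expanded: Vec<&str> = keys.iter().flat_map(|key| key.split('.')).collect();
            expanded == wanted
        })
        // Rewrite each match into the escaped form used without expand_dots.
        .map(|keys| keys.iter().map(|key| escape_key(key)).collect::<Vec<_>>().join("."))
        .collect()
}
```

Several stored paths can resolve to the same dotted query: {"a.b": ...} and {"a": {"b": ...}} both match a.b. A real implementation would have to decide whether to search all candidates or to reject the ambiguity.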

PSeitz, Nov 07 '23 02:11

Is #2198 really the issue you wanted to link? I don't understand how it is related.

fulmicoton, Nov 15 '23 00:11

I thought the docvalue_fields option in top_hits reconstructed the object from those fields, but it turns out that's not the case: it returns a flat list of the requested fields instead.

{
  "aggs": {
    "top_tags": {
      "terms": {
        "field": "user.id.keyword",
        "size": 3
      },
      "aggs": {
        "top_sales_hits": {
          "top_hits": {
            "docvalue_fields": [
              "user.id.keyword",
              "user.real.name.keyword"
            ],
            "size": 1
          }
        }
      }
    }
  }
}
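To make the reconstruction problem concrete, here is a Rust sketch that rebuilds a nested object from a flat list of (path, value) pairs like the one docvalue_fields returns. The Node type and the reconstruct, insert_path and to_json helpers are hypothetical, and a conflict between a leaf and an object at the same key is simply ignored. Given the raw keys, the object from the expand_dots example comes back as written; given only the dot-joined path root.child.with.dot, there is no way to tell whether the original had a child.with.dot key or nested child, with and dot objects.

```rust
use std::collections::BTreeMap;

/// Hypothetical minimal JSON-like tree: string leaves and nested objects.
#[derive(Debug, PartialEq)]
enum Node {
    Leaf(String),
    Object(BTreeMap<String, Node>),
}

/// Rebuild a nested object from flat (path segments, value) pairs.
fn reconstruct(fields: &[(Vec<&str>, &str)]) -> Node {
    let mut root = BTreeMap::new();
    for (path, value) in fields {
        insert_path(&mut root, path, value);
    }
    Node::Object(root)
}

fn insert_path(map: &mut BTreeMap<String, Node>, path: &[&str], value: &str) {
    match path {
        [] => {}
        [last] => {
            map.insert(last.to_string(), Node::Leaf(value.to_string()));
        }
        [first, rest @ ..] => {
            let child = map
                .entry(first.to_string())
                .or_insert_with(|| Node::Object(BTreeMap::new()));
            // If a leaf already sits at this key, the conflict is ignored.
            if let Node::Object(inner) = child {
                insert_path(inner, rest, value);
            }
        }
    }
}

/// Render the tree; Debug formatting of String supplies the quotes,
/// which is close enough to JSON for the simple keys used here.
fn to_json(node: &Node) -> String {
    match node {
        Node::Leaf(value) => format!("{:?}", value),
        Node::Object(map) => {
            let fields: Vec<String> = map
                .iter()
                .map(|(key, child)| format!("{:?}: {}", key, to_json(child)))
                .collect();
            format!("{{{}}}", fields.join(", "))
        }
    }
}
```

Reconstructing from the segments root and child.with.dot yields {"root": {"child.with.dot": "hello"}}, while splitting the joined path on every dot yields {"root": {"child": {"with": {"dot": "hello"}}}}.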

PSeitz, Nov 15 '23 01:11

Is there still a use case for this if it isn't needed for #2198?

ditsuke, Nov 16 '23 07:11

Yes, I think we should change it. Currently, the way paths are stored internally depends on the expand_dots option.

Internally, we should have a canonical way to store paths, independent of how they are addressed.
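One way to get such a canonical form, sketched in Rust under the assumption that JSON keys never contain a chosen separator character: store each path as its raw keys joined by that separator, and let expand_dots affect only how a query path is turned into segments. SEGMENT_SEP, encode_path and decode_path are hypothetical names, and the separator value is an arbitrary choice for this sketch.

```rust
/// Hypothetical separator, assumed never to occur inside a JSON key.
const SEGMENT_SEP: &str = "\u{1}";

/// Canonical stored form: the raw keys, joined by a separator that cannot
/// be confused with a literal dot. The result does not depend on expand_dots.
fn encode_path(keys: &[&str]) -> String {
    debug_assert!(keys.iter().all(|key| !key.contains(SEGMENT_SEP)));
    keys.join(SEGMENT_SEP)
}

/// Recover the original keys, e.g. for object reconstruction.
fn decode_path(encoded: &str) -> Vec<&str> {
    encoded.split(SEGMENT_SEP).collect()
}
```

Because {"root": {"child.with.dot": ...}} and {"root": {"child": {"with": {"dot": ...}}}} encode differently, the original object can be rebuilt from the stored path, and resolving root.child.with.dot under expand_dots becomes a query-time lookup rather than a change to what is stored.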

PSeitz, Nov 16 '23 08:11