
[BUG] The aggs result of NestedAggregator with sub NestedAggregator may be inaccurate

Open kkewwei opened this issue 1 year ago • 2 comments

Describe the bug

The result of a NestedAggregator that contains a sub NestedAggregator is inaccurate. In the screenshot, the two doc_count values should both be 4. [image]

Related component

Search:Aggregations

To Reproduce

  1. Create the index.
PUT index1_nest111
{
  "settings": {
    "index.refresh_interval": "30s"
  },
  "mappings": {
    "properties": {
      "nested1": {
        "type": "nested",
        "properties": {
          "name": {
            "type": "keyword"
          }
        }
      },
      "nested2": {
        "type": "nested",
        "properties": {
          "age": {
            "type": "long"
          }
        }
      }
    }
  }
}
  2. Index the data. The 4 documents are identical except for the _id:
POST _bulk?refresh=true
{ "index": { "_index": "index1_nest111", "_id": "1" } }
{ "nested2": {"age":1}, "nested1": {"name": "name1"} }
{ "index": { "_index": "index1_nest111", "_id": "2" } }
{ "nested2": {"age":1}, "nested1": {"name": "name1"} }


POST _bulk?refresh=true
{ "index": { "_index": "index1_nest111", "_id": "3" } }
{ "nested2": {"age":1}, "nested1": {"name": "name1"} }
{ "index": { "_index": "index1_nest111", "_id": "4" } }
{ "nested2": {"age":1}, "nested1": {"name": "name1"} }
  3. Run the aggregation.
POST index1_nest111/_search
{
  "aggregations": {
    "out_nested": {
      "aggregations": {
        "out_terms": {
          "aggregations": {
            "inner_nested": {
              "aggregations": {
                "inner_terms": {
                  "terms": {
                    "field": "nested1.name"
                  }
                }
              },
              "nested": {
                "path": "nested1"
              }
            }
          },
          "terms": {
            "field": "nested2.age"
          }
        }
      },
      "nested": {
        "path": "nested2"
      }
    }
  },
  "size": 0
}

Expected behavior

The inner_nested.doc_count should also be 4.

If this is a bug, I'd be pleased to fix it.
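
The expected value can be checked by counting nested objects directly. This is a minimal Python sketch, assuming (as the report does) that inner_nested should count every nested1 object of the documents that fall into the age=1 bucket. The helper count_nested_objects is hypothetical and only models the counting, not how OpenSearch executes the aggregation.

```python
# The four documents from the reproduction steps; they differ only in _id.
DOCS = {
    str(i): {"nested2": {"age": 1}, "nested1": {"name": "name1"}}
    for i in range(1, 5)
}


def count_nested_objects(docs, path):
    """Count nested objects under `path`; a single object counts as one."""
    total = 0
    for source in docs.values():
        value = source.get(path)
        if value is None:
            continue
        total += len(value) if isinstance(value, list) else 1
    return total


# Every document has one nested2 object and one nested1 object.
outer = count_nested_objects(DOCS, "nested2")
inner = count_nested_objects(DOCS, "nested1")
print(outer, inner)
```

Both counts come out as 4 in this model, which is the value the report expects for inner_nested.doc_count.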

Additional Details

Host/Environment (please complete the following information):

  • OS: os2.9

kkewwei, Apr 19 '24 09:04

The nested2 child documents belong to the outer nested aggregation, and the nested1 child documents belong to the inner nested aggregation.

To help explain the description above: [image]

When the inner nested aggregation executes, parentDoc=0 (the first Lucene document ID) is discarded: https://github.com/opensearch-project/OpenSearch/blob/main/server/src/main/java/org/opensearch/search/aggregations/bucket/nested/NestedAggregator.java#L196

We can see that parentDoc is not always greater than childDoc, which means that the logic of the function processBufferedChildBuckets is wrong: it will aggregate unrelated documents.
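
The two observations above can be reproduced in a small model. The following Python sketch is not OpenSearch code. It assumes a single segment holding two of the documents, with each block stored as nested2 child, nested1 child, then root document (an assumption inferred from parentDoc=0 being the first Lucene document ID). It approximates the child lookup as a scan of the documents between the previous parent and the current parent. The names SEGMENT, children_before_parent, and collect are hypothetical.

```python
# Simplified model, NOT the actual OpenSearch implementation.
# ASSUMPTION: each block is laid out as nested2 child, nested1 child, root.
# Each entry is (document kind, _id of the source document it belongs to).
SEGMENT = [
    ("nested2", "1"),  # doc 0: the "parent" as seen by inner_nested
    ("nested1", "1"),  # doc 1: the "child" as seen by inner_nested
    ("root", "1"),     # doc 2
    ("nested2", "2"),  # doc 3
    ("nested1", "2"),  # doc 4
    ("root", "2"),     # doc 5
]


def children_before_parent(segment, parent_doc,
                           parent_type="nested2", child_type="nested1"):
    """Return child docs located between the previous parent and parent_doc."""
    if parent_doc == 0:
        return []  # nothing can precede doc 0, so this parent is skipped
    prev_parent = -1
    for doc in range(parent_doc - 1, -1, -1):
        if segment[doc][0] == parent_type:
            prev_parent = doc
            break
    return [
        doc for doc in range(prev_parent + 1, parent_doc)
        if segment[doc][0] == child_type
    ]


def collect(segment, parent_type="nested2"):
    """Map every parent doc ID to the child doc IDs the scan would collect."""
    return {
        doc: children_before_parent(segment, doc)
        for doc, (kind, _source_id) in enumerate(segment)
        if kind == parent_type
    }


collected = collect(SEGMENT)
for parent_doc, child_docs in collected.items():
    for child_doc in child_docs:
        same_source = SEGMENT[child_doc][1] == SEGMENT[parent_doc][1]
        print(f"parentDoc={parent_doc} collected childDoc={child_doc}, "
              f"same _id: {same_source}")
```

In this model the nested2 document at doc 0 collects nothing, and the nested2 document at doc 3 collects the nested1 child at doc 1, which belongs to a different source document. Under the stated layout assumption, a scan that expects children to precede their parent both drops a parent and attributes an unrelated child.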

kkewwei, Apr 22 '24 13:04

[Triage - attendees 1 2 3 4 5 6 7 8] @kkewwei Thanks for creating this issue, and thanks for the pull request to address it!

peternied, May 01 '24 15:05