Add support for Significant Terms aggregation

Open tomconte opened this issue 3 years ago • 0 comments

Reference documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-significantterms-aggregation.html

"An aggregation that returns interesting or unusual occurrences of terms in a set." In Kibana, it can only be applied to text fields.

According to the documentation, this aggregation is primarily used with a "parent-level aggregation to segment the data ready for analysis". In other words, it is typically a sub-aggregation of another bucket aggregation, such as Terms or Histogram.

This makes this issue dependent on support for sub-bucket aggregations (#145).

Before implementing, we need to identify how to perform a similar query in Kusto. According to the doc, Significant Terms "are the terms that have undergone a significant change in popularity measured between a foreground and background set. [...] In the simplest case, the foreground set of interest is the search results matched by a query and the background set used for statistical comparisons is the index or indices from which the results were gathered."
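
To make the foreground/background idea concrete, here is a minimal Python sketch that counts each term in a foreground subset and in the full document set. It is only an illustration of the concept, not a proposed K2Bridge implementation or a Kusto query: the function name `foreground_background_counts`, the `in_foreground` predicate, and the toy documents are all hypothetical.

```python
from collections import Counter


def foreground_background_counts(docs, field, in_foreground):
    """Return {term: (foreground_count, background_count)} for `field`.

    Assumption: the background set is every document in `docs`, and the
    foreground set is the subset for which `in_foreground(doc)` is true.
    """
    background = Counter(doc[field] for doc in docs if field in doc)
    foreground = Counter(
        doc[field] for doc in docs if field in doc and in_foreground(doc)
    )
    # Only terms that occur in the foreground are candidates.
    return {term: (foreground[term], background[term]) for term in foreground}


# Toy documents, made up for illustration only.
docs = [
    {"DestCountry": "IT", "AvgTicketPrice": 150},
    {"DestCountry": "IT", "AvgTicketPrice": 120},
    {"DestCountry": "US", "AvgTicketPrice": 180},
    {"DestCountry": "US", "AvgTicketPrice": 520},
    {"DestCountry": "US", "AvgTicketPrice": 640},
    {"DestCountry": "CH", "AvgTicketPrice": 710},
]

# Foreground: documents falling in a hypothetical 100-200 price bucket.
counts = foreground_background_counts(
    docs, "DestCountry", lambda d: 100 <= d["AvgTicketPrice"] < 200
)
```

In this toy set, "IT" appears only in the foreground bucket while "US" is spread across the whole set, which is the kind of contrast the aggregation is meant to surface. An equivalent Kusto query would presumably need the same two counts per term, one over the bucket and one over the whole table.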

Sample request:

  "aggs": {
    "2": {
      "histogram": {
        "field": "AvgTicketPrice",
        "interval": 100,
        "min_doc_count": 1
      },
      "aggs": {
        "3": {
          "significant_terms": {
            "field": "DestCountry",
            "size": 3
          }
        }
      }
    }
  }

Response: (extract)

  "aggregations": {
    "2": {
      "buckets": [
        {
          "3": {
            "doc_count": 749,
            "bg_count": 13059,
            "buckets": [
              {
                "key": "IT",
                "doc_count": 243,
                "score": 0.2552994319244096,
                "bg_count": 2371
              },
              {
                "key": "US",
                "doc_count": 172,
                "score": 0.11694192970564077,
                "bg_count": 1987
              },
              {
                "key": "CH",
                "doc_count": 79,
                "score": 0.10476945913799716,
                "bg_count": 691
              }
            ]
          },
          "key": 100,
          "doc_count": 749
        },
        {
          "3": {
            "doc_count": 1067,
            "bg_count": 13059,
            "buckets": [
              {
                "key": "IT",
                "doc_count": 241,
                "score": 0.0551183926043845,
                "bg_count": 2371
              },
              {
                "key": "CH",
                "doc_count": 83,
                "score": 0.03656787843506986,
                "bg_count": 691
              },
              {
                "key": "US",
                "doc_count": 185,
                "score": 0.02418926301801488,
                "bg_count": 1987
              }
            ]
          },
          "key": 200,
          "doc_count": 1067
        },
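
The scores in this extract appear consistent with the JLH heuristic, which the Elasticsearch documentation describes as the default for this aggregation: the absolute change in term frequency between foreground and background, multiplied by the relative change. The Python sketch below recomputes the first bucket's scores from the counts shown above. The function name `jlh_score` is hypothetical, and returning 0 when the term is not more frequent in the foreground is an assumption about edge-case behavior.

```python
def jlh_score(doc_count, subset_size, bg_count, superset_size):
    """JLH-style score: (fg% - bg%) * (fg% / bg%).

    doc_count / subset_size    : term frequency in the foreground bucket
    bg_count / superset_size   : term frequency in the background set
    """
    fg = doc_count / subset_size
    bg = bg_count / superset_size
    if fg <= bg:
        # Assumption: terms that are not more frequent in the foreground score 0.
        return 0.0
    return (fg - bg) * (fg / bg)


# Counts taken from the first bucket (key 100) of the response extract:
# the bucket holds 749 documents and the background holds 13059.
reported = {
    "IT": (243, 2371, 0.2552994319244096),
    "US": (172, 1987, 0.11694192970564077),
    "CH": (79, 691, 0.10476945913799716),
}

recomputed = {
    key: jlh_score(doc_count, 749, bg_count, 13059)
    for key, (doc_count, bg_count, _score) in reported.items()
}
```

If this reading is right, a Kusto translation would need four numbers per term (`doc_count`, the bucket's `doc_count`, `bg_count`, and the total `bg_count`) and could compute the score with plain arithmetic.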

tomconte, Jan 20 '22 10:01