elasticsearch icon indicating copy to clipboard operation
elasticsearch copied to clipboard

ES|QL: support name qualifiers

Open astefan opened this issue 1 year ago • 5 comments

Description

  • [ ] revisit https://github.com/elastic/elasticsearch/pull/110581 which removed the original implementation for qualifiers
  • [ ] implement qualifiers for LOOKUP https://github.com/elastic/elasticsearch/issues/112014
  • [ ] investigate and expand the name qualifiers to ENRICH https://github.com/elastic/elasticsearch/issues/112015

LOOKUP command is a powerful tool that helps users combine static tables of values with "live" data from Elasticsearch. It is consistent with the rest of the language features when it comes to same-name columns in that "the last column" wins.

For example (notice salary column that is present in employees index and salary column that is present in the static table benefits)

{
  "query": "FROM employees | WHERE languages IS NOT NULL | SORT salary DESC | KEEP first_name, last_name, languages, salary | LOOKUP benefits ON languages | LIMIT 5",
  "tables": {
    "benefits": {
      "languages":    {"integer": [1, 2, 3, 4, 5]},
      "salary":       {"integer": [100,200,300,400,500]},
      "bonus_percent":{"double":  [5,7,9,11,15]}
    }
  }
}

will result in

  first_name   |   last_name   |   languages   |    salary     | bonus_percent 
---------------+---------------+---------------+---------------+---------------
Moss           |Shanbhogue     |3              |300            |9.0            
Tzvetan        |Zielinski      |4              |400            |11.0           
Lillian        |Haddadi        |1              |100            |5.0            
Valter         |Sullins        |2              |200            |7.0            
Remzi          |Waschkowski    |3              |300            |9.0            

salary column from employees is being replaced by the salary column from the benefits table. One can argue, though, that both columns are useful and must be kept. There is an workaround here and requires few changes to the query ("manually" copying the old values in a new column):

{
  "query": "FROM employees | WHERE languages IS NOT NULL | EVAL old_salary = salary | SORT old_salary DESC | KEEP first_name, last_name, languages, salary, old_salary | LOOKUP benefits ON languages | LIMIT 5",
  "tables": {
    "benefits": {
      "languages":    {"integer": [1, 2, 3, 4, 5]},
      "salary":       {"integer": [100,200,300,400,500]},
      "bonus_percent":{"double":  [5,7,9,11,15]}
    }
  }
}

which results in

  first_name   |   last_name   |   languages   |  old_salary   |    salary     | bonus_percent 
---------------+---------------+---------------+---------------+---------------+---------------
Moss           |Shanbhogue     |3              |74970          |300            |9.0            
Tzvetan        |Zielinski      |4              |74572          |400            |11.0           
Lillian        |Haddadi        |1              |73717          |100            |5.0            
Valter         |Sullins        |2              |73578          |200            |7.0            
Remzi          |Waschkowski    |3              |71165          |300            |9.0            

But we can do better and have these additional steps be added automatically in the form of name qualifiers:

{
  "query": "FROM employees | WHERE languages IS NOT NULL | SORT salary DESC | KEEP first_name, last_name, languages, salary | LOOKUP benefits ON languages | LIMIT 5",
  "tables": {
    "benefits": {
      "languages":    {"integer": [1, 2, 3, 4, 5]},
      "salary":       {"integer": [100,200,300,400,500]},
      "bonus_percent":{"double":  [5,7,9,11,15]}
    }
  }
}

resulting in

  first_name   |   last_name   |   languages   |    salary     |employees.salary|benefits.salary| bonus_percent 
---------------+---------------+---------------+---------------+----------------+---------------+---------------
Moss           |Shanbhogue     |3              |74970          |74970           |300            |9.0            
Tzvetan        |Zielinski      |4              |74572          |74572           |400            |11.0           
Lillian        |Haddadi        |1              |73717          |73717           |100            |5.0            
Valter         |Sullins        |2              |73578          |73578           |200            |7.0            
Remzi          |Waschkowski    |3              |71165          |71165           |300            |9.0            

astefan avatar Aug 20 '24 10:08 astefan

Pinging @elastic/es-analytical-engine (Team:Analytics)

elasticsearchmachine avatar Aug 20 '24 10:08 elasticsearchmachine

Users will eventually need a way to provide their own name (alias) for the qualifier, as the index names can be quite long (.ds-metrics-prometheus.collector-mki-ksm-crd-scraper-2024.08.21-000645 for example). You wouldn't want to have to type that when referring to a column. It's a another nice layer on top of what's proposed above but I think we should think about it now to avoid making it harder later.

tylerperk avatar Aug 22 '24 13:08 tylerperk

What happens if the main index resolves to more than one index? For example, if logs-* resolves to both logs-1 and logs-2, what would the columns be in this case?

dnhatn avatar Aug 28 '24 07:08 dnhatn

We have at least two more options here:

  • explicitly choose (and in case rename) the lookup columns:
LOOKUP benefits ON languages WITH benefits.salary = salary, bonus_percent` 

It's the same syntax we have in ENRICH, so in terms of consistency it seems desirable

  • declare a general prefix for all the fields extracted from the lookup:
LOOKUP benefits ON languages PREFIX `benefits.`

I think it's important to point out that any implicit behavior (eg. an implicit prefix as the name of the lookup table) will be a breaking change and won't avoid collisions (eg. employees could have a field called benefits.salary)

luigidellaquila avatar Aug 28 '24 07:08 luigidellaquila

LOOKUP benefits ON languages WITH benefits.salary = salary, bonus_percent` 

It's the same syntax we have in ENRICH, so in terms of consistency it seems desirable

That'd be consistent, but the problem with the WITH syntax is that it requires explicit aliasing of each and every field. That applies even more so to ENRICH: if the enrich index has many fields, you may accidentally overwrite your fields, or you'll have to list each and every field you want. Qualifiers are one way to solve this, by avoiding name conflicts altogether.

alex-spies avatar Aug 28 '24 08:08 alex-spies

I'm picking this back up, but the world has changed a little since we initially wanted this.

Instead of heavily revising this issue, I'm going to go and create a new one so we don't lose history.

alex-spies avatar Aug 18 '25 12:08 alex-spies

Superseded by https://github.com/elastic/elasticsearch/issues/133068

alex-spies avatar Aug 18 '25 12:08 alex-spies