sql icon indicating copy to clipboard operation
sql copied to clipboard

[BUG] IS NOT NULL with alias throws SemanticCheckException

Open ankurgupta4891 opened this issue 4 years ago • 9 comments

Describe the bug Using IS NOT NULL with alias throws SemanticCheckException.

To Reproduce Sample query:

SELECT dest_city_name AS dc
FROM all_flights
WHERE dc IS NOT NULL

Expected behavior Should be same as query:

SELECT dest_city_name AS dc
FROM all_flights
WHERE dest_city_name IS NOT NULL

ankurgupta4891 avatar Nov 18 '21 20:11 ankurgupta4891

Thanks for reporting issue. Currently we doesn't support use alias in where clause. One workaround should be

SELECT dest_city_name AS dc
FROM all_flights
WHERE dest_city_name IS NOT NULL

penghuo avatar Jan 06 '22 19:01 penghuo

@ankurgupta4891 In https://github.com/opendistro-for-elasticsearch/sql/issues/1145, you mentioned, "I tried to give simple version of our use case but we need to get alias support with IS NOT NULL"

Could you share your use case? Are u using tools to auto generate query?

penghuo avatar Jan 06 '22 22:01 penghuo

confirmed offline. Query is auto generated.

Next Step

We need to evaluate the tech solution. The challenge is that SQL statements is analyzed in predefined order of, FROM -> WHERE -> SELECT, alias defined in SELECT which is not visible to WHERE clause.

penghuo avatar Mar 15 '22 18:03 penghuo

Hey team, I notice that IS NOT NULL/IS NULL with alias in ORDER BY also throws exception: "This query is not explainable." (Using Query Workbench kibana plugin in ES 7.10)

SELECT dest_state_nm AS city FROM all_flights_integ_tests ORDER BY city IS NOT NULL

Could you help and check as well? Thanks

jingyali-apr avatar Aug 31 '22 23:08 jingyali-apr

Hey team, I notice that IS NOT NULL/IS NULL with alias in ORDER BY also throws exception: "This query is not explainable." (Using Query Workbench kibana plugin in ES 7.10)

SELECT dest_state_nm AS city FROM all_flights_integ_tests ORDER BY city IS NOT NULL

Could you help and check as well? Thanks

@jingyali-apr Thanks for reporting the issue! Could you share your index mapping and some test data for investigation? And it would be helpful if you can find error stacktrace in ES log.

dai-chen avatar Sep 01 '22 16:09 dai-chen

Hey team, I notice that IS NOT NULL/IS NULL with alias in ORDER BY also throws exception: "This query is not explainable." (Using Query Workbench kibana plugin in ES 7.10)

SELECT dest_state_nm AS city FROM all_flights_integ_tests ORDER BY city IS NOT NULL

Could you help and check as well? Thanks

@jingyali-apr Thanks for reporting the issue! Could you share your index mapping and some test data for investigation? And it would be helpful if you can find error stacktrace in ES log.

Hey @dai-chen, please check below samples.

Successful case without IS NOT NULL

POST _opendistro/_sql/
{
"query": "SELECT dest_state_nm AS state FROM all_flights_integ_tests GROUP BY state ORDER BY state limit 5"
}

{
  "schema": [
    {
      "name": "dest_state_nm",
      "alias": "state",
      "type": "keyword"
    }
  ],
  "datarows": [
    [
      "Alabama"
    ],
    [
      "Alaska"
    ],
    [
      "California"
    ],
    [
      "Colorado"
    ],
    [
      "Connecticut"
    ]
  ],
  "total": 5,
  "size": 5,
  "status": 200
}

Failed case with IS NOT NULL

POST _opendistro/_sql/
{
"query": "SELECT dest_state_nm AS state FROM all_flights_integ_tests GROUP BY state ORDER BY state IS NOT NULL limit 5"
}

{
  "error": {
    "reason": "Invalid SQL query",
    "details": "can't resolve Symbol(namespace=FIELD_NAME, name=state) in type env",
    "type": "SemanticCheckException"
  },
  "status": 400
}

Successful case without using alias

POST _opendistro/_sql/
{
"query": "SELECT dest_state_nm AS state FROM all_flights_integ_tests GROUP BY state ORDER BY dest_state_nm IS NOT NULL limit 5"
}

{
  "schema": [
    {
      "name": "dest_state_nm",
      "alias": "state",
      "type": "keyword"
    }
  ],
  "datarows": [
    [
      "Alabama"
    ],
    [
      "Wisconsin"
    ],
    [
      "West Virginia"
    ],
    [
      "Virginia"
    ],
    [
      "Vermont"
    ]
  ],
  "total": 5,
  "size": 5,
  "status": 200
}

Partial test data:

POST _opendistro/_sql/
{
"query": "SELECT dest_state_nm FROM all_flights_integ_tests limit 50"
}

{
  "schema": [
    {
      "name": "dest_state_nm",
      "type": "keyword"
    }
  ],
  "datarows": [
    [
      "Alaska"
    ],
    [
      "South Carolina"
    ],
    [
      "Michigan"
    ],
    [
      "Texas"
    ],
    [
      "Texas"
    ],
    [
      "New York"
    ],
    [
      "Michigan"
    ],
    [
      "Massachusetts"
    ],
    [
      "Texas"
    ],
    [
      "Hawaii"
    ],
    [
      "Hawaii"
    ],
    [
      "Alabama"
    ],
    [
      "Texas"
    ],
    [
      "Georgia"
    ],
    [
      "Ohio"
    ],
    [
      "Michigan"
    ],
    [
      "Texas"
    ],
    [
      "Ohio"
    ],
    [
      "New York"
    ],
    [
      "Virginia"
    ],
    [
      "Virginia"
    ],
    [
      "Michigan"
    ],
    [
      "Florida"
    ],
    [
      "Wisconsin"
    ],
    [
      "Texas"
    ],
    [
      "Mississippi"
    ],
    [
      "New York"
    ],
    [
      "Georgia"
    ],
    [
      "Texas"
    ],
    [
      "New Jersey"
    ],
    [
      "Texas"
    ],
    [
      "Louisiana"
    ],
    [
      "New York"
    ],
    [
      "Alabama"
    ],
    [
      "Texas"
    ],
    [
      "Kentucky"
    ],
    [
      "Texas"
    ],
    [
      "Georgia"
    ],
    [
      "Georgia"
    ],
    [
      "Texas"
    ],
    [
      "Georgia"
    ],
    [
      "Tennessee"
    ],
    [
      "New Jersey"
    ],
    [
      "Georgia"
    ],
    [
      "Georgia"
    ],
    [
      "Georgia"
    ],
    [
      "Georgia"
    ],
    [
      "Maryland"
    ],
    [
      "Colorado"
    ],
    [
      "Tennessee"
    ]
  ],
  "total": 50,
  "size": 50,
  "status": 200
}

Thanks

jingyali-apr avatar Sep 01 '22 16:09 jingyali-apr

@jingyali-apr Could you try to use the original field name in ORDER BY clause as below?

POST _opendistro/_sql
{
"query" : """
  SELECT dest_state_nm AS state FROM all_flights_integ_tests GROUP BY state ORDER BY dest_state_nm IS NOT NULL
 """
}

dai-chen avatar Sep 01 '22 17:09 dai-chen

@jingyali-apr Could you try to use the original field name in ORDER BY clause as below?

POST _opendistro/_sql
{
"query" : """
  SELECT dest_state_nm AS state FROM all_flights_integ_tests GROUP BY state ORDER BY dest_state_nm IS NOT NULL
 """
}

Yeah, I tried this and it worked.

See my Successful case without using alias in above comment.

jingyali-apr avatar Sep 01 '22 17:09 jingyali-apr

@jingyali-apr Thanks for confirming! I may have found where things go wrong: https://github.com/opensearch-project/sql/blob/907019cc8f7ce05c34ed453a75e4d897f55b57c5/sql/src/main/java/org/opensearch/sql/sql/parser/AstSortBuilder.java#L53

When we try to replace alias in ORDER BY, we only replace ordinal or alias directly. In our case, state is NOT NULL is an expression which causes alias replacement fail. Will debug to confirm and open issue/PR to fix. However, just to let you know, the fix would be only available in latest OpenSearch version (probably 2.3 next). Thanks!

dai-chen avatar Sep 01 '22 18:09 dai-chen

https://stackoverflow.com/a/942592 You can only use column aliases in GROUP BY, ORDER BY, or HAVING clauses.

Standard SQL doesn't allow you to refer to a column alias in a WHERE clause. This restriction is imposed because when the WHERE code is executed, the column value may not yet be determined.

Copied from MySQL documentation

Yury-Fridlyand avatar Sep 12 '22 23:09 Yury-Fridlyand

Copied from MySQL documentation

Note that was the [Wayback/Archive] MySQL :: MySQL 5.1 Reference Manual :: B.5.5.4 Problems with Column Aliases documentation until December 2009. After that it got rephrased (including "doesn't allow" to "disallow", "alias" to "aliases", "code" to "clause) and still is in the current form at [Wayback/Archive] MySQL :: MySQL 8.0 Reference Manual :: B.3.4.4 Problems with Column Aliases:

Standard SQL disallows references to column aliases in a WHERE clause. This restriction is imposed because when the WHERE clause is evaluated, the column value may not yet have been determined.

jpluimers avatar Dec 01 '22 11:12 jpluimers