Comparing date, timestamp, and string leads to failures, and DATE values are formatted as TIMESTAMP
Version 1.13 (might be a regression from previous versions)
The following indexing request:
POST my_test_index_date/_doc/
{
"field1": {
"subFieldA": "2020-02-02"
}
}
generates the following dynamic mapping, specifying the field as date:
{
"mappings": {
"_doc": {
"properties": {
"field1": {
"properties": {
"subFieldA": {
"type": "date"
}
}
}
}
}
}
}
When I try to query it with different combinations that I feel should be acceptable, it always fails:
Case 1
POST _opendistro/_sql
{
"query": """SELECT * FROM
my_test_index_date as i
WHERE
i.field1.subFieldA > '2020-02-01'
LIMIT 50;"""
}
Response -->
{
"error": {
"reason": "There was internal problem at backend",
"details": "> function expected {[BYTE,BYTE],[SHORT,SHORT],[INTEGER,INTEGER],[LONG,LONG],[FLOAT,FLOAT],[DOUBLE,DOUBLE],[BOOLEAN,BOOLEAN],[STRING,STRING],[TIMESTAMP,TIMESTAMP],[DATE,DATE],[TIME,TIME],[DATETIME,DATETIME],[INTERVAL,INTERVAL],[STRUCT,STRUCT],[ARRAY,ARRAY]}, but get [TIMESTAMP,STRING]",
"type": "ExpressionEvaluationException"
},
"status": 503
}
Why is the field considered TIMESTAMP here instead of DATE (as in the mapping)?
It would be very nice to have the two types (TIMESTAMP and STRING) be comparable, since we don't want to require the user to know exactly which date type the mapping uses.
Case 2
POST _opendistro/_sql
{
"query": """SELECT * FROM
my_test_index_date as i
WHERE
i.field1.subFieldA > DATE('2020-02-01')
LIMIT 50;"""
}
Response -->
{
"error": {
"reason": "There was internal problem at backend",
"details": "> function expected {[BYTE,BYTE],[SHORT,SHORT],[INTEGER,INTEGER],[LONG,LONG],[FLOAT,FLOAT],[DOUBLE,DOUBLE],[BOOLEAN,BOOLEAN],[STRING,STRING],[TIMESTAMP,TIMESTAMP],[DATE,DATE],[TIME,TIME],[DATETIME,DATETIME],[INTERVAL,INTERVAL],[STRUCT,STRUCT],[ARRAY,ARRAY]}, but get [TIMESTAMP,DATE]",
"type": "ExpressionEvaluationException"
},
"status": 503
}
Shouldn't TIMESTAMP and DATE be comparable?
Case 3
POST _opendistro/_sql
{
"query": """SELECT * FROM
my_test_index_date as i
WHERE
(i.field1.subFieldA) > TIMESTAMP('2020-02-01')
LIMIT 50;"""
}
Response --> timestamp parse error
{
"error": {
"type": "SearchPhaseExecutionException",
"reason": "Error occurred in Elasticsearch engine: all shards failed",
"details": "Shard[0]: NotSerializableExceptionWrapper[semantic_check_exception: timestamp:2020-02-01 in unsupported format, please use yyyy-MM-dd HH:mm:ss[.SSSSSS]]\n\nFor more details, please send request for Json format to see the raw response from elasticsearch engine."
},
"status": 503
}
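For what it's worth, the error message itself suggests the fix for this case: pass a literal in the yyyy-MM-dd HH:mm:ss format it asks for. A sketch against the same index (untested):
POST _opendistro/_sql
{
"query": """SELECT * FROM
my_test_index_date as i
WHERE
(i.field1.subFieldA) > TIMESTAMP('2020-02-01 00:00:00')
LIMIT 50;"""
}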
Case 4: works, but with unexpected output
POST _opendistro/_sql
{
"query": """SELECT * FROM
my_test_index_date as i
WHERE
DATE(i.field1.subFieldA) > DATE('2020-02-01')
LIMIT 50;"""
}
Response --> the DATE is returned in TIMESTAMP format instead of the DATE format from the mapping
{
"schema": [
{
"name": "field1",
"type": "object"
}
],
"datarows": [
[
{
"subFieldA": "2020-02-02 00:00:00"
}
]
],
"total": 1,
"size": 1,
"status": 200
}
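Until the output formatting is fixed, one possible workaround might be to format the value explicitly. This is a sketch that assumes the MySQL-style DATE_FORMAT function is available in the new engine, which I have not verified on 1.13:
POST _opendistro/_sql
{
"query": """SELECT
DATE_FORMAT(i.field1.subFieldA, '%Y-%m-%d') AS subFieldA
FROM my_test_index_date as i
WHERE
DATE(i.field1.subFieldA) > DATE('2020-02-01')
LIMIT 50;"""
}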
A similar issue was reported on the Open Distro community forum: https://discuss.opendistrocommunity.dev/t/sql-plugin-date-filter-causes-error/4883
Also, I believe this is a breaking change.
Yeah, I checked that our old engine can handle these queries without issue. The reason is that, regardless of the field value, there is only one type called date in Elasticsearch, represented internally as an epoch value. When we developed the new engine, we assumed raw data fetched from Elasticsearch is always a timestamp. We will revisit this and see how to make it more user friendly while following the SQL standard as much as possible. Thanks!
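For context, a sketch of why the type alone is ambiguous: in an Elasticsearch mapping, both date-only and date-time fields use the same date type, distinguished only by an optional format parameter, so the mapping type does not tell the SQL engine whether a value is a DATE or a TIMESTAMP. A hypothetical example index:
PUT my_test_index_date_explicit
{
"mappings": {
"properties": {
"dateOnly": { "type": "date", "format": "yyyy-MM-dd" },
"dateTime": { "type": "date", "format": "yyyy-MM-dd HH:mm:ss" }
}
}
}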
Issue may be related: https://github.com/opendistro-for-elasticsearch/sql/issues/510
@dai-chen Shouldn't it fall back to the old one automatically when the query throws?
If not, could we enable that fallback somehow?
Yes, the fallback happens automatically. However, it mostly happens during parsing. In particular, the ANTLR grammar in the new engine decides whether we should fall back. After that point, the query is considered handleable by the new engine. The semantic check that produced the error you saw runs in this later stage.
Is it possible to force both kinds of errors to fall back to the old engine? Would it be easy to make a PR?
There are a lot of great features in this release that I would love to continue using, but a few breaking changes that, from what I can see, could take several weeks or more to solve.
Recovering from both error types would get us the best of both worlds and help adoption.
Sure, thanks for helping us find this and other breaking changes! We're looking into all the recent issues and will prepare PRs for the high-priority ones.
As an additional data point, another valid date format in Postgres is 20181231 (as opposed to 2018-12-31).
@FreCap sure, I'm working on a PR to support implicit casts. I'll use MySQL and PostgreSQL as references. Thanks!
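For reference, the PostgreSQL behavior in question: an untyped string literal in a comparison is implicitly coerced to the other operand's type, and the compact form mentioned above is parsed as an ISO date. A quick illustration (plain PostgreSQL, not OpenSearch syntax; my_table and date_col are placeholders):
SELECT * FROM my_table WHERE date_col > '2020-02-01';  -- literal implicitly cast to date
SELECT '20181231'::date;                               -- 2018-12-31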
Probably related: https://github.com/opendistro-for-elasticsearch/sql/issues/803
I am also facing the same issue with date ranges; currently I'm using BETWEEN instead of the > and < operators to compare dates and strings.
As a fix you can use this branch, @gauravlanjekar: https://github.com/FreCap/sql/tree/nestedLimit-comparison-datecast
You can find the build here: https://github.com/FreCap/sql/actions/runs/1053174786
@FreCap Unfortunately I am running Open Distro on AWS ES, so I don't have the option of running the other branch. :(
Do you see any problem with using BETWEEN? That is my current workaround until a fix is available.
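For reference, here is roughly what that BETWEEN workaround looks like against the index from this issue (an untested sketch; the bounds are placeholders). It may work simply because BETWEEN makes the parser fall back to the old engine, per the fallback behavior described above:
POST _opendistro/_sql
{
"query": """SELECT * FROM
my_test_index_date as i
WHERE
i.field1.subFieldA BETWEEN '2020-02-01' AND '2020-03-01'
LIMIT 50;"""
}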
I tested with OpenSearch 1.1, which supports implicit conversion between date and string. The following query worked for me:
POST my_test_index_date/_doc/
{
"field1": {
"subFieldA": "2020-02-02"
}
}
POST _plugins/_sql
{
"query": """
SELECT * FROM
my_test_index_date as i
WHERE
i.field1.subFieldA > '2020-02-01 00:00:00'
LIMIT 50;
"""
}
{
"schema": [
{
"name": "field1",
"type": "object"
}
],
"datarows": [
[
{
"subFieldA": "2020-02-02 00:00:00"
}
]
],
"total": 1,
"size": 1,
"status": 200
}
I'm investigating why '2020-02-01' or '2020-02-01T00:00:00Z' returns nothing.
Hello everyone,
I am trying to change a data type using Query Workbench. I want to convert a date column to a timestamp in the output schema, as I would in other databases, but I am not able to, and OpenSearch does not seem to support a CAST-style query. Can anyone help? It's urgent.
Query: select timestamp(datecolumn) as date from table
The above query is not working, please help!
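If your version supports the CAST conversion function, an explicit cast might work instead (an untested suggestion; my_table and datecolumn are placeholders for your own names, and the alias avoids reusing date as a column name):
POST _plugins/_sql
{
"query": """SELECT CAST(datecolumn AS TIMESTAMP) AS date_ts FROM my_table"""
}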