
Comparing date, timestamp and string leads to failures, and DATE is formatted as TIMESTAMP

Open · FreCap opened this issue 4 years ago · 16 comments

Version 1.13 (might be a regression from previous versions)

Indexing the following document:

POST my_test_index_date/_doc/
{
  "field1": {
    "subFieldA": "2020-02-02"
  }
}

generates the following mapping specifying the field as DATE:

{
  "mappings": {
    "_doc": {
      "properties": {
        "field1": {
          "properties": {
            "subFieldA": {
              "type": "date"
            }
          }
        }
      }
    }
  }
}

When I try to query it with various combinations that I feel should be acceptable, it always fails:

Case 1

POST _opendistro/_sql
{
  "query": """SELECT * FROM
  my_test_index_date as i 
  WHERE 
  i.field1.subFieldA > '2020-02-01'
    LIMIT 50;"""
}

Response -->
{
  "error": {
    "reason": "There was internal problem at backend",
    "details": "> function expected {[BYTE,BYTE],[SHORT,SHORT],[INTEGER,INTEGER],[LONG,LONG],[FLOAT,FLOAT],[DOUBLE,DOUBLE],[BOOLEAN,BOOLEAN],[STRING,STRING],[TIMESTAMP,TIMESTAMP],[DATE,DATE],[TIME,TIME],[DATETIME,DATETIME],[INTERVAL,INTERVAL],[STRUCT,STRUCT],[ARRAY,ARRAY]}, but get [TIMESTAMP,STRING]",
    "type": "ExpressionEvaluationException"
  },
  "status": 503
}

Why is the field considered TIMESTAMP here instead of DATE (as in the mapping)?

It would be very nice for the two types (TIMESTAMP and STRING) to be comparable, since we don't want to force users to always know which date type is in the mapping.
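
For reference, MySQL does perform this implicit conversion: when a temporal value is compared with a string, the string is cast to a temporal type for the comparison. A minimal illustration (MySQL, not this plugin):

SELECT TIMESTAMP('2020-02-02') > '2020-02-01';
-- returns 1: the string literal is implicitly converted before comparing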

Case 2


POST _opendistro/_sql
{
  "query": """SELECT * FROM
  my_test_index_date as i 
  WHERE 
  i.field1.subFieldA > DATE('2020-02-01')
    LIMIT 50;"""
}

Response -->
{
  "error": {
    "reason": "There was internal problem at backend",
    "details": "> function expected {[BYTE,BYTE],[SHORT,SHORT],[INTEGER,INTEGER],[LONG,LONG],[FLOAT,FLOAT],[DOUBLE,DOUBLE],[BOOLEAN,BOOLEAN],[STRING,STRING],[TIMESTAMP,TIMESTAMP],[DATE,DATE],[TIME,TIME],[DATETIME,DATETIME],[INTERVAL,INTERVAL],[STRUCT,STRUCT],[ARRAY,ARRAY]}, but get [TIMESTAMP,DATE]",
    "type": "ExpressionEvaluationException"
  },
  "status": 503
}

Shouldn't TIMESTAMP and DATE be comparable?

Case 3

POST _opendistro/_sql
{
  "query": """SELECT * FROM
  my_test_index_date as i 
  WHERE 
  (i.field1.subFieldA) > TIMESTAMP('2020-02-01')
    LIMIT 50;"""
}

Response --> timestamp parse error
{
  "error": {
    "type": "SearchPhaseExecutionException",
    "reason": "Error occurred in Elasticsearch engine: all shards failed",
    "details": "Shard[0]: NotSerializableExceptionWrapper[semantic_check_exception: timestamp:2020-02-01 in unsupported format, please use yyyy-MM-dd HH:mm:ss[.SSSSSS]]\n\nFor more details, please send request for Json format to see the raw response from elasticsearch engine."
  },
  "status": 503
}
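
Following the format hint in that error message, a variant with a full yyyy-MM-dd HH:mm:ss literal should at least get past the parse step. This is only a sketch against the same index, not verified here:

POST _opendistro/_sql
{
  "query": """SELECT * FROM
  my_test_index_date as i 
  WHERE 
  i.field1.subFieldA > TIMESTAMP('2020-02-01 00:00:00')
    LIMIT 50;"""
}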

Case 4, working, but unexpected output


POST _opendistro/_sql
{
  "query": """SELECT * FROM
  my_test_index_date as i 
  WHERE 
  DATE(i.field1.subFieldA) > DATE('2020-02-01')
    LIMIT 50;"""
}

Response --> DATE returned in timestamp format instead of the date format from the mapping
{
  "schema": [
    {
      "name": "field1",
      "type": "object"
    }
  ],
  "datarows": [
    [
      {
        "subFieldA": "2020-02-02 00:00:00"
      }
    ]
  ],
  "total": 1,
  "size": 1,
  "status": 200
}
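
If the plugin build in use supports the MySQL-style DATE_FORMAT function, the value could be trimmed back to a date-only string in the projection. This is only a sketch (assuming DATE_FORMAT is available) and has not been verified on 1.13:

POST _opendistro/_sql
{
  "query": """SELECT DATE_FORMAT(i.field1.subFieldA, '%Y-%m-%d') AS subFieldA FROM
  my_test_index_date as i 
  WHERE 
  DATE(i.field1.subFieldA) > DATE('2020-02-01')
    LIMIT 50;"""
}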

FreCap · Feb 22 '21

A similar issue was reported on the Open Distro community forum: https://discuss.opendistrocommunity.dev/t/sql-plugin-date-filter-causes-error/4883

penghuo · Feb 24 '21

Also, I believe this is a breaking change.

FreCap · Feb 24 '21

Also, I believe this is a breaking change.

Yeah, I checked, and our old engine can handle these queries without issue. The reason is that, regardless of the field value, there is only one date type in Elasticsearch, which is represented internally as an epoch value. When we developed the new engine, we assumed that raw data fetched from Elasticsearch is always a timestamp. We will revisit this and see how to make it more user friendly while following the SQL standard as much as possible. Thanks!
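
For context, Elasticsearch lets a mapping pin the accepted/returned format of a date field. Below is standard Elasticsearch 7.x mapping syntax for the index from this issue; whether the SQL engine picks that format up for typing is exactly what this issue is about, so treat it as background rather than a fix:

PUT my_test_index_date
{
  "mappings": {
    "properties": {
      "field1": {
        "properties": {
          "subFieldA": {
            "type": "date",
            "format": "yyyy-MM-dd"
          }
        }
      }
    }
  }
}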

dai-chen · Mar 01 '21

Issue may be related: https://github.com/opendistro-for-elasticsearch/sql/issues/510

dai-chen · Mar 01 '21

@dai-chen Shouldn't it fall back to the old one automatically when the query throws?

If not, could we enable that fallback somehow?

FreCap · Mar 02 '21

@dai-chen Shouldn't it fall back to the old one automatically when the query throws?

If not, could we enable that fallback somehow?

Yes, the fallback happens automatically. However, it mostly happens during parsing: the ANTLR grammar in the new engine decides whether we should fall back. After that point, the query is considered something the new engine can handle, so a semantic check error like the one you saw does not trigger a fallback.

dai-chen · Mar 02 '21

@dai-chen Shouldn't it fall back to the old one automatically when the query throws? If not, could we enable that fallback somehow?

Yes, the fallback happens automatically. However, it mostly happens during parsing: the ANTLR grammar in the new engine decides whether we should fall back. After that point, the query is considered something the new engine can handle, so a semantic check error like the one you saw does not trigger a fallback.

Is it possible to force both kinds of errors to fall back to the old engine? Would it be easy to make a PR?

There are a lot of great features in this release that I would love to keep using, but also a few breaking changes that, as far as I can see, could take several weeks or more to resolve.

Recovering from both error types would give us the best of both worlds and help adoption.

FreCap · Mar 03 '21

@dai-chen Shouldn't it fall back to the old one automatically when the query throws? If not, could we enable that fallback somehow?

Yes, the fallback happens automatically. However, it mostly happens during parsing: the ANTLR grammar in the new engine decides whether we should fall back. After that point, the query is considered something the new engine can handle, so a semantic check error like the one you saw does not trigger a fallback.

Is it possible to force both kinds of errors to fall back to the old engine? Would it be easy to make a PR?

There are a lot of great features in this release that I would love to keep using, but also a few breaking changes that, as far as I can see, could take several weeks or more to resolve.

Recovering from both error types would give us the best of both worlds and help adoption.

Sure, thanks for helping us find this and the other breaking changes! We're looking into all the recent issues and will prepare PRs for the high-priority ones.

dai-chen · Mar 03 '21

As an additional data point, another valid date format in Postgres is 20181231 (as opposed to 2018-12-31).
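
For reference, a quick PostgreSQL check of that equivalence (run in PostgreSQL, not this plugin):

SELECT DATE '20181231' = DATE '2018-12-31';
-- true: PostgreSQL accepts the compact ISO 8601 form as well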

FreCap · Mar 08 '21

@FreCap Sure, I'm working on a PR to support implicit casts. I will reference MySQL and PostgreSQL behavior. Thanks!

dai-chen · Mar 11 '21

Probably related: https://github.com/opendistro-for-elasticsearch/sql/issues/803

dai-chen · Mar 11 '21

I am also facing the same issue with date ranges; I'm currently using BETWEEN instead of the > and < operators to compare dates and strings.
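
A sketch of that BETWEEN workaround against the index from this issue (the bounds here are arbitrary, chosen only for illustration):

POST _opendistro/_sql
{
  "query": """SELECT * FROM
  my_test_index_date as i 
  WHERE 
  i.field1.subFieldA BETWEEN '2020-02-01' AND '2020-12-31'
    LIMIT 50;"""
}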

gauravlanjekar · Jul 21 '21

@gauravlanjekar As a fix, you can use this branch: https://github.com/FreCap/sql/tree/nestedLimit-comparison-datecast

You can find the build here: https://github.com/FreCap/sql/actions/runs/1053174786

FreCap · Jul 22 '21

@FreCap Unfortunately I am running Open Distro on AWS ES, so I don't have the option to run another branch. :( Do you see any problem with using BETWEEN? That is my current workaround until a fix is available.

gauravlanjekar · Jul 22 '21

I tested with OpenSearch 1.1, which supports implicit conversion between date and string. The following query worked for me:

POST my_test_index_date/_doc/
{
  "field1": {
    "subFieldA": "2020-02-02"
  }
}

POST _plugins/_sql
{
  "query": """
    SELECT * FROM
      my_test_index_date as i 
    WHERE 
      i.field1.subFieldA > '2020-02-01 00:00:00'
    LIMIT 50;
  """
}
{
  "schema": [
    {
      "name": "field1",
      "type": "object"
    }
  ],
  "datarows": [
    [
      {
        "subFieldA": "2020-02-02 00:00:00"
      }
    ]
  ],
  "total": 1,
  "size": 1,
  "status": 200
}

I'm investigating why '2020-02-01' or '2020-02-01T00:00:00Z' returns nothing.
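
For reproduction, those variants are the same request with only the literal changed, e.g.:

POST _plugins/_sql
{
  "query": """
    SELECT * FROM
      my_test_index_date as i 
    WHERE 
      i.field1.subFieldA > '2020-02-01'
    LIMIT 50;
  """
}

and likewise with '2020-02-01T00:00:00Z' as the literal.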

dai-chen · Apr 05 '22

Hello everyone,

I am trying to change a data type using Query Workbench. I want to convert a date column to a timestamp in the query's output schema, as I can in other databases, but I am not able to, and OpenSearch does not seem to support a CAST-style query. Can anyone help? It's urgent.

Query: select timestamp(datecolumn) as date from table

The above query is not working, please help!

dharminfadia · Jun 09 '22
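
If the plugin version in use supports explicit conversion, a CAST expression along these lines may do what is being asked. This is only a sketch with a hypothetical table name (my_table) and the column name from the question, not verified against Query Workbench:

POST _plugins/_sql
{
  "query": """SELECT CAST(datecolumn AS TIMESTAMP) AS date_ts FROM my_table"""
}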