firebird icon indicating copy to clipboard operation
firebird copied to clipboard

Implement SQL standard FORMAT clause for CAST between string types and datetime types [CORE6507]

Open firebird-automations opened this issue 4 years ago • 64 comments

Submitted by: @mrotteveel

Implement SQL standard FORMAT clause for CAST between string types and datetime types, to allow custom formatting of datetime values and conversion from string values with a specific format to datetime values.

""" <cast specification> ::= CAST <left paren> <cast operand> AS <cast target> [ FORMAT <cast template> ] <right paren>

<cast operand> ::= <value expression> | <implicitly typed value specification>

<cast target> ::= <domain name> | <data type>

<cast template> ::= <character string literal> """

Where <cast template> follows the rules of Subclause 9.42, "Converting a datetime to a formatted character string" or Subclause 9.43, "Converting a formatted character string to a datetime". Specific syntax rules defined in Subclause 9.44, "Datetime templates":

""" <datetime template> ::= { <datetime template part> }...

<datetime template part> ::= <datetime template field> | <datetime template delimiter>

<datetime template field> ::= <datetime template year> | <datetime template rounded year> | <datetime template month> | <datetime template day of month> | <datetime template day of year> | <datetime template 12-hour> | <datetime template 24-hour> | <datetime template minute> | <datetime template second of minute> | <datetime template second of day> | <datetime template fraction> | <datetime template am/pm> | <datetime template time zone hour> | <datetime template time zone minute>

<datetime template delimiter> ::= <minus sign> | <period> | <solidus> | <comma> | <apostrophe> | <semicolon> | <colon> | <space>

<datetime template year> ::= YYYY | YYY | YY | Y

<datetime template rounded year> ::= RRRR | RR

<datetime template month> ::= MM

<datetime template day of month> ::= DD

<datetime template day of year> ::= DDD

<datetime template 12-hour> ::= HH | HH12

<datetime template 24-hour> ::= HH24

<datetime template minute> ::= MI

<datetime template second of minute> ::= SS

<datetime template second of day> ::= SSSSS

<datetime template fraction> ::= FF1 | FF2 | FF3 | FF4 | FF5 | FF6 | FF7 | FF8 | FF9

<datetime template am/pm> ::= A.M. | P.M.

<datetime template time zone hour> ::= TZH

<datetime template time zone minute> ::= TZM """

firebird-automations avatar Mar 08 '21 18:03 firebird-automations

Commented by: @dyemanov

Replacement for CORE1314?

firebird-automations avatar Mar 08 '21 20:03 firebird-automations

Commented by: @mrotteveel

@Dmitry, personally I read the original request of CORE1314 more like a request for a message formatter like the following:

``` STRING_FORMAT('Value 1: {0}, value 2: {1}', 'a', 'b') ```

results in

``` 'Value 1: a, value 2: b' ```

The dateformatting was tacked on CORE1314 when CORE1341 was closed, while it should have remained a separate ticket IMHO. These are two different concerns that should be handled separately (string interpolation vs dateformatting).

firebird-automations avatar Mar 08 '21 21:03 firebird-automations

It would be nice to see this feature in 5.0. (Is there a voting system on GitHUB similar to the previous one?)

omachtandras avatar May 20 '21 07:05 omachtandras

@omachtandras You can use a thumbs-up response on the initial comment (through the smiley icon button).

mrotteveel avatar May 20 '21 08:05 mrotteveel

I made a PR #7629 with this feature, and it would be nice if someone would look into it.

TreeHunter9 avatar Jun 16 '23 07:06 TreeHunter9

Is it possible to use format pattern (specified here as "<cast template>") from variable instead of literal one ?

Suppose that i have following table and put one row in it:

set list on;
recreate table test(
    id int generated always as identity constraint pk_test primary key,
    dts timestamp with time zone
);
commit;
 
insert into test(dts) values('23.09.2016 12:34:56.789 Indian/Cocos');
commit;

Following query runs OK:

select cast(dts as varchar(50) format 'dd.mm.year hh24:mi:ss.ff3') as dsql_result from test;
DSQL_RESULT                     23.09.2016 12:34:56.789

But if i try to do similar using PSQL block then i get error:

set term ^;
execute block returns( psql_result varchar(50) ) as
   declare dts timestamp with time zone = '23.09.2016 12:34:56.789 Indian/Cocos';
   declare fmt varchar(50);
begin
   fmt = 'dd.mm.year hh24:mi:ss.ff3';
   psql_result = cast(dts as varchar(50) format fmt);
   suspend;
end
^
set term ;^

Output:

Statement failed, SQLSTATE = 42000
Dynamic SQL Error
-SQL error code = -104
-Token unknown - line 6, column 50
-fmt

pavel-zotov avatar Nov 02 '23 10:11 pavel-zotov

Is it possible to use format pattern (specified here as "") from variable instead of literal one ?

No, the SQL standard doesn't support that. It only specifies support for a character string literal (see the quoted syntax rule <cast template>). Part of the problem is that you're thinking of cast as a function, and it is not a function, it is basically a fixed transformation whose entire syntax must be validated when the statement is compiled. That syntax includes the value(!) of the format string. If you'd be able to parametrize that, you can no longer check the syntax at statement compilation.

To be honest, I'm not really happy how the implementation turned out. It has far too many non-standard extension as it is IMHO.

mrotteveel avatar Nov 02 '23 11:11 mrotteveel

IMHO it's better to have a standard function with non-standard extensions than implement a second non-standard function with these extensions. We have standard EXTRACT with non-standard QUARTER / MILLISECOND / WEEK / WEEKDAY / YEARDAY and nobody complained so far ;-)

dyemanov avatar Nov 02 '23 11:11 dyemanov

IMHO it's better to have a standard function with non-standard extensions than implement a second non-standard function with these extensions. We have standard EXTRACT with non-standard QUARTER / MILLISECOND / WEEK / WEEKDAY / YEARDAY and nobody complained so far ;-)

Sure, but I think some of the extensions that have now been added to the cast-format make it more complicated than necessary, and should only have been implemented when there is a clear demand, not as the initial implementation.

mrotteveel avatar Nov 02 '23 11:11 mrotteveel

One more Q.

select cast('12:34:57 +01:00' as time with time zone) from rdb$database;
CAST                            12:34:57.0000 +01:00

select cast('12:34:57 +01:00' as time with time zone format 'HH24:MI:SS TZH:TZM' ) from rdb$database;
CAST                            13:34:57.0000 +01:00

Why presence of FORMAT <cast template> causes somewhat like 'additional shifting' of source timestamp ?

pavel-zotov avatar Nov 03 '23 09:11 pavel-zotov

That looks like a bug to me.

mrotteveel avatar Nov 03 '23 09:11 mrotteveel

That looks like a bug to me.

It seems so. I'm looking at the CVT_string_to_datetime implementation (this is the first query), and it converts local time to UTC, so there's no "shift" happens. I'll fix it.

TreeHunter9 avatar Nov 03 '23 13:11 TreeHunter9

There are things in this implementation that without access to the standard I cannot say they are correct or not, for example, missing parts:

select cast('12 -03:00' as timestamp with time zone format 'HH24 TZH:TZM' ) from rdb$database;

Also note that TIME WITH TIMEZONE and region-based timezones is trick in general.

asfernandes avatar Nov 03 '23 14:11 asfernandes

This is covered by 9.51 Converting a formatted character string to a datetime (SQL:2023), to paraphrase the rule: missing date parts take the value of CURRENT_DATE, for missing time parts values take value zero (0) (note if SSSS (second of day) is specified, all other time parts are irrelevant), for missing time zone parts, also: use zero (0).

mrotteveel avatar Nov 03 '23 14:11 mrotteveel

Correction: the default values for the date parts are taken from CURRENT_TIMESTAMP (not that this should make much of a difference).

@asfernandes As an aside: I highly recommend buying a copy of ISO/IEC 9075-2:2023 (or maybe ask the Firebird Foundation to buy a copy for you).

mrotteveel avatar Nov 03 '23 14:11 mrotteveel

or maybe ask the Firebird Foundation to buy a copy for you

I asked and the subject didn't moved.

asfernandes avatar Nov 03 '23 14:11 asfernandes

QA issue: see https://github.com/FirebirdSQL/firebird/issues/2388#issuecomment-1792104713 Currently test can not be implemented until this bug(?) will be fixed.

pavel-zotov avatar Nov 22 '23 10:11 pavel-zotov

QA issue: see https://github.com/FirebirdSQL/firebird/issues/2388#issuecomment-1792104713 Currently test can not be implemented until this bug(?) will be fixed.

Fixed in #7835.

TreeHunter9 avatar Nov 23 '23 12:11 TreeHunter9

Perhaps it's more appropriate to post it here rather than in https://github.com/FirebirdSQL/firebird/pull/7835 Sorry for duplicate.

It seems that 'RM' (abbreviation for Roman numbers) can not be used for cast from string to date in '<cast template>':

set heading off;
set echo on;

-- date.time to string:
select cast(current_date as varchar(50) format 'DD.RM.YYYY') from rdb$database;

-- string to date/time:
select cast('24.XI.2023' as date format 'DD.RM.YYYY') from rdb$database;

Output:

-- date.time to string:
select cast(current_date as varchar(50) format 'DD.RM.YYYY') from rdb$database;
24.XI.2023

-- string to date/time:
select cast('24.XI.2023' as date format 'DD.RM.YYYY') from rdb$database;

Statement failed, SQLSTATE = HY000
Value for RM pattern is out of range [1, 12]

pavel-zotov avatar Nov 24 '23 18:11 pavel-zotov

One more weird case.

set list on;
set sqlda_display on;
with 
a as (

    select cast('23:13:40 Europe/Moscow' as time with time zone) as init_tmtz
    from rdb$database
    UNION ALL
    select cast('01:02:03 Europe/Moscow' as time with time zone)
    from rdb$database

    -- sqltype: 32756 TIME WITH TIME ZONE scale: 0 subtype: 0 len: 8
)
,b as (
    -- attempt to cast time_with_time_zone to text:
    -- doc: ## 1. DATETIME TO STRING SELECT CAST(CURRENT_TIMESTAMP AS VARCHAR(50) FORMAT 'YEARMMDD HH24MISS') FROM RDB$DATABASE;
    select
        init_tmtz
        ,cast(init_tmtz as varchar(50) format 'HH24:MI:SS TZH:TZM') as tmtz_to_txt
    from a
     -- sqltype: 448 VARYING scale: 0 subtype: 0 len: 50 charset: 0 NONE
)
,c as (
    -- attempt to cast text to time_with_timezone
    -- doc: ## 2. STRING TO DATETIME: SELECT CAST('2000.12.08 12:35:30.5000' AS TIMESTAMP FORMAT 'YEAR.MM.DD HH24:MI:SS.FF4') FROM RDB$DATABASE;
    select 
        init_tmtz
        ,tmtz_to_txt
        ,cast(tmtz_to_txt as time with time zone format 'HH24:MI:SS TZH:TZM') as txt_to_tmtz
    from b
    -- sqltype: 32756 TIME WITH TIME ZONE scale: 0 subtype: 0 len: 8
)

,d as (
    -- (again) attempt to cast time_with_time_zone to text:
    -- doc: ## 1. DATETIME TO STRING SELECT CAST(CURRENT_TIMESTAMP AS VARCHAR(50) FORMAT 'YEARMMDD HH24MISS') FROM RDB$DATABASE;
    select
        init_tmtz
       ,tmtz_to_txt
       ,txt_to_tmtz
       ,cast( txt_to_tmtz as varchar(50) format 'HH24:MI:SS TZH:TZM') as tmtz_back_to_txt
    from c
    -- sqltype: 448 VARYING scale: 0 subtype: 0 len: 50 charset: 0 NONE
)
select * from d
;

Output:

01: sqltype: 32756 TIME WITH TIME ZONE scale: 0 subtype: 0 len: 8
  :  name:   alias: INIT_TMTZ
  : table:   owner:
02: sqltype: 448 VARYING scale: 0 subtype: 0 len: 50 charset: 0 NONE
  :  name: CAST  alias: TMTZ_TO_TXT
  : table:   owner:
03: sqltype: 32756 TIME WITH TIME ZONE scale: 0 subtype: 0 len: 8
  :  name: CAST  alias: TXT_TO_TMTZ
  : table:   owner:
04: sqltype: 448 VARYING scale: 0 subtype: 0 len: 50 charset: 0 NONE
  :  name: CAST  alias: TMTZ_BACK_TO_TXT
  : table:   owner:

INIT_TMTZ                       23:13:40.0000 Europe/Moscow
TMTZ_TO_TXT                     23:13:40 +03:00
TXT_TO_TMTZ                     23:13:40.0000 +03:00
TMTZ_BACK_TO_TXT                23:13:40 +03:00 -- OK, expected (equals to value in 'TMTZ_TO_TXT' column)

INIT_TMTZ                       01:02:03.0000 Europe/Moscow
TMTZ_TO_TXT                     01:02:03 +03:00
TXT_TO_TMTZ                     00:20:19.7296 +03:00 ------------------------ <<<<< ?! 8-O ?!
TMTZ_BACK_TO_TXT                00:20:19 +03:00

PS. Prototype for automated test which checks all possible combinations of formats and time zones (defined in FB) shows that there are 429 such "oddities" (but other 3148 cases pass this check :))

pavel-zotov avatar Nov 24 '23 21:11 pavel-zotov

It seems that 'RM' (abbreviation for Roman numbers) can not be used for cast from string to date in '':

There is no such format string defined in the SQL standard. Why is this even implemented?

mrotteveel avatar Nov 25 '23 08:11 mrotteveel

One more weird case.

Can you be more explicit in describing the problem? You have marked it with <<<<< ?! 8-O ?!, but I have no clue as to what you mean with that.

mrotteveel avatar Nov 25 '23 08:11 mrotteveel

I have no clue as to what you mean with that.

I mean that some data transformation show wrong result (or, it difficult to understand - at least for me). Just try to run these commands:

C:\Users\PashaZ>isql localhost:employee -z
ISQL Version: WI-T6.0.0.150 Firebird 6.0 Initial
Server version:
WI-T6.0.0.150 Firebird 6.0 Initial
WI-T6.0.0.150 Firebird 6.0 Initial/tcp (Image-PC1)/P19:C
WI-T6.0.0.150 Firebird 6.0 Initial/tcp (Image-PC1)/P19:C
Database: localhost:employee, User: SYSDBA
SQL> set list on;
SQL> /* attempt to cast time_with_time_zone to text */;
SQL>
SQL> select cast( cast('01:02:03 Europe/Moscow' as time with time zone) as varchar(50) format 'HH:MI:SS TZH:TZM') from rdb$database;

CAST                            01 AM:02:03 +03:00


SQL> /* attempt to cast text to time_with_timezone */;
SQL>
SQL> select cast('01 AM:02:03 +03:00' as time with time zone format 'HH:MI:SS TZH:TZM') from rdb$database;

CAST                            00:20:19.7296 +03:00


SQL> /* (again) attempt to cast time_with_time_zone to text */
CON> ;
SQL> select cast( cast('00:20:19.7296 +03:00' as time with time zone) as varchar(50) format 'HH:MI:SS TZH:TZM') from rdb$database;

CAST                            12 AM:20:19 +03:00

I can not understand from where "00:20:19.7296 +03:00" comes. gh-2388-fail-example

pavel-zotov avatar Nov 25 '23 08:11 pavel-zotov

The first step is already wrong to begin with. The

cast('01:02:03 Europe/Moscow' as time with time zone) as varchar(50) format 'HH:MI:SS TZH:TZM')

should result in

01:02:03 +03:00

The "result" you expect should only be produced if you'd use the format HH A.M.:MI:SS TZH:TZM or HH P.M.:MI:SS TZH:TZM (assuming that is even valid to use in that order, I didn't look to close to the syntax rules of SQL:2023).

mrotteveel avatar Nov 25 '23 08:11 mrotteveel

In other words, garbage in = garbage out, but the bigger problem is that this is seemingly accepted without resulting in an error in your second step.

mrotteveel avatar Nov 25 '23 08:11 mrotteveel

It seems that 'RM' (abbreviation for Roman numbers) can not be used for cast from string to date in '':

There is no such format string defined in the SQL standard. Why is this even implemented?

Because it exists in PostgreSQL, Oracle and MSSQL (at least).

dyemanov avatar Nov 25 '23 08:11 dyemanov

That is

cast('01 AM:02:03 +03:00' as time with time zone format 'HH:MI:SS TZH:TZM')

Should result in an error, because the AM part doesn't match the format (it doesn't have the A.M or P.M. format marker, though I guess those would only match A.M or P.M. and not AM or PM).

mrotteveel avatar Nov 25 '23 08:11 mrotteveel

Specifically the rule for AM/PM is (in SQL:2023-2, section 9.51 Converting a formatted character string to a datetime):

ii) If DTPi is a <datetime template am/pm>, then let REi be the <character string literal> '((A|P).M.)'

where:

<datetime template am/pm> ::=
  A.M. | P.M.

mrotteveel avatar Nov 25 '23 08:11 mrotteveel

It seems that 'RM' (abbreviation for Roman numbers) can not be used for cast from string to date in '':

There is no such format string defined in the SQL standard. Why is this even implemented?

Because it exists in PostgreSQL, Oracle and MSSQL (at least).

And I think that is a problem. Why not start with the SQL:2023 format as baseline and only add things when requested (who even uses Roman numerals in months?), instead of gold-plating it from the start.

mrotteveel avatar Nov 25 '23 08:11 mrotteveel

@pavel-zotov Could you make sure that your tests verify compliance with sections 9.50, 9.51 and 9.52 of SQL:2023-2, instead of only verifying that it does what the implementation says it does (as the implementation in #7629 actually implemented HH and HH12 wrong by including AM/PM, and not implementing <datetime template am/pm>).

mrotteveel avatar Nov 25 '23 08:11 mrotteveel