Implement SQL standard FORMAT clause for CAST between string types and datetime types [CORE6507]
Submitted by: @mrotteveel
Implement SQL standard FORMAT clause for CAST between string types and datetime types, to allow custom formatting of datetime values and conversion from string values with a specific format to datetime values.
""" <cast specification> ::= CAST <left paren> <cast operand> AS <cast target> [ FORMAT <cast template> ] <right paren>
<cast operand> ::= <value expression> | <implicitly typed value specification>
<cast target> ::= <domain name> | <data type>
<cast template> ::= <character string literal> """
Where <cast template> follows the rules of Subclause 9.42, "Converting a datetime to a formatted character string" or Subclause 9.43, "Converting a formatted character string to a datetime". Specific syntax rules defined in Subclause 9.44, "Datetime templates":
""" <datetime template> ::= { <datetime template part> }...
<datetime template part> ::= <datetime template field> | <datetime template delimiter>
<datetime template field> ::= <datetime template year> | <datetime template rounded year> | <datetime template month> | <datetime template day of month> | <datetime template day of year> | <datetime template 12-hour> | <datetime template 24-hour> | <datetime template minute> | <datetime template second of minute> | <datetime template second of day> | <datetime template fraction> | <datetime template am/pm> | <datetime template time zone hour> | <datetime template time zone minute>
<datetime template delimiter> ::= <minus sign> | <period> | <solidus> | <comma> | <apostrophe> | <semicolon> | <colon> | <space>
<datetime template year> ::= YYYY | YYY | YY | Y
<datetime template rounded year> ::= RRRR | RR
<datetime template month> ::= MM
<datetime template day of month> ::= DD
<datetime template day of year> ::= DDD
<datetime template 12-hour> ::= HH | HH12
<datetime template 24-hour> ::= HH24
<datetime template minute> ::= MI
<datetime template second of minute> ::= SS
<datetime template second of day> ::= SSSSS
<datetime template fraction> ::= FF1 | FF2 | FF3 | FF4 | FF5 | FF6 | FF7 | FF8 | FF9
<datetime template am/pm> ::= A.M. | P.M.
<datetime template time zone hour> ::= TZH
<datetime template time zone minute> ::= TZM """
Commented by: @mrotteveel
@Dmitry, personally I read the original request of CORE1314 more like a request for a message formatter like the following:
``` STRING_FORMAT('Value 1: {0}, value 2: {1}', 'a', 'b') ```
results in
``` 'Value 1: a, value 2: b' ```
The dateformatting was tacked on CORE1314 when CORE1341 was closed, while it should have remained a separate ticket IMHO. These are two different concerns that should be handled separately (string interpolation vs dateformatting).
It would be nice to see this feature in 5.0. (Is there a voting system on GitHUB similar to the previous one?)
@omachtandras You can use a thumbs-up response on the initial comment (through the smiley icon button).
I made a PR #7629 with this feature, and it would be nice if someone would look into it.
Is it possible to use format pattern (specified here as "<cast template>") from variable instead of literal one ?
Suppose that i have following table and put one row in it:
set list on;
recreate table test(
id int generated always as identity constraint pk_test primary key,
dts timestamp with time zone
);
commit;
insert into test(dts) values('23.09.2016 12:34:56.789 Indian/Cocos');
commit;
Following query runs OK:
select cast(dts as varchar(50) format 'dd.mm.year hh24:mi:ss.ff3') as dsql_result from test;
DSQL_RESULT 23.09.2016 12:34:56.789
But if i try to do similar using PSQL block then i get error:
set term ^;
execute block returns( psql_result varchar(50) ) as
declare dts timestamp with time zone = '23.09.2016 12:34:56.789 Indian/Cocos';
declare fmt varchar(50);
begin
fmt = 'dd.mm.year hh24:mi:ss.ff3';
psql_result = cast(dts as varchar(50) format fmt);
suspend;
end
^
set term ;^
Output:
Statement failed, SQLSTATE = 42000
Dynamic SQL Error
-SQL error code = -104
-Token unknown - line 6, column 50
-fmt
Is it possible to use format pattern (specified here as "") from variable instead of literal one ?
No, the SQL standard doesn't support that. It only specifies support for a character string literal (see the quoted syntax rule <cast template>). Part of the problem is that you're thinking of cast as a function, and it is not a function, it is basically a fixed transformation whose entire syntax must be validated when the statement is compiled. That syntax includes the value(!) of the format string. If you'd be able to parametrize that, you can no longer check the syntax at statement compilation.
To be honest, I'm not really happy how the implementation turned out. It has far too many non-standard extension as it is IMHO.
IMHO it's better to have a standard function with non-standard extensions than implement a second non-standard function with these extensions. We have standard EXTRACT with non-standard QUARTER / MILLISECOND / WEEK / WEEKDAY / YEARDAY and nobody complained so far ;-)
IMHO it's better to have a standard function with non-standard extensions than implement a second non-standard function with these extensions. We have standard EXTRACT with non-standard QUARTER / MILLISECOND / WEEK / WEEKDAY / YEARDAY and nobody complained so far ;-)
Sure, but I think some of the extensions that have now been added to the cast-format make it more complicated than necessary, and should only have been implemented when there is a clear demand, not as the initial implementation.
One more Q.
select cast('12:34:57 +01:00' as time with time zone) from rdb$database;
CAST 12:34:57.0000 +01:00
select cast('12:34:57 +01:00' as time with time zone format 'HH24:MI:SS TZH:TZM' ) from rdb$database;
CAST 13:34:57.0000 +01:00
Why presence of FORMAT <cast template> causes somewhat like 'additional shifting' of source timestamp ?
That looks like a bug to me.
That looks like a bug to me.
It seems so. I'm looking at the CVT_string_to_datetime implementation (this is the first query), and it converts local time to UTC, so there's no "shift" happens.
I'll fix it.
There are things in this implementation that without access to the standard I cannot say they are correct or not, for example, missing parts:
select cast('12 -03:00' as timestamp with time zone format 'HH24 TZH:TZM' ) from rdb$database;
Also note that TIME WITH TIMEZONE and region-based timezones is trick in general.
This is covered by 9.51 Converting a formatted character string to a datetime (SQL:2023), to paraphrase the rule: missing date parts take the value of CURRENT_DATE, for missing time parts values take value zero (0) (note if SSSS (second of day) is specified, all other time parts are irrelevant), for missing time zone parts, also: use zero (0).
Correction: the default values for the date parts are taken from CURRENT_TIMESTAMP (not that this should make much of a difference).
@asfernandes As an aside: I highly recommend buying a copy of ISO/IEC 9075-2:2023 (or maybe ask the Firebird Foundation to buy a copy for you).
or maybe ask the Firebird Foundation to buy a copy for you
I asked and the subject didn't moved.
QA issue: see https://github.com/FirebirdSQL/firebird/issues/2388#issuecomment-1792104713 Currently test can not be implemented until this bug(?) will be fixed.
QA issue: see https://github.com/FirebirdSQL/firebird/issues/2388#issuecomment-1792104713 Currently test can not be implemented until this bug(?) will be fixed.
Fixed in #7835.
Perhaps it's more appropriate to post it here rather than in https://github.com/FirebirdSQL/firebird/pull/7835 Sorry for duplicate.
It seems that 'RM' (abbreviation for Roman numbers) can not be used for cast from string to date in '<cast template>':
set heading off;
set echo on;
-- date.time to string:
select cast(current_date as varchar(50) format 'DD.RM.YYYY') from rdb$database;
-- string to date/time:
select cast('24.XI.2023' as date format 'DD.RM.YYYY') from rdb$database;
Output:
-- date.time to string:
select cast(current_date as varchar(50) format 'DD.RM.YYYY') from rdb$database;
24.XI.2023
-- string to date/time:
select cast('24.XI.2023' as date format 'DD.RM.YYYY') from rdb$database;
Statement failed, SQLSTATE = HY000
Value for RM pattern is out of range [1, 12]
One more weird case.
set list on;
set sqlda_display on;
with
a as (
select cast('23:13:40 Europe/Moscow' as time with time zone) as init_tmtz
from rdb$database
UNION ALL
select cast('01:02:03 Europe/Moscow' as time with time zone)
from rdb$database
-- sqltype: 32756 TIME WITH TIME ZONE scale: 0 subtype: 0 len: 8
)
,b as (
-- attempt to cast time_with_time_zone to text:
-- doc: ## 1. DATETIME TO STRING SELECT CAST(CURRENT_TIMESTAMP AS VARCHAR(50) FORMAT 'YEARMMDD HH24MISS') FROM RDB$DATABASE;
select
init_tmtz
,cast(init_tmtz as varchar(50) format 'HH24:MI:SS TZH:TZM') as tmtz_to_txt
from a
-- sqltype: 448 VARYING scale: 0 subtype: 0 len: 50 charset: 0 NONE
)
,c as (
-- attempt to cast text to time_with_timezone
-- doc: ## 2. STRING TO DATETIME: SELECT CAST('2000.12.08 12:35:30.5000' AS TIMESTAMP FORMAT 'YEAR.MM.DD HH24:MI:SS.FF4') FROM RDB$DATABASE;
select
init_tmtz
,tmtz_to_txt
,cast(tmtz_to_txt as time with time zone format 'HH24:MI:SS TZH:TZM') as txt_to_tmtz
from b
-- sqltype: 32756 TIME WITH TIME ZONE scale: 0 subtype: 0 len: 8
)
,d as (
-- (again) attempt to cast time_with_time_zone to text:
-- doc: ## 1. DATETIME TO STRING SELECT CAST(CURRENT_TIMESTAMP AS VARCHAR(50) FORMAT 'YEARMMDD HH24MISS') FROM RDB$DATABASE;
select
init_tmtz
,tmtz_to_txt
,txt_to_tmtz
,cast( txt_to_tmtz as varchar(50) format 'HH24:MI:SS TZH:TZM') as tmtz_back_to_txt
from c
-- sqltype: 448 VARYING scale: 0 subtype: 0 len: 50 charset: 0 NONE
)
select * from d
;
Output:
01: sqltype: 32756 TIME WITH TIME ZONE scale: 0 subtype: 0 len: 8
: name: alias: INIT_TMTZ
: table: owner:
02: sqltype: 448 VARYING scale: 0 subtype: 0 len: 50 charset: 0 NONE
: name: CAST alias: TMTZ_TO_TXT
: table: owner:
03: sqltype: 32756 TIME WITH TIME ZONE scale: 0 subtype: 0 len: 8
: name: CAST alias: TXT_TO_TMTZ
: table: owner:
04: sqltype: 448 VARYING scale: 0 subtype: 0 len: 50 charset: 0 NONE
: name: CAST alias: TMTZ_BACK_TO_TXT
: table: owner:
INIT_TMTZ 23:13:40.0000 Europe/Moscow
TMTZ_TO_TXT 23:13:40 +03:00
TXT_TO_TMTZ 23:13:40.0000 +03:00
TMTZ_BACK_TO_TXT 23:13:40 +03:00 -- OK, expected (equals to value in 'TMTZ_TO_TXT' column)
INIT_TMTZ 01:02:03.0000 Europe/Moscow
TMTZ_TO_TXT 01:02:03 +03:00
TXT_TO_TMTZ 00:20:19.7296 +03:00 ------------------------ <<<<< ?! 8-O ?!
TMTZ_BACK_TO_TXT 00:20:19 +03:00
PS. Prototype for automated test which checks all possible combinations of formats and time zones (defined in FB) shows that there are 429 such "oddities" (but other 3148 cases pass this check :))
It seems that 'RM' (abbreviation for Roman numbers) can not be used for cast from string to date in '
':
There is no such format string defined in the SQL standard. Why is this even implemented?
One more weird case.
Can you be more explicit in describing the problem? You have marked it with <<<<< ?! 8-O ?!, but I have no clue as to what you mean with that.
I have no clue as to what you mean with that.
I mean that some data transformation show wrong result (or, it difficult to understand - at least for me). Just try to run these commands:
C:\Users\PashaZ>isql localhost:employee -z
ISQL Version: WI-T6.0.0.150 Firebird 6.0 Initial
Server version:
WI-T6.0.0.150 Firebird 6.0 Initial
WI-T6.0.0.150 Firebird 6.0 Initial/tcp (Image-PC1)/P19:C
WI-T6.0.0.150 Firebird 6.0 Initial/tcp (Image-PC1)/P19:C
Database: localhost:employee, User: SYSDBA
SQL> set list on;
SQL> /* attempt to cast time_with_time_zone to text */;
SQL>
SQL> select cast( cast('01:02:03 Europe/Moscow' as time with time zone) as varchar(50) format 'HH:MI:SS TZH:TZM') from rdb$database;
CAST 01 AM:02:03 +03:00
SQL> /* attempt to cast text to time_with_timezone */;
SQL>
SQL> select cast('01 AM:02:03 +03:00' as time with time zone format 'HH:MI:SS TZH:TZM') from rdb$database;
CAST 00:20:19.7296 +03:00
SQL> /* (again) attempt to cast time_with_time_zone to text */
CON> ;
SQL> select cast( cast('00:20:19.7296 +03:00' as time with time zone) as varchar(50) format 'HH:MI:SS TZH:TZM') from rdb$database;
CAST 12 AM:20:19 +03:00
I can not understand from where "00:20:19.7296 +03:00" comes.
The first step is already wrong to begin with. The
cast('01:02:03 Europe/Moscow' as time with time zone) as varchar(50) format 'HH:MI:SS TZH:TZM')
should result in
01:02:03 +03:00
The "result" you expect should only be produced if you'd use the format HH A.M.:MI:SS TZH:TZM or HH P.M.:MI:SS TZH:TZM (assuming that is even valid to use in that order, I didn't look to close to the syntax rules of SQL:2023).
In other words, garbage in = garbage out, but the bigger problem is that this is seemingly accepted without resulting in an error in your second step.
It seems that 'RM' (abbreviation for Roman numbers) can not be used for cast from string to date in '':
There is no such format string defined in the SQL standard. Why is this even implemented?
Because it exists in PostgreSQL, Oracle and MSSQL (at least).
That is
cast('01 AM:02:03 +03:00' as time with time zone format 'HH:MI:SS TZH:TZM')
Should result in an error, because the AM part doesn't match the format (it doesn't have the A.M or P.M. format marker, though I guess those would only match A.M or P.M. and not AM or PM).
Specifically the rule for AM/PM is (in SQL:2023-2, section 9.51 Converting a formatted character string to a datetime):
ii) If DTPi is a <datetime template am/pm>, then let REi be the <character string literal> '((A|P).M.)'
where:
<datetime template am/pm> ::= A.M. | P.M.
It seems that 'RM' (abbreviation for Roman numbers) can not be used for cast from string to date in '':
There is no such format string defined in the SQL standard. Why is this even implemented?
Because it exists in PostgreSQL, Oracle and MSSQL (at least).
And I think that is a problem. Why not start with the SQL:2023 format as baseline and only add things when requested (who even uses Roman numerals in months?), instead of gold-plating it from the start.
@pavel-zotov Could you make sure that your tests verify compliance with sections 9.50, 9.51 and 9.52 of SQL:2023-2, instead of only verifying that it does what the implementation says it does (as the implementation in #7629 actually implemented HH and HH12 wrong by including AM/PM, and not implementing <datetime template am/pm>).