Too long numeric hex literal silently converted to value 0 with an alias
If a numeric hex literal is too long (more than 32 hex digits in Firebird 4.0 and 5.0, more than 16 hex digits in Firebird 3.0), it is silently converted to the value 0 with an alias equal to the remainder of the literal.
For example
select 0x12345678901234567890123456789012, 0x123456789012345678901234567890123 from rdb$database;
CONSTANT X123456789012345678901234567890123
============================================= ==================================
24197857199965561741520400062332047378 0
(first column is a valid INT128 literal in Firebird 4.0 and higher with 32 digits, while the second is invalid, with 33 digits)
This should raise an error instead.
NOTE: This seems to be a generic parser issue: for example select 0alias from rdb$database has the same problem. This should not be accepted either, but results in a column with alias alias and value 0.
For comparison, for select 0alias PostgreSQL raises:
ERROR: trailing junk after numeric literal at or near "0a"
LINE 1: select 0alias
I.e. it's definitely bug, not a feature?
The 0x123456789012345678901234567890123 should be handled as a single token, and then fail because it is too long for hex literals. Same goes for 0alias, it should be a single token which produces an error because it is not a valid literal or other type of expression (e.g. it is an invalid identifier, because regular identifiers cannot start with a number, nor is it a valid number literal).
OK, will take a look at it
NOTE: This seems to be a generic parser issue: for example
select 0alias from rdb$databasehas the same problem. This should not be accepted either, but results in a column with aliasaliasand value0.
I fully disagree with you here.
It's so valid as is the expression 1+2. Spaces are not required.
The lexer consumes characters as the current token accepts them and stops when not.
For the record, select 0alias from dual is also valid in Oracle.
I fully disagree with you here.
It's so valid as is the expression
1+2. Spaces are not required.The lexer consumes characters as the current token accepts them and stops when not.
In the expression 1+2, the + is a delimiter token and 1 and 2 are nondelimiter tokens, see 5.2 <token> and <separator> of SQL:2016-2, and the rule:
7) Any <token> may be followed by a <separator>. A <nondelimiter token> shall be followed by a <delimiter token> or a <separator>. NOTE 130 — If the Format does not allow a <nondelimiter token> to be followed by a <delimiter token>, then that <nondelimiter token> shall be followed by a <separator>.
Following those rules, things like 0x123456789012345678901234567890123 and 0alias are not allowed to be parsed as if they are 0 x123456789012345678901234567890123 or 0 alias, because neither 0 nor x123456789012345678901234567890123 or alias are delimiter tokens: they are nondelimiter tokens, so they must be separated by a separator (comment or whitespace), or a delimiter token, but that doesn't apply in this example.
For the record, select 0alias from dual is also valid in Oracle.
Then Oracle does it wrong too.
In the expression
1+2, the+is a delimiter token and1and2are nondelimiter tokens, see 5.2 <token> and <separator> of SQL:2016-2, and the rule:
- Any <token> may be followed by a <separator>. A <nondelimiter token> shall be followed by a <delimiter token> or a <separator>. NOTE 130 — If the Format does not allow a <nondelimiter token> to be followed by a <delimiter token>, then that <nondelimiter token> shall be followed by a <separator>.
Ok, but then choose to follow the standard for no good reason and break applications. I see no need for that.
The main issue could and should be fixed and actually this part has nothing to do with it.
Ok, but then choose to follow the standard for no good reason and break applications. I see no need for that.
I'm not sure what you mean with this. The rule I quote is a sane rule for a parser/tokenizer, and I suspect no one in their right mind would write something like select 0alias from atable and expect it not to produce an error.
The main issue could and should be fixed and actually this part has nothing to do with it.
I disagree. The rule I quote has everything to do with it: if the rule would be followed then even the current tokenization (which for 0x123456789012345678901234567890123 yields token 0 and token x123456789012345678901234567890123, and for 0alias yields token 0 and token alias) would have resulted in an error because these are nondelimiter tokens and thus are not allowed to be next to each other without a separator (comment or whitespace).
I'm not sure what you mean with this. The rule I quote is a sane rule for a parser/tokenizer, and I suspect no one in their right mind would write something like
select 0alias from atableand expect it not to produce an error.
I already received the same question by people, "why this works"? So, people write it.
For me it's nonsense like things 1+2+3 without spaces, and yes, people write it (a lot).
If you wanted to write an arithmetical nonsense you should use 1--2+-3 as the example.
23.05.2023 13:40, Adriano dos Santos Fernandes wrote:
I already received the same question by people, "why this works"? So, people write it.
But the question itself means they also consider this abnormal and thus likely to agree with this to be fixed.
People used longer-than-31-character aliases for years, but we broke it (for a reason). Making the lexer rules more consistent may be a good reason too. Just my 2 cents.
For me it's nonsense like things |1+2+3| without spaces, and yes, people write it (a lot).
This is not a nonsense for me, although I never write expressions this way myself. For me it's just a readability thing like "if(expr) -> if (expr)". But something like if(expr)then1else0 would surely be nonsense.
Dmitry
I'm not sure what you mean with this. The rule I quote is a sane rule for a parser/tokenizer, and I suspect no one in their right mind would write something like
select 0alias from atableand expect it not to produce an error.I already received the same question by people, "why this works"? So, people write it.
You're proving my point that they would expect it to produce an error.
For me it's nonsense like things
1+2+3without spaces, and yes, people write it (a lot).
And following the rule I quoted that will continue to work, because + is a delimiter token, and thus is allowed to occur between two nondelimiter tokens without a separator.
On 5/23/23 13:52, Dmitry Yemanov wrote:
But something like if(expr)then1else0 would surely be nonsense.
Yes, and for me treating 0alias as 2 separate tokens appears to be nonsense too. The 2 known to me possible excuses for keeping it in parser are:
- matches standard (thks to Mark, we know it does not match)
- is actively used by people.
Adriano, when you say that people asked you about it - may be they mistyped and were surprised that it works?
Adriano, when you say that people asked you about it - may be they mistyped and were surprised that it works?
I don't know.
It may be someone reading an existing working code and surprised by it.
My point is that as it also works in Oracle, it's not so uncommon.
It's not standard compliant, but if we are going to change every piece that not follows the standard, our users will be in problem.
This issue is probably fixed by #8564 (not yet merged)