sqlparse
sqlparse copied to clipboard
Keywords take precedence over Names
Ciao, I'm having a problem formatting a query with identifier_case = upper.
It seems that the formatter cannot correctly recognize the name of an entity if its value is present in the list of reserved KEYWORDS.
This behaviour, even if does not compromise the execution of the SQL query on a DB (like Snowflake in my case), has unwanted side effects in other supporting libraries (in my case with sql_metadata).
For example:
JOIN position as po ON
position
is recognized as ttype = Token.Keyword
, but in the context of the query it refers to the table name, keeping its value unchanged in lowercase.
But po
is recognized as a ttype = Token.Name
and transformed into PO
.
Input query:
SELECT
p2.ID
, p2.email
, t.ID AS ticket_id
, NULLIF (TRIM (LOWER (t.blabla)), '') as blabla
, NULLIF (TRIM (LOWER (t.blabla2)), '') as "blabla2"
, MyDb.Schema2. "Ticket". *
, po.role
, po.role AS "role1"
, po.role AS role2
, po2.blabla AS "Bl1"
FROM MYDB.schema2.person AS p2
INNER JOIN "MYDB" .SCHEMA2. "Ticket" AS t ON
t.person_id = p2.ID
JOIN position as po ON
po.ID = t.position_id
INNER JOIN position2 as po2 ON
po.ID = t.position_id
WHERE 1
Result:
SELECT
P2.ID
, P2.EMAIL
, T.ID AS TICKET_ID
, NULLIF (TRIM (LOWER (T.BLABLA)), '') as BLABLA
, NULLIF (TRIM (LOWER (T.BLABLA2)), '') as "blabla2"
, MYDB.SCHEMA2. "Ticket". *
, PO.ROLE
, PO.ROLE AS "role1"
, PO.ROLE AS ROLE2
, PO2.BLABLA AS "Bl1"
FROM MYDB.SCHEMA2.PERSON AS P2
INNER JOIN "MYDB" .SCHEMA2. "Ticket" AS T ON
T.PERSON_ID = P2.ID
JOIN position as PO ON
PO.ID = T.POSITION_ID
INNER JOIN POSITION2 as PO2 ON
PO.ID = T.POSITION_ID
WHERE 1
This anomaly leads the sql_metadata
library to generate an incorrect list of columns, as in Snowflake an entity not enclosed in ""
is converted into the corresponded uppercased string.
formatted_query = sqlparse.format(query, **cfg)
parsed = Parser(formatted_query)
print(parsed.columns)
>> ['MYDB.SCHEMA2.PERSON.ID', 'MYDB.SCHEMA2.PERSON.EMAIL', 'MYDB.SCHEMA2.ticket.ID', 'MYDB.SCHEMA2.ticket.BLABLA', 'MYDB.SCHEMA2.ticket. BLABLA2 ', 'MYDB.SCHEMA2.ticket. *', 'Position.ROLE', 'POSITION2.BLABLA', 'MYDB.SCHEMA2.ticket.PERSON_ID', 'position.ID', 'MYDB.SCHEMA2.ticket.POSITION_ID']
In my case in fact I can't find any entity in my data catalog (managed externally to Snowflake) with the name position
in lowercase, because it is saved as POSITION
.
data_catalog_known_tables = ['PERSON', 'ticket', 'POSITION', 'POSITION2']
for column in parsed.columns:
table_name = column.split ('.') [- 2]
if not table_name in data_catalog_known_tables:
raise Exception (f "Table \" {table_name} \ "not found")
It seems that POSITION is an actual keyword. See here: https://github.com/andialbrecht/sqlparse/blob/master/sqlparse/keywords.py#L455 with my latest PR #693 you will be able to change the keywords to prevent this in your specific case. I don't know enough about sql syntax to know if this should be handled more generally.