sqlparse icon indicating copy to clipboard operation
sqlparse copied to clipboard

Keywords take precedence over Names

Open bigluck opened this issue 2 years ago • 1 comments

Ciao, I'm having a problem formatting a query with identifier_case = upper.

It seems that the formatter cannot correctly recognize the name of an entity if its value is present in the list of reserved KEYWORDS.

This behaviour, even if does not compromise the execution of the SQL query on a DB (like Snowflake in my case), has unwanted side effects in other supporting libraries (in my case with sql_metadata).

For example:

JOIN position as po ON

position is recognized as ttype = Token.Keyword, but in the context of the query it refers to the table name, keeping its value unchanged in lowercase.

TokenKeyword

But po is recognized as a ttype = Token.Name and transformed into PO.

Input query:

SELECT
      p2.ID
    , p2.email
    , t.ID AS ticket_id
    , NULLIF (TRIM (LOWER (t.blabla)), '') as blabla
    , NULLIF (TRIM (LOWER (t.blabla2)), '') as "blabla2"
    , MyDb.Schema2. "Ticket". *
    , po.role
    , po.role AS "role1"
    , po.role AS role2
    , po2.blabla AS "Bl1"
FROM MYDB.schema2.person AS p2
INNER JOIN "MYDB" .SCHEMA2. "Ticket" AS t ON
    t.person_id = p2.ID
JOIN position as po ON
    po.ID = t.position_id
INNER JOIN position2 as po2 ON
    po.ID = t.position_id
WHERE 1

Result:

SELECT
      P2.ID
    , P2.EMAIL
    , T.ID AS TICKET_ID
    , NULLIF (TRIM (LOWER (T.BLABLA)), '') as BLABLA
    , NULLIF (TRIM (LOWER (T.BLABLA2)), '') as "blabla2"
    , MYDB.SCHEMA2. "Ticket". *
    , PO.ROLE
    , PO.ROLE AS "role1"
    , PO.ROLE AS ROLE2
    , PO2.BLABLA AS "Bl1"
FROM MYDB.SCHEMA2.PERSON AS P2
INNER JOIN "MYDB" .SCHEMA2. "Ticket" AS T ON
    T.PERSON_ID = P2.ID
JOIN position as PO ON
    PO.ID = T.POSITION_ID
INNER JOIN POSITION2 as PO2 ON
    PO.ID = T.POSITION_ID
WHERE 1

This anomaly leads the sql_metadata library to generate an incorrect list of columns, as in Snowflake an entity not enclosed in "" is converted into the corresponded uppercased string.

formatted_query = sqlparse.format(query, **cfg)
parsed = Parser(formatted_query)
print(parsed.columns)

>> ['MYDB.SCHEMA2.PERSON.ID', 'MYDB.SCHEMA2.PERSON.EMAIL', 'MYDB.SCHEMA2.ticket.ID', 'MYDB.SCHEMA2.ticket.BLABLA', 'MYDB.SCHEMA2.ticket. BLABLA2 ', 'MYDB.SCHEMA2.ticket. *', 'Position.ROLE', 'POSITION2.BLABLA', 'MYDB.SCHEMA2.ticket.PERSON_ID', 'position.ID', 'MYDB.SCHEMA2.ticket.POSITION_ID']

In my case in fact I can't find any entity in my data catalog (managed externally to Snowflake) with the name position in lowercase, because it is saved as POSITION.

data_catalog_known_tables = ['PERSON', 'ticket', 'POSITION', 'POSITION2']
for column in parsed.columns:
  table_name = column.split ('.') [- 2]
  if not table_name in data_catalog_known_tables:
    raise Exception (f "Table \" {table_name} \ "not found")

bigluck avatar May 27 '22 08:05 bigluck

It seems that POSITION is an actual keyword. See here: https://github.com/andialbrecht/sqlparse/blob/master/sqlparse/keywords.py#L455 with my latest PR #693 you will be able to change the keywords to prevent this in your specific case. I don't know enough about sql syntax to know if this should be handled more generally.

mrmasterplan avatar Nov 30 '22 16:11 mrmasterplan