Snowflake column name case-sensitive
I am trying to specify constraints for a Snowflake table with a schema that includes a column named "ALTER". After adding an add_uniques_equality_constraint constraint, testing the code results in a KeyError: 'alter' error.
Steps to Reproduce
- Create a Snowflake table with the following schema:
create or replace TABLE EXAMPLE (
JAHR FLOAT,
"ALTER" FLOAT
);
- Add a
add_uniques_equality_constraintconstraint:
requirement.add_uniques_equality_constraint(
["ALTER"], list(range(0, 100 + 1))
)
- Test the constraints
Expected Behavior
The code should run without errors.
Actual Behavior
Running the code results in a KeyError: 'alter' error.
Root Cause
The issue is caused by the Snowflake capitalization fix (https://github.com/Quantco/datajudge/blob/main/src/datajudge/db_access.py#L346). This fix forces the column name to be lowercase, which interferes with the case-sensitive "ALTER" column name in the table schema.
Hi @0xbe7a - thanks for your careful write-up.
Do I guess correctly that this problem is unlikely to be linked to the fact that ALTER is a SQL keyword per se, but rather due to the fact that it is wrapped with quotes? Put differently, does the problem occur, too, if you wrap, e.g. JAHR with quotes?
create or replace TABLE EXAMPLE (
"JAHR" FLOAT,
age FLOAT
);
Yes, you are correct, this issue stems from the fact that "ALTER" is quoted. The quotation is required because ALTER is a SQL keyword, but "JAHR" will result in the same behaviour. This quotation is problematic here because datajudge converts all column names to lowercase. This breaks here since Double-quoted Identifiers are case-sensitive on Snowflake (https://docs.snowflake.com/en/sql-reference/identifiers-syntax). This results in the KeyError since the casing is messed up
I see. For context, we came up with this work-around because the out-of-the box behaviour of snowflake-sqlalchemy was not what we expected it to be, see here: https://github.com/snowflakedb/snowflake-sqlalchemy/issues/157
Since we are not the only people who are unhappy with the current default behaviour, others seemed to have written their internal work-arounds, too. In particular ibis seems to have implemented a client-side fix taking quoted identifiers into account:
https://github.com/ibis-project/ibis/pull/5741
Unfortunately I haven't had the chance to look into the latter in greater detail. If this is something that you find relevant interesting, it'd certainly be useful if you could have a look at it in order to assess whether we could follow a similar approach with datajudge. No pressure, though.