sql-metadata icon indicating copy to clipboard operation
sql-metadata copied to clipboard

Support Postgres quoted identifiers

Open LanDinh opened this issue 4 years ago • 5 comments

Within Postgres, it's possible (and sometimes necessary) to put double quotes around table & column names. This causes undesired behavior within sql_metadata:

  • get_query_columns won't find any columns, even if they exist.
  • get_query_tables identifies the table names (though it doesn't recognize the quotes as being part of the name and will only return the unquoted part).
  • generalize_sql will generify both table and column names, which defeats the point of the call.

I'm able to work around the table issue by using the result of get_query_table to remove the quotes around the table names from the query, so that generalize_sql won't eat my table names - but this trick doesn't work for the columns, as get_query_columns doesn't recognize them.

Reasons why this should be implemented:

  • Legacy projects that can't just rewrite their database scheme
  • Django projects. <-- I'm part of this group. Even though Django generates valid Postgres table & column names (e. g. lowercase names without spaces), it will still quote them when constructing the queries because it's possible to manually specify names - and by using quotes, the chosen names won't cause errors. And the quotes are valid syntax for Postgres (see https://www.postgresql.org/docs/current/sql-syntax-lexical.html#SQL-SYNTAX-IDENTIFIERS for reference, the part about quoted identifiers), so I believe that this is a very valid user group, especially with how widely spread Django is - and since sql_metadata claims to support Postgres SQL, even more so!

LanDinh avatar Jan 17 '21 15:01 LanDinh

Thanks for reporting a bug. Agree with you, this should be handled properly.

@LanDinh - can you provide an example Postgres query that fails to be parsed correctly by sql-metadata?

macbre avatar Jan 17 '21 18:01 macbre

Of course! Here are a couple of examples for PostgreSQL as produced by Django:

  1. INSERT INTO "test" ("name") VALUES (\'foo\') RETURNING "test"."id"
  2. SELECT "test"."id", "test"."name" FROM "test" WHERE "test"."name" = \'foo\' LIMIT 21 FOR UPDATE
  3. UPDATE "test" SET "name" = \'bar\' WHERE "test"."id" = 1

Results for 1:

  • get_query_tables: ['test']
  • get_query_columns: []
  • generalize_sql: 'INSERT INTO X (X) VALUES (X) RETURNING X.X'

Results for 2:

  • get_query_tables: ['test']
  • get_query_columns: []
  • generalize_sql: 'SELECT X.X, X.X FROM X WHERE X.X = X LIMIT N FOR UPDATE'

Results for 3:

  • get_query_tables: ['test']
  • get_query_columns: []
  • generalize_sql: 'UPDATE X SET X = X WHERE X.X = N'

LanDinh avatar Jan 17 '21 21:01 LanDinh

Unfortunately this is not working again. Example query SELECT "qouted" FROM foo raises ValueError: Not supported query type!

pro100filipp avatar Jan 26 '23 06:01 pro100filipp

Thanks, @pro100filipp. Reopened the ticket. Do you have more examples of queries failing to be parsed?

macbre avatar Jan 26 '23 12:01 macbre

As I see it's any query with quotes in it – even if it's a part of string literal e.g. select 'some string with quote " char'

pro100filipp avatar Jan 26 '23 15:01 pro100filipp