sqlfluff icon indicating copy to clipboard operation
sqlfluff copied to clipboard

Invalid Keyword Usage in `PostgreSQL` Dialect

Open R3gardless opened this issue 1 year ago • 3 comments

Search before asking

  • [X] I searched the issues and found no similar issues.

What Happened

According to the postgresql keywords appendix, some keywords can't be used as function names or types.

However, when running the following command, no error was raised:

test.sql

CREATE TABLE test_table (
    type between NOT NULL
);

command

$ sqlfluff parse test.sql --dialect postgres

[L:  1, P:  1]      |file:
[L:  1, P:  1]      |    statement:
[L:  1, P:  1]      |        create_table_statement:
[L:  1, P:  1]      |            keyword:                                          'CREATE'
[L:  1, P:  7]      |            whitespace:                                       ' '
[L:  1, P:  8]      |            keyword:                                          'TABLE'
[L:  1, P: 13]      |            whitespace:                                       ' '
[L:  1, P: 14]      |            table_reference:
[L:  1, P: 14]      |                naked_identifier:                             'test_table'
[L:  1, P: 24]      |            whitespace:                                       ' '
[L:  1, P: 25]      |            bracketed:
[L:  1, P: 25]      |                start_bracket:                                '('
[L:  1, P: 26]      |                [META] indent:
[L:  1, P: 26]      |                newline:                                      '\n'
[L:  2, P:  1]      |                whitespace:                                   '    '
[L:  2, P:  5]      |                column_reference:
[L:  2, P:  5]      |                    naked_identifier:                         'type'
[L:  2, P:  9]      |                whitespace:                                   ' '
[L:  2, P: 10]      |                data_type:
[L:  2, P: 10]      |                    data_type_identifier:                     'between'
[L:  2, P: 17]      |                whitespace:                                   ' '
[L:  2, P: 18]      |                column_constraint_segment:
[L:  2, P: 18]      |                    keyword:                                  'NOT'
[L:  2, P: 21]      |                    whitespace:                               ' '
[L:  2, P: 22]      |                    keyword:                                  'NULL'
[L:  2, P: 26]      |                newline:                                      '\n'
[L:  3, P:  1]      |                [META] dedent:
[L:  3, P:  1]      |                end_bracket:                                  ')'
[L:  3, P:  2]      |    statement_terminator:                                     ';'
[L:  3, P:  3]      |    [META] end_of_file:

Expected Behaviour

An unparsable error should be raised

Observed Behaviour

No error was raised

How to reproduce

test.sql

CREATE TABLE test_table (
    type between NOT NULL
);

command

$ sqlfluff parse test.sql --dialect postgres

Dialect

postgres

Version

3.2.5

Configuration

None

Are you willing to work on and submit a PR to address the issue?

  • [X] Yes I am willing to submit a PR!

Code of Conduct

R3gardless avatar Nov 06 '24 09:11 R3gardless

@alanmcruickshank I'm working on dialect_postgres_keywords and noticed that its structure is a bit different from other dialects. This difference may be due to the PostgreSQL extension keywords.

In addition, I'm considering if not-keyword terms are necessary in postgres_keywords, since they may be keywords in ANSI but not in PostgreSQL. I think it might be better to define postgres_reserved_keywords and postgres_non_reserved_keywords directly, rather than using tuples, which are only used in postgres_keywords and not in other dialects.

For extension keywords, I thought of adding separate lists like pgvector_keywords or postgis_keywords, as these are all non-reserved.

Could you provide any advice on this approach?

R3gardless avatar Nov 07 '24 02:11 R3gardless

Hey @R3gardless . I wrote that keyword set initially. The idea was that I could track what was in the docs, and override what the docs said based on what was actually true in the language. In retrospect, it wasn't a great idea, and it just causes the keywords to look different in postgres to every other dialect. Feel free to rewrite it if you want to 👍

WittierDinosaur avatar Nov 30 '24 21:11 WittierDinosaur

Thank you for the advice, @WittierDinosaur!

I referred to the official PostgreSQL source code (gram.y) and noticed that keywords are divided into five categories: unreserved, col_name, type_func_name, reserved, and bare_label.

I’m considering following the official parser rules to align with the source code. This approach could necessitate some modifications to the overall postgres_dialect implementation.

I’d appreciate your thoughts on this idea.

Thanks again!

R3gardless avatar Dec 01 '24 14:12 R3gardless