sql-formatter icon indicating copy to clipboard operation
sql-formatter copied to clipboard

Add support for Snowflake Language

Open primeapple opened this issue 3 years ago • 3 comments

This is first draft to solve #366 . It does not work yet, mainly because I had some troubles with the QuoteTypes.

Review

Please review, have a look at the following things:

  1. TODO comments show lines where I'm totally lost
  2. I'm very unsure about the reservedComments, the functions and the keywords.
  3. Did I put the correct things in there, or do they belong somewhere else?
  4. Please check the places where I used the [thing] {a|b|c} ... syntax, if I used them correctly.
  5. Did I miss any tests?
  6. Any other places that have to be changed?

Questions

I want to support accessing elements in unstructured data. This can be done via square brackets [,] or via colons : , like here:

select src:salesperson.name from car_sales

select src['salesperson']['name'] from car_sales

How should I do this? I added : as an operator, is this right? How about the brackets?

Thanks for this Repo!

primeapple avatar Sep 22 '22 06:09 primeapple

Thanks for the effort.

Regarding your main question about colon and square-bracket syntax.

  • For brackets, adding extraParens: ['[]'] should do the trick. See for example in n1ql.formatter.ts
  • With colon, things are a bit trickier. You should indeed add it as an operator. However, currently the default is to format colon with a space after (as in name: value). To fix this, a language-specific condition should be added to ExpressionFormatter.formatOperator() - there already is a similar thing for @ operator in PL/SQL.

In general

It would be really helpful if you could also update our wiki with information about Snowflake SQL syntax. Otherwise I'll have to do all this research on my own before I can merge this PR.

nene avatar Sep 22 '22 09:09 nene

I will do all the stuff you mentioned. Please go other the TODO comments in the changes. I can't run tests because I don't understand the Quote types yet...

primeapple avatar Sep 22 '22 09:09 primeapple

Regarding quote types, see: https://github.com/sql-formatter-org/sql-formatter/blob/master/src/lexer/regexFactory.ts#L99

The -qq and -bs suffixes at the end of quote types define what sort of escaping (if any) is used inside the quotes.

nene avatar Sep 22 '22 10:09 nene

Some comments about wiki updates:

  • Parameters page says that Snowflake supports $1 parameters and links to select docs. I'm pretty sure these are something else than parameters one would use in prepared statements. I think these things are more similar to variables - they also share a very similar syntax. It's probably better to reference this syntax from Variables page. Probably the simplest path for supporting this syntax is also to treat them the same as variables - at lest that would be my first gut-instinct-guess.

  • Comments page mentions support for # comments but says these aren't actually supported. Syntax highlighting really shouldn't be used as a guide for what is and isn't supported. So probably best to be removed. Too bad you haven't found any official documentation for the comment syntax. Found one Knowledge Base page that at least mentions the // syntax - we could link to that.

nene avatar Sep 27 '22 09:09 nene

I updated all the wiki pages for snowflake (also updated your suggestions). One problem is the Window-Clause. Snowflake seems to not have support for a WINDOW clause in it's Query Syntax. There is QUALIFY but I'm unsure if this is the same?
However, there is support for Window Functions but also here I'm unsure if it is the same as in the other languages.
Could you have a look into this?

I will rewriting the code now.

primeapple avatar Sep 27 '22 11:09 primeapple

Thanks. Updated the docs for Snowflake window functions syntax.

nene avatar Sep 27 '22 12:09 nene

BTW. I would appreciate if you didn't do further force-pushes to this branch. They make it hard to tell what has changed.

nene avatar Sep 27 '22 12:09 nene

First refactor done. Please the snowflake.test shows an infinity loop. Could you have a look into it? It happens even with a single test case.

primeapple avatar Sep 28 '22 06:09 primeapple

Regarding the formatting of : operator. I made a change to make the customization of formatting simpler: #466

One should now be able to simply add this to SnowflakeFormatter class:

formatOptions() {
  return { alwaysDenseOperators: [':'] };
}

nene avatar Sep 28 '22 08:09 nene

Also added escaping of keywords, so the $ character in function names should no more pose a problem: #473

nene avatar Sep 28 '22 08:09 nene

I'm sorry, I screwed up the commits by including your commits into this PR. This is why I usually force push, it allows me to cleanly integrate commits to master with rebase to the feature branch and then force push the feature branch. Merging would've been a better idea.

I will clean up the tests now.

primeapple avatar Sep 28 '22 09:09 primeapple

Ok, I reverted your commits in this PR and instead merged master into this branch. Everything fine now. Puhhh...

The only new important commit for you is 8dc6f51

primeapple avatar Sep 28 '22 09:09 primeapple

If you want, feel free to delete these accidental commits and force-push the result. You'll just need to:

git rebase -i 152bd48
# delete all the commits besides the 8dc6f51 and the last merge
# or also delete the merge and just redo it.

nene avatar Sep 28 '22 10:09 nene

  1. Is there an option to define how identifiers are allowed to look like? In Snowflake they can include any symbol when inside "", but without quoting they have a certain rule. There is the identChars Config option, but I'm unsure if this refers to quoted or unquoted identifiers.

  2. Is there an option to test variables?

  3. Did I miss any obvious support... tests?

primeapple avatar Sep 28 '22 12:09 primeapple

Is there an option to define how identifiers are allowed to look like?

Yep, there are two options for that:

  • identTypes describes quoted identifiers (with all characters allowed inside quotes).
  • identChars describes unquoted identifiers.

Both of these seem to be configured correctly.

Is there an option to test variables?

No, the variable syntax differs quite a bit between SQL dialects. Have a look at the variable tests in behavesLikeMariaDbFormatter.ts and create some custom tests for Snowflake.

Did I miss any obvious support... tests?

Doesn't seem so at first glance.

nene avatar Sep 28 '22 14:09 nene

I understand your concerns regarding the tests. However in my opinion they have the purpose of explaining the accepted syntax without having to look into the actual source files.

For example with supportsTruncateTable(format, { withoutTable: true, ifExists: true }) we can very easily see all the dialects that support TRUNCATE IF EXISTS, without having to manually look into the source files?

We've 3 options:

  1. Leave the tests in this PR, add the other languages that support them later (I can do this if you want?)
  2. Add the tests completly in a later PR
  3. Remove the tests

Option 3. would decrease test coverage and verbosity for the sake of a few less lines and some milisecons of test time.

You are the maintainer, it's up to you.

primeapple avatar Sep 29 '22 08:09 primeapple

OK, let the tests be for now as they are. My main concern was with the SET SCHEMA test. Others are not that big of a deal.

nene avatar Sep 29 '22 08:09 nene

OK. I fixed the : handling. If you now merge in the master, it should work.

Though now you'll also need to add the :: operator to the alwaysDenseOperators config.

nene avatar Sep 29 '22 09:09 nene

Everything done, all tests pass. Is this mergable now? :smiley:

primeapple avatar Sep 29 '22 09:09 primeapple

I somehow can't answer your comments, so I'll do it here:

I added to reservedPhrases:

  'MATCH {FULL | SIMPLE | PARTIAL}',
  'ON {UPDATE | DELETE} {CASCADE | SET NULL | SET DEFAULT | RESTRICT | NO ACTION}',
  'INITIALLY {DEFERRED | IMMEDIATE}',

This breaks the supportConstraints tests, because we are doing ON UPDATE CURRENT_TIMESTAMP there. This test is problematic anyway because we are using it for many languages that don`t even support this line (Postgresql for example).

I fixed the test, we can add the correct specifications for other languages later.

primeapple avatar Sep 29 '22 10:09 primeapple

Thanks for putting up with all my complaints :)

I'll merge this in and create a beta release out of it.

nene avatar Sep 29 '22 10:09 nene

Thank you for all the fast feedback. It was my first open source contribution and you made the process very easy.

primeapple avatar Sep 29 '22 10:09 primeapple

It's now released as 11.0.0-beta.1

nene avatar Sep 29 '22 10:09 nene