[FORMATTING] Space between type and options removed in nested BigQuery fields
When formatting BigQuery DDL statement with OPTIONS on a Struct field, any space between the type definition and the Options definition is removed, resulting in an invalid statement.
Input data
CREATE TABLE `project.dataset.format_test`
(
str STRING OPTIONS(description="A string field."),
int INT64 OPTIONS(description="An int64 field."),
record STRUCT<nested INT64 OPTIONS(description="a struct field")> OPTIONS(description="a struct")
)
Expected Output
CREATE TABLE
`project.dataset.format_test` (
str STRING OPTIONS(description = "A string field."),
int INT64 OPTIONS(description = "An int64 field."),
record STRUCT<nested INT64 OPTIONS(description ="a struct field")> OPTIONS(description = "a struct")
)
Actual Output
CREATE TABLE
`project.dataset.format_test` (
str STRING OPTIONS(description = "A string field."),
int INT64 OPTIONS(description = "An int64 field."),
record STRUCT<nested INT64OPTIONS(description ="a struct field")> OPTIONS(description = "a struct")
)
Usage
To run this I've stored the input in a file called example-sql.sql and am running:
> sql-formatter --v
12.2.3
> sql-formatter -l bigquery example-sql.sql
CREATE TABLE
`project.dataset.format_test` (
str STRING OPTIONS(description = "A string field."),
int INT64 OPTIONS(description = "An int64 field."),
record STRUCT<nested INT64OPTIONS(description ="a struct field")> OPTIONS(description = "a struct")
)
Thanks for reporting. Definitely a bug.
I remember that the part that deals with this type-parameter formatting was pretty ugly and error-prone. Perhaps there's a simple fix I can make, perhaps not. More likely not. Will look into this.
In the mean time, perhaps you'd like to try out a new SQL formatting tool that I've written, which specifically includes full BigQuery support: https://github.com/nene/prettier-plugin-sql-cst
It formats the SQL somewhat differently, but at least it wont output invalid code. For your SQL it produces the following:
CREATE TABLE `project.dataset.format_test` (
str STRING OPTIONS (description = "A string field."),
int INT64 OPTIONS (description = "An int64 field."),
record STRUCT<nested INT64 OPTIONS (description = "a struct field")> OPTIONS (
description = "a struct"
)
);
I'm not really sure whether it would be better to format options as OPTIONS(...) or OPTIONS (...). Or which style is more common in BigQuery community. Open to suggestions.