simple-ddl-parser
Incorrect JSON wrapper breaks jq
Describe the bug
The commonly used Linux command-line tool jq doesn't work with the JSON prepared by sdp, because jq expects values to be wrapped in double quotes. Example of a real GCP BigQuery DDL:
DDL:
$ cat sample_ddl.sql
CREATE TABLE `dataset.table`
(
Updated_Date DATE NOT NULL OPTIONS (description =Description1),
Lead_Store_Id INT64 NOT NULL OPTIONS (description =Description2),
)
PARTITION BY LeadCreatedDate
CLUSTER BY Lead_Store_Id
OUTPUT:
$ sdp -o bigquery --no-dump sample_ddl.sql
Start parsing file sample_ddl.sql
File with result was saved to >> schemas folder
[{'alter': {},
'checks': [],
'cluster_by': ['Lead_Store_Id'],
'columns': [{'check': None,
'default': None,
'name': 'Updated_Date',
'nullable': False,
'options': [{'description': '=Description1'}],
'references': None,
'size': None,
'type': 'DATE',
'unique': False},
{'check': None,
'default': None,
'name': 'Lead_Store_Id',
'nullable': False,
'options': [{'description': '=Description2'}],
'references': None,
'size': None,
'type': 'INT64',
'unique': False}],
'constraints': {'checks': None, 'references': None, 'uniques': None},
'index': [],
'partition_by': {'columns': ['LeadCreatedDate'], 'type': None},
'partitioned_by': [],
'primary_key': [],
'schema': '`dataset',
'table_name': 'table`',
'tablespace': None}]
JQ run:
$ sdp -o bigquery --no-dump sample_ddl.sql | egrep -v 'Start parsing file |File with result was saved to ' | jq .
parse error: Invalid numeric literal at line 2, column 10
The following JSON would be OK:
[{"alter": {},
"checks": [],
"cluster_by": ["Lead_Store_Id"],
"columns": [{"check": "None",
"default": "None",
"name": "Updated_Date",
"nullable": "False",
"options": [{"description": "=Description1"}],
"references": "None",
"size": "None",
"type": "DATE",
"unique": "False"},
{"check": "None",
"default": "None",
"name": "Lead_Store_Id",
"nullable": "False",
"options": [{"description": "=Description2"}],
"references": "None",
"size": "None",
"type": "INT64",
"unique": "False"}],
"dataset": "`dataset",
"index": [],
"partition_by": {"columns": ["LeadCreatedDate"], "type": "None"},
"partitioned_by": [],
"primary_key": [],
"table_name": "table`",
"tablespace": "None"}]
Here are a few issues:
1. jq expects all values to be wrapped (None, False, True as well).
2. jq expects the wrapper to be a double quote, not a single quote. The JSON validator https://jsonformatter.curiousconcept.com/ says that points 1 and 2 are RFC violations.
3. Incorrect parsing when the table name is wrapped in backquotes: "CREATE TABLE `dataset.table`". As you can see from the output, the parser kept the backquote at the beginning (and sometimes at the end) of the table name:
'schema': '`dataset',
@andy65535 Hi! This is because the stdout output is not JSON but a Python dict (and that is why you see None; in JSON it must be null). If you want JSON, remove the flag --no-dump and it will dump JSON to the file. I didn't think about the use case where someone needs JSON on stdout :) In the next release I will add a --json flag to get the output on stdout as JSON instead of a Python object.
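A minimal sketch of one way to get valid JSON for jq from Python rather than from stdout, assuming the DDLParser API mentioned later in this thread (output_mode is assumed to mirror the CLI's -o option):

import json

from simple_ddl_parser import DDLParser

with open("sample_ddl.sql") as f:
    ddl = f.read()

# run() returns a list of Python dicts (single quotes, None/False when printed);
# json.dumps turns it into RFC-compliant JSON (double quotes, null/false) that jq accepts
result = DDLParser(ddl).run(output_mode="bigquery")
print(json.dumps(result, indent=2))

Alternatively, run(json_dump=True) returns the JSON string directly, as in the normalize_names example further down.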
@xnuinside thank you for the reply! I expected that stdout would be the same as the output without --no-dump or with the -t parameter. I use stdout because I need to specify another filename instead of the default behavior (or -t, which creates a folder for the outputs). Anyway, I removed --no-dump and the JSON is now formatted as expected. But point 3 about backquotes from my initial report is still in place:
CREATE TABLE `dataset.table`
the parser leaves the backquotes and the final JSON becomes:
...
"table_name": "table`",
... ^
"dataset": "`dataset"
... ^
@andy65535 hi, I'm ready to kill myself for my previous answer to you! First of all ))) you can get JSON output with Python code )) (I really need to spend time on documentation improvement) - you can do it like this (a sample was added to the README): https://github.com/xnuinside/simple-ddl-parser#get-output-in-json
Second - to get output without artifacts like ` at the end, you can use the flag "normalize_names=True", like here: https://github.com/xnuinside/simple-ddl-parser/blob/07f8b1f5c4c6b4aaf15a0fce4ce29fc8b213757a/tests/non_statement_tests/test_common.py#L49
So if you run the parser like:
result = DDLParser(ddl, normalize_names=True).run(json_dump=True, group_by_type=True)
you will get the required output.
By default (without the flag normalize_names=True) the parser leaves all symbols as is, to make it possible to reconstruct the DDL from the parser output.
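For completeness, a minimal sketch contrasting the default behaviour with normalize_names=True, assuming the same sample_ddl.sql from the report (the commented values are what this thread describes, not captured output):

from simple_ddl_parser import DDLParser

with open("sample_ddl.sql") as f:
    ddl = f.read()

# default: wrapping backquotes are preserved, e.g. 'schema': '`dataset', 'table_name': 'table`'
raw = DDLParser(ddl).run(output_mode="bigquery")

# normalize_names=True strips the wrapping backquotes from identifiers
clean = DDLParser(ddl, normalize_names=True).run(json_dump=True, group_by_type=True)
print(clean)  # a JSON string, because json_dump=True serialises the result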
Again, one more time SORRY for my wrong answer )
With the CLI it is not possible to use the normalize_names flag - in the next release I will work on updating the CLI to support all of the parser's arguments & settings.