[CT-2016] [Spike] Static artifact for CLI validation
The ask: A language-agnostic data structure to validate dbt CLI commands, without actually requiring dbt-core to be imported/installed. (Could be a JSON Schema, doesn't have to be.) Let's get away from any need for naïve regex.
For starters, the goal here wouldn't even be to parse CLI strings into meaningful representations, just to say whether this is or isn't a valid CLI string. But I'd also imagine wanting to extend this into nicer-to-have territory (auto-complete, blocking certain options, extending with additional options).
As I see it, two options:
- Abstract the combination of commands + params one step further than we already have, by means of Python methods & decorators, into a static data structure (e.g. JSON). Then, within dbt-core's CLI, consume that data structure to generate the CLI methods/decorators.
- Serialize click.cli to a static data structure, which could then be used (itself? by click? by another tool?) just to validate CLI strings.
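To make the idea of validating against a static structure concrete, here is a minimal sketch using only the standard library. Everything in it is an assumption for illustration: the `SPEC` layout, its `flags`/`options` keys, and the `is_valid_invocation` helper are hypothetical, and the command and option names are a small hand-picked subset rather than dbt's real interface. In practice `SPEC` would be loaded from a JSON file.

```python
import shlex

# Hypothetical static spec (would normally come from a JSON file).
# "flags" take no value; "options" consume exactly one following token.
SPEC = {
    "run": {
        "flags": ["--full-refresh", "--fail-fast"],
        "options": ["--select", "--exclude", "--target", "--threads"],
    },
    "clean": {"flags": [], "options": ["--project-dir"]},
}


def is_valid_invocation(command_line, spec=SPEC):
    """Return True if the string names a known sub-command and only known options."""
    try:
        tokens = shlex.split(command_line)
    except ValueError:
        # shlex raises ValueError on malformed input such as unbalanced quotes.
        return False
    if len(tokens) < 2 or tokens[0] != "dbt" or tokens[1] not in spec:
        return False
    command = spec[tokens[1]]
    rest = iter(tokens[2:])
    for token in rest:
        if token in command["flags"]:
            continue
        if token in command["options"]:
            # A value-taking option must be followed by a value token.
            if next(rest, None) is None:
                return False
            continue
        return False  # unknown option or stray positional argument
    return True


print(is_valid_invocation("dbt run --select my_model --full-refresh"))
print(is_valid_invocation("dbt run --no-such-flag"))
```

This deliberately ignores things a real validator would need, such as `--option=value` syntax, options that accept several values, and global options placed before the sub-command. It only shows that a yes/no answer needs nothing beyond the data structure itself.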
The closest thing I could find built into click is the to_info_dict method, which is really intended to support auto-generating documentation:
https://click.palletsprojects.com/en/8.1.x/api/#click.Command.to_info_dict
>>> from click import Context
>>> from dbt.cli.main import cli
>>> with Context(cli) as ctx:
...     info = ctx.to_info_dict()
...
>>> info['command']['commands'].keys()
dict_keys(['build', 'clean', 'compile', 'debug', 'deps', 'docs', 'init', 'list', 'ls', 'parse', 'run', 'run-operation', 'seed', 'snapshot', 'source', 'test'])
>>> [param['name'] for param in info['command']['commands']['run']['params']]
['defer', 'favor_state', 'exclude', 'fail_fast', 'full_refresh', 'models', 'profile', 'profiles_dir', 'project_dir', 'select', 'selector', 'state', 'target', 'target_path', 'threads', 'vars', 'version_check', 'help']
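The info dict is nested (groups such as docs and source contain their own sub-commands), so a consumer would probably want to flatten it first. Below is a rough sketch of that step. The `collect_options` helper is hypothetical, and `sample` is a hand-written fragment that only imitates the shape of click's output (`params`, `opts`, `secondary_opts`, `param_type_name`, `commands`); the real dict carries many more keys and entries.

```python
def collect_options(command_info, path=()):
    """Map each command path (e.g. "docs generate") to its sorted option strings."""
    options = sorted(
        opt
        for param in command_info.get("params", [])
        # Skip positional arguments; only options contribute flag strings.
        if param.get("param_type_name") == "option"
        for opt in param.get("opts", []) + param.get("secondary_opts", [])
    )
    result = {" ".join(path) or "<root>": options}
    # Groups expose their sub-commands under "commands"; recurse into each one.
    for name, sub_info in command_info.get("commands", {}).items():
        result.update(collect_options(sub_info, path + (name,)))
    return result


def _option(name, opts, secondary_opts=()):
    # Helper for building the illustrative sample below.
    return {
        "name": name,
        "param_type_name": "option",
        "opts": list(opts),
        "secondary_opts": list(secondary_opts),
    }


# Hand-written stand-in for info["command"]; values are illustrative only.
sample = {
    "name": "cli",
    "params": [_option("version", ["--version"])],
    "commands": {
        "run": {
            "name": "run",
            "params": [
                _option("full_refresh", ["--full-refresh", "-f"]),
                _option("fail_fast", ["--fail-fast", "-x"], ["--no-fail-fast"]),
            ],
        },
        "docs": {
            "name": "docs",
            "params": [],
            "commands": {
                "generate": {"name": "generate", "params": [_option("help", ["--help"])]},
            },
        },
    },
}

for command_path, option_strings in collect_options(sample).items():
    print(command_path, option_strings)
```

With the real object, the call would presumably be `collect_options(info['command'])`.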
FYI to Execution team: I'm going to queue this up for estimation discussion. Not expecting a point estimate (since it's a spike), just expecting that you all know more than I do about this topic & might have some strong opinions!
Here's a Python script that will output an artifact named dbt-core-cli-flags.json:
generate_cli_flags_artifact.py
import json

from click import Context
from dbt.cli.main import cli


def convert_to_serializable(obj):
    # Convert non-serializable objects to strings
    return str(obj)


def serialize_dict_to_json(input_dict):
    # Use a custom conversion function when serializing
    return json.dumps(input_dict, indent=4, default=convert_to_serializable)


with Context(cli) as ctx:
    info = ctx.to_info_dict()
    pretty_json_string = serialize_dict_to_json(info)

    # Write the JSON string to a file
    with open("dbt-core-cli-flags.json", "w") as file:
        file.write(pretty_json_string)
Usage
python generate_cli_flags_artifact.py
Note: the dictionary can contain values that aren't serializable to JSON (functions, custom objects, etc.), so this script just converts those to strings.
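One consequence for anyone consuming the artifact: the string conversion is one-way. The small sketch below uses a hypothetical `Sentinel` class as a stand-in for whatever unsupported object ends up in the dict, and shows that after a round trip the value is just an opaque string.

```python
import json


class Sentinel:
    """Stand-in for an arbitrary object with no JSON representation."""

    def __repr__(self):
        return "<Sentinel>"


# Illustrative record; the keys imitate a parameter entry but are not real output.
record = {"name": "threads", "required": False, "default": Sentinel()}

# json.dumps calls `default` only for values it cannot encode natively.
encoded = json.dumps(record, default=str)
decoded = json.loads(encoded)

print(decoded["default"])        # the string form, not the original object
print(type(decoded["default"]))
```

So a validator reading the JSON should probably treat such fields as informational only, and rely on the natively encoded ones (names, option strings, booleans, numbers).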
Related internal Slack threads
(Added 2024-01-12)
- https://dbt-labs.slack.com/archives/C0131TY7EEA/p1687469142661609
- https://dbt-labs.slack.com/archives/C051TUB7S9W/p1689960917438369
Here's a section of the documentation that outlines which sub-commands are available in dbt Core, dbt Cloud CLI, and dbt Cloud IDE: