dbt-core
[CT-968] [Feature] Disallow "view" materialization for Python models
Is this your first time opening an issue?
- [X] I have read the expectations for open source contributors
Describe the Feature
dbt Python models currently don't support the view materialization. But instead of raising a clear error, dbt tries to run the Python code and SQL together.
Describe alternatives you've considered
Add a check for language in each materialization's code, which would mean updating a lot of them.
Who will this benefit?
Python model users
Are you interested in contributing this feature?
Yes
Anything else?
No response
Copying an idea shared in a Slack thread:
- Each materialization macro records metadata about which languages it supports, as a property of the macro node
- We validate here at runtime that the node's language is in the materialization macro's supported languages
- By default, all materialization macros support SQL — and then we "opt in" table/incremental on Snowflake + BigQuery + Spark/Databricks to support Python too
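To make those bullets concrete, here is a minimal sketch (illustrative names only, not dbt-core code) of the metadata shape: SQL is the universal default, and Python is opted in only for the table/incremental materializations on the adapters that actually support it.

# Illustrative only: in the real proposal this metadata would live on the
# materialization macro node itself; this just shows the intended shape.
DEFAULT_SUPPORTED_LANGUAGES = ["sql"]

# (materialization, adapter) pairs that would also opt in to Python
PYTHON_OPT_INS = {
    ("table", "snowflake"), ("incremental", "snowflake"),
    ("table", "bigquery"), ("incremental", "bigquery"),
    ("table", "spark"), ("incremental", "spark"),
    ("table", "databricks"), ("incremental", "databricks"),
}

def supported_languages(materialization: str, adapter: str) -> list:
    if (materialization, adapter) in PYTHON_OPT_INS:
        return ["sql", "python"]
    return DEFAULT_SUPPORTED_LANGUAGES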
This is a pattern I could see wanting to extend for other things that are implicit in their macro code today. E.g.: does this materialization on this adapter respect --full-refresh? Does this materialization on this adapter "carry over" grants? Does this materialization on this adapter create a database object that's "auto-updating" for new data (views, materialized views), or one that needs manual updating by dbt (table, incremental)?
We have syntax today to "tag" (?) materializations with their intended adapter support:
{% materialization table, default %}
...
{% materialization incremental, adapter="snowflake" %}
Maybe there's something we could use/extend here?
{% materialization incremental, adapter="snowflake", languages=["sql", "python"] %}
Behind the scenes, dbt uses those not-quite-arguments to construct the actual macro name (e.g. materialization_incremental_snowflake). I don't think we want to do that here! — but maybe pull it off and store it as a separate property of the node?
https://github.com/dbt-labs/dbt-core/blob/1071a4681df91633301fdf23e34de819b66fbeb7/core/dbt/clients/jinja.py#L345-L374
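For reference, a simplified sketch of that name construction (illustrative only; the real helper in the linked code may differ in details):

# Roughly how the materialization + adapter "not-quite-arguments" collapse
# into a single macro name today; not the actual dbt-core helper.
def materialization_macro_name(materialization_name: str, adapter_name: str = "default") -> str:
    return f"materialization_{materialization_name}_{adapter_name}"

materialization_macro_name("incremental", "snowflake")
# -> "materialization_incremental_snowflake"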
The place in the code (at runtime) where the validation error should be raised:
https://github.com/dbt-labs/dbt-core/blob/40b55ed65a86cee95102e8342100547de7ba4a63/core/dbt/task/run.py#L256-L266
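A hedged sketch of what the check at that spot could look like, assuming the materialization macro ends up carrying a supported_languages list and the model node exposes its language (both attribute names are assumptions, not the final API):

# Sketch only: raise a clear error before compiling the materialization,
# instead of letting Python source get wrapped in a CREATE VIEW statement.
supported_languages = getattr(materialization_macro, "supported_languages", ["sql"])
if model.language not in supported_languages:
    raise RuntimeError(  # dbt would raise one of its own exception types here
        f"Materialization macro '{materialization_macro.name}' only supports "
        f"{supported_languages}, but model '{model.name}' is written in '{model.language}'"
    )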
Why do this?
If we don't do this, dbt generates something like:
create view ... as (
... python code ...
)
which leads to a very confusing error message for end users :)
Updates that came from some pairing time with @stu-k today:
If we write our materialization macro as something like:
{% materialization seed, adapter='snowflake', language=['sql'] %}
actual logic
{% endmaterialization %}
we can parse out the args, and also add a default value for materializations that don't specify a language, by updating part of the parsing code at https://github.com/dbt-labs/dbt-core/blob/1071a4681df91633301fdf23e34de819b66fbeb7/core/dbt/clients/jinja.py#L345-L374 into:
def parse(self, parser):
    node = jinja2.nodes.Macro(lineno=next(parser.stream).lineno)
    materialization_name = parser.parse_assign_target(name_only=True).name

    adapter_name = "default"
    node.args = []
    node.defaults = []

    while parser.stream.skip_if("comma"):
        target = parser.parse_assign_target(name_only=True)

        if target.name == "default":
            pass

        elif target.name == "adapter":
            parser.stream.expect("assign")
            value = parser.parse_expression()
            adapter_name = value.value

        elif target.name == "language":
            # New: record the declared language(s) as a macro argument with a default
            parser.stream.expect("assign")
            value = parser.parse_expression()
            target.set_ctx("param")
            node.args.append(target)
            node.defaults.append(value)

        else:
            invalid_materialization_argument(materialization_name, target.name)

    if not node.args:
        # New: materializations that don't declare a language default to SQL-only
        node.args = [jinja2.nodes.Name("language", "store")]
        node.defaults = [jinja2.nodes.List([jinja2.nodes.Const('sql')])]

    node.name = get_materialization_macro_name(materialization_name, adapter_name)

    node.body = parser.parse_statements(("name:endmaterialization",), drop_needle=True)

    return node
Then node.args will contain the argument 'language', and node.defaults will contain the parsed values.
Note that this part of the parsing doesn't look into the body of the materialization macro at all. It only looks at the signature, and was originally used just to derive the macro name.
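As a quick illustration of that, the declared languages can be read straight off the parsed Macro node's signature, without evaluating the body (illustrative helper, not dbt-core code):

import jinja2.nodes

def supported_languages_from_node(macro_node: jinja2.nodes.Macro) -> list:
    # 'language' is the only argument the parse() above ever appends, so
    # node.args and node.defaults line up one-to-one here.
    for arg, default in zip(macro_node.args, macro_node.defaults):
        if arg.name == "language":
            # default is a jinja2.nodes.List of Const nodes; as_const() turns it
            # back into a plain Python list like ['sql'] or ['sql', 'python'].
            return list(default.as_const())
    return ["sql"]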
The code that calls the parsing logic above is here: https://github.com/dbt-labs/dbt-core/blob/a7ff003d4f74763f126e77a0fc9c55e7a3311b3d/core/dbt/parser/macros.py#L69-L89. You can see that macro_name is passed in to create a ParsedMacro.
We can add an optional attribute to ParsedMacro to store the supported languages, then check it against the model's language at the place @jtcohen6 linked above.
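For example, the new field and the resulting check could look roughly like this (a sketch under the assumption that the attribute is called supported_languages; the real ParsedMacro has many more fields):

from dataclasses import dataclass
from typing import List, Optional

@dataclass
class ParsedMacro:  # heavily trimmed stand-in for dbt's ParsedMacro
    name: str
    macro_sql: str
    supported_languages: Optional[List[str]] = None  # None is treated as SQL-only

def macro_supports(parsed_macro: ParsedMacro, model_language: str) -> bool:
    # The runtime check linked above would reduce to something like this.
    return model_language in (parsed_macro.supported_languages or ["sql"])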
@gshank, any suggestions on better ways of doing this are welcome. I feel like we are abusing those attributes a bit, but I'm not sure what a better way to do it would be.