dbt-core icon indicating copy to clipboard operation
dbt-core copied to clipboard

[CT-968] [Feature] Disallow “view” materialization if Python

Open ChenyuLInx opened this issue 3 years ago • 2 comments
trafficstars

Is this your first time opening an issue?

Describe the Feature

Current dbt python model doesn't support view materialization. But instead of raise a clear error, it will try to run python code with sql together.

Describe alternatives you've considered

Add check for language in materialization code- meaning we will have to update a lot of them

Who will this benefit?

python model user

Are you interested in contributing this feature?

Yes

Anything else?

No response

ChenyuLInx avatar Jul 28 '22 01:07 ChenyuLInx

Copying idea shared in Slack thread

  • Each materialization macro records metadata about which languages it supports, as a property of the macro node
  • We validate here at runtime that the node's language is in the materialization macro's supported languages
  • By default, all materialization macros support SQL — and then we "opt in" table/incremental on Snowflake + BigQuery + Spark/Databricks to support Python too

This is a pattern I could see wanting to extend for other things that are implicit in their macro code today. E.g.

does this materialization on this adapter respect --full-refresh? does this materialization on this adapter "carry over" grants? does this materialization on this adapter create a database object that's "auto-updating" for new data (views, materialized views), or needs manual updating by dbt (table, incremental)?

We have syntax today to "tag" (?) materializations with their intended adapter support:

{% materialization table, default %}
...
{% materialization incremental, adapter="snowflake" %}

Maybe there's something we could use/extend here?

{% materialization incremental, adapter="snowflake", languages=["sql", "python"] %}

Behind the scenes, dbt uses those not-quite-arguments to construct the actual macro name (e.g. materialization_incremental_snowflake). I don't think we want to do that here! — but maybe pull it off and store it as a separate property of the node?

https://github.com/dbt-labs/dbt-core/blob/1071a4681df91633301fdf23e34de819b66fbeb7/core/dbt/clients/jinja.py#L345-L374

jtcohen6 avatar Jul 28 '22 10:07 jtcohen6

The place in the code (runtime) where the validation error should raise:

https://github.com/dbt-labs/dbt-core/blob/40b55ed65a86cee95102e8342100547de7ba4a63/core/dbt/task/run.py#L256-L266

why do this?

If we don't do this, dbt does this:

create view ... as (
    ... python code ...
)

Which is a very confusing message for end users :)

jtcohen6 avatar Aug 03 '22 16:08 jtcohen6

Updates that came from some pairing time with @stu-k today:

If we write our materialization macro as something like:

{% materialization seed, adapter='snowflake', language=['sql'] %}
actual logic
{% endmaterialization %}

we can just parse out the args and also add a default value for things that doesn't have language specified by updating part of the parsing code from https://github.com/dbt-labs/dbt-core/blob/1071a4681df91633301fdf23e34de819b66fbeb7/core/dbt/clients/jinja.py#L345-L374

into

    def parse(self, parser):
        node = jinja2.nodes.Macro(lineno=next(parser.stream).lineno)
        materialization_name = parser.parse_assign_target(name_only=True).name

        adapter_name = "default"
        node.args = []
        node.defaults = []

        while parser.stream.skip_if("comma"):
            target = parser.parse_assign_target(name_only=True)

            if target.name == "default":
                pass

            elif target.name == "adapter":
                parser.stream.expect("assign")
                value = parser.parse_expression()
                adapter_name = value.value
            elif target.name == "language":
                parser.stream.expect("assign")
                value = parser.parse_expression()
                target.set_ctx("param")
                node.args.append(target)
                node.defaults.append(value)
            else:
                invalid_materialization_argument(materialization_name, target.name)
        if not node.args:
            node.args = [jinja2.nodes.Name("language", "store")]
            node.defaults = [jinja2.nodes.List([jinja2.nodes.Const('sql')])]
        node.name = get_materialization_macro_name(materialization_name, adapter_name)

        node.body = parser.parse_statements(("name:endmaterialization",), drop_needle=True)

        return node

Then node.args will contain argument 'language', node.defaults will contain the parse values.

Note that this part of the parsing actually doesn't look into any of the body of materialization macro. It just look up the function definition and originally only used to get the name.

The code that calls the parsing logic above is the following https://github.com/dbt-labs/dbt-core/blob/a7ff003d4f74763f126e77a0fc9c55e7a3311b3d/core/dbt/parser/macros.py#L69-L89 You can see macro_name is being passed to create a ParsedMacro.

We can add an optional attribute in ParsedMacro to store the supported language, then check that and model's language at where @jtcohen6 linked above.

@gshank any suggestions on better ways of doing this is welcomed. I feel like we are abusing those attributes a bit but not too sure what's the better way to do it

ChenyuLInx avatar Aug 12 '22 21:08 ChenyuLInx