astronomer-cosmos
Add default behaviour for converting dbt Source nodes into Airflow tasks
As of version 1.2.1, Cosmos does not render dbt Source nodes as Airflow tasks by default.
Cosmos 1.2.0 introduced support for customizing how the library converts any dbt node into an Airflow task: https://astronomer.github.io/astronomer-cosmos/configuration/render-config.html#customizing-how-nodes-are-rendered-experimental
This means users can already customise the desired behaviour for Source nodes, using something like: https://github.com/astronomer/astronomer-cosmos/blob/11ce2d718483f1d4eb1bfdac659417fb66a1492e/dev/dags/example_cosmos_sources.py#L63
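For readers who want the shape of that mechanism without opening the links, here is a library-free sketch of a per-node-type converter lookup. It does not import Cosmos or Airflow. `Node`, `default_converter`, `source_converter` and `render` are hypothetical names made up for illustration, and the strings returned stand in for real Airflow tasks. The actual hook is the one described in the render-config documentation linked above.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class Node:
    """Hypothetical stand-in for a dbt node (not the Cosmos class)."""
    resource_type: str  # e.g. "model" or "source"
    name: str


def default_converter(node: Node) -> Optional[str]:
    # Stand-in for built-in behaviour: models become tasks,
    # anything else (including sources) is skipped.
    if node.resource_type == "model":
        return f"run:{node.name}"
    return None


def source_converter(node: Node) -> Optional[str]:
    # User-supplied behaviour for sources: a placeholder task that does no work.
    return f"empty:{node.name}_source"


def render(
    nodes: List[Node],
    converters: Dict[str, Callable[[Node], Optional[str]]],
) -> List[str]:
    """Use the converter registered for a node type, else the default one."""
    tasks = []
    for node in nodes:
        convert = converters.get(node.resource_type, default_converter)
        task = convert(node)
        if task is not None:
            tasks.append(task)
    return tasks


nodes = [Node("source", "orders"), Node("model", "stg_orders")]
without_override = render(nodes, {})
with_override = render(nodes, {"source": source_converter})
```

With no override only the model produces a task; registering a converter for "source" adds a task for the source as well, which is the behaviour the linked example opts into.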
The open questions are:
- Do we want Cosmos to have a default behaviour for dbt Source nodes and automatically generate tasks for nodes of this type?
- If the answer to (1) is yes, what do we want the behaviour for those nodes to be? Should we run dbt source freshness?
In addition to freshness checks, users can also (and sometimes do) add standard tests to their sources, such as unique and not_null. I think a good default behaviour would be to display each source as a task in the Airflow UI. This gives end users who are familiar with dbt an Airflow DAG that closely resembles dbt's own DAG, which should help Cosmos adoption. These tasks can use EmptyOperator and therefore have no impact on compute usage or scheduler resources.
One small note: DummyOperator has been deprecated in favour of EmptyOperator as of Airflow 2.4 (see here).
Hi, @tatiana,
I'm Dosu, and I'm helping the Cosmos team manage their backlog. I wanted to let you know that I'll be marking this issue as stale. From what I understand, the issue you raised pertains to the default rendering of dbt Source nodes as Airflow tasks in Cosmos 1.2.1, and there's a discussion about the need for default behavior considering the customization options introduced in Cosmos 1.2.0. Additionally, there's a suggestion to display sources as tasks in the Airflow UI and a mention of the deprecation of DummyOperator in favor of EmptyOperator in Airflow 2.4.
Could you please confirm if this issue is still relevant to the latest version of the Cosmos repository? If it is, please let the Cosmos team know by commenting on the issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days.
Thank you for your understanding, and please don't hesitate to reach out if you have any questions or need further assistance.
Best, Dosu
+1 to this issue as this is something I'm interested in.
- Do we want Cosmos to have a default behaviour for dbt Source nodes and automatically generate tasks for nodes of this type?
I'm in favour of rendering each source as an Airflow task that runs all the tests attached to it. This improves the visibility of a source and makes it clear whether a model update failed because of an upstream failure.
- If the answer to (1) is yes, what do we want the behaviour for those nodes to be?
dbt test --select source:jaffle_shop.orders
This command should run all the tests associated with a source. I'm not sure how we should handle dbt source freshness, though.
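To make the two candidate behaviours concrete, here is a small sketch that builds both dbt command lines for a single source table. The helper names (`source_selector`, `source_test_command`, `source_freshness_command`) are hypothetical and not part of Cosmos; the sketch only assembles argument lists and assumes a dbt version in which `dbt source freshness` accepts `--select`.

```python
from typing import List


def source_selector(source_name: str, table_name: str) -> str:
    """Build a dbt selector for one source table, e.g. source:jaffle_shop.orders."""
    return f"source:{source_name}.{table_name}"


def source_test_command(source_name: str, table_name: str) -> List[str]:
    # Runs every test attached to the selected source table.
    return ["dbt", "test", "--select", source_selector(source_name, table_name)]


def source_freshness_command(source_name: str, table_name: str) -> List[str]:
    # Freshness is a separate dbt subcommand, so it needs its own invocation.
    return [
        "dbt", "source", "freshness",
        "--select", source_selector(source_name, table_name),
    ]


test_cmd = source_test_command("jaffle_shop", "orders")
freshness_cmd = source_freshness_command("jaffle_shop", "orders")
```

Because the two checks are separate dbt invocations, a default source task would have to pick one of them, run both in sequence, or split them into two tasks.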
Add-on question: 3. How do we handle models that share the same source? Should we render two source nodes, or should the models share a single source node?
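As a sketch of the "shared node" option only (not a decision, and not Cosmos code), the function below creates one task per distinct source and wires every dependent model to it. `build_source_edges` and the model names are hypothetical; task names are plain strings rather than Airflow operators.

```python
from typing import Dict, List, Tuple


def build_source_edges(
    model_sources: Dict[str, List[str]],
) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Return (source_tasks, edges) where each source appears once
    and each edge is (upstream_source, downstream_model)."""
    source_tasks = set()
    edges = []
    for model, sources in model_sources.items():
        for source in sources:
            source_tasks.add(source)  # set membership deduplicates shared sources
            edges.append((source, model))
    return sorted(source_tasks), sorted(edges)


# Two models read from the same source table.
model_sources = {
    "stg_orders": ["jaffle_shop.orders"],
    "orders_audit": ["jaffle_shop.orders"],
}
source_tasks, edges = build_source_edges(model_sources)
```

Here the two models end up downstream of a single `jaffle_shop.orders` task, so its tests would run once per DAG run. Rendering one source node per model would instead repeat the same checks for each consumer.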
@tatiana, could you please help with this issue? The user has indicated that the issue is still relevant and has provided additional context and questions. Thank you!
Looks like we have a PR from @arojasb3 to fix it: https://github.com/astronomer/astronomer-cosmos/pull/661