# Databricks workflow support for `aql.dataframe` functions
## Please describe the feature you'd like to see
Astro SDK users should be able to run their `aql.dataframe` functions inside a Databricks workflow task group:
```python
from astro import sql as aql
# Import path assumed; DatabricksWorkflowTaskGroup is provided by astro-provider-databricks
from astro_databricks import DatabricksWorkflowTaskGroup

@aql.dataframe()
def foo(...):
    ...

@aql.transform()
def bar(...):
    ...

with dag:
    with DatabricksWorkflowTaskGroup() as tg:
        f = foo()
        b = bar()
```
Wrapping these functions in the task group should mean that the Python script submitted to Databricks runs as a task in a Databricks Workflow, giving users access to Databricks job clusters.
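For illustration, here is a usage sketch showing how a job cluster might be attached to the task group. The `group_id`, `databricks_conn_id`, and `job_clusters` parameter names are assumed to match the existing `DatabricksWorkflowTaskGroup` API in astro-provider-databricks, and the cluster spec values are placeholders:

```python
# Sketch only: parameter names assume the existing DatabricksWorkflowTaskGroup
# API from astro-provider-databricks; cluster spec values are placeholders.
job_cluster_spec = [
    {
        "job_cluster_key": "astro_sdk_cluster",  # placeholder key
        "new_cluster": {
            "spark_version": "12.2.x-scala2.12",
            "node_type_id": "i3.xlarge",
            "num_workers": 2,
        },
    }
]

with dag:
    with DatabricksWorkflowTaskGroup(
        group_id="sdk_workflow",
        databricks_conn_id="databricks_default",
        job_clusters=job_cluster_spec,
    ) as tg:
        f = foo()  # would run on the shared job cluster rather than its own cluster
```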
## Describe the solution you'd like
The first few steps can be the same as in https://github.com/astronomer/astro-sdk/issues/1822: we generate a Python file and upload it to DBFS. The major difference is that instead of launching the task immediately, we add it to the Databricks Workflow by implementing a `convert_to_databricks_workflow_task` function, along with the functions needed to monitor the task remotely, similar to how we handle this in Cosmos (we could potentially even create a shared base class for these functions). A sketch of both pieces follows.
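Purely as a sketch of the shape this could take: the first method emits a Databricks Jobs API task spec pointing at the uploaded DBFS file, and the polling helper illustrates remote monitoring. Everything except the `convert_to_databricks_workflow_task` name (taken from this proposal) is hypothetical, including the `dbfs_python_file_path` and `job_cluster_key` attributes; `DatabricksHook.get_run_state` is a real method on the Airflow Databricks provider's hook.

```python
import time
from typing import Any

from airflow.providers.databricks.hooks.databricks import DatabricksHook


def convert_to_databricks_workflow_task(self, relevant_upstreams: list[str]) -> dict[str, Any]:
    """Build a Databricks Jobs API task spec for this @aql.dataframe task.

    The enclosing DatabricksWorkflowTaskGroup would call this while assembling
    the multi-task job definition, instead of launching the task directly.
    """
    return {
        "task_key": self.task_id.replace(".", "__"),
        "depends_on": [{"task_key": t} for t in relevant_upstreams],
        "job_cluster_key": self.job_cluster_key,  # hypothetical attribute
        "spark_python_task": {
            # Hypothetical attribute: the DBFS path produced in issue #1822
            "python_file": self.dbfs_python_file_path,
        },
    }


def monitor_databricks_task(self, run_id: int) -> None:
    """Poll the Jobs API until our task in the workflow run reaches a terminal state."""
    hook = DatabricksHook(databricks_conn_id=self.databricks_conn_id)
    while True:
        state = hook.get_run_state(run_id)
        if state.is_terminal:
            if not state.is_successful:
                raise RuntimeError(f"Databricks task failed: {state.state_message}")
            return
        time.sleep(30)
```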
## Are there any alternatives to this feature?
The alternative is to put the Python code in a Databricks notebook and use the Cosmos `DatabricksNotebookOperator`.
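For comparison, a minimal sketch of that alternative; the import path and parameters are assumed from the existing operator, and the notebook path is a placeholder:

```python
# Sketch of the notebook-based alternative; import path and parameter names
# assume the existing DatabricksNotebookOperator, and the path is a placeholder.
from astro_databricks import DatabricksNotebookOperator

notebook_task = DatabricksNotebookOperator(
    task_id="transform_notebook",
    databricks_conn_id="databricks_default",
    notebook_path="/Shared/transform",  # placeholder notebook path
    source="WORKSPACE",
)
```

The downside of this approach is that the transformation logic has to live in a notebook inside Databricks rather than alongside the rest of the DAG code.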
## Acceptance Criteria
- [ ] All checks and tests in the CI should pass
- [ ] Unit tests (90% code coverage or more, once available)
- [ ] Integration tests (if the feature relates to a new database or external service)
- [ ] Example DAG
- [ ] Docstrings in reStructuredText for each of the methods, classes, functions, and module-level attributes (including an example DAG showing how the feature should be used)
- [ ] Exception handling in case of errors
- [ ] Logging (are we exposing useful information to the user? e.g. source and destination)
- [ ] Improve the documentation (README, Sphinx, and any other relevant docs)
- [ ] How-to guide for the feature (with an example)