astro-sdk
Add dataframe as dataset for OpenLineage
Please describe the feature you'd like to see
- Once the naming spec for dataframes is added to OpenLineage (https://github.com/OpenLineage/OpenLineage/blob/main/spec/Naming.md), pass the lineage values for the input and output facets for dataframes to OL.
- Define a namespace/name for the datasets that Astro SDK temporarily stores in XCom. It needs to be unique and never reused. I'm not sure what's best; something like namespace=xcom://{airflow instance namespace} name=/{dag.task.runid ???}. Optionally: add a dataset facet that clarifies that this was a transient/temporary dataset that was deleted after it was read. (This is not in the spec yet, but we should add it IMO.)
{
  namespace: "xcom://{airflow instance namespace}",
  name: "/{dag.task.runid ???}",
  facets: {
    temporary: {
      ...
    }
  }
}
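
To make the proposal above concrete, here is a minimal Python sketch of how such a dataset identifier could be assembled. It is only an illustration under assumptions: the helper name `xcom_dataset`, its parameters, and the choice to join DAG ID, task ID, and run ID with dots are all hypothetical, since the issue itself leaves the name format open (`???`). The `temporary` facet is left empty because its fields are not defined in the OpenLineage spec.

```python
def xcom_dataset(instance_namespace: str, dag_id: str, task_id: str, run_id: str) -> dict:
    """Describe a dataframe that is held temporarily in XCom.

    Hypothetical helper: the name layout is an assumption, not part of any spec.
    """
    return {
        # Namespace identifies the Airflow instance that owns the XCom entry.
        "namespace": f"xcom://{instance_namespace}",
        # Including the run ID keeps the name unique per DAG run, so it is not reused.
        "name": f"/{dag_id}.{task_id}.{run_id}",
        # Placeholder for the proposed "temporary" facet; its contents are unspecified.
        "facets": {"temporary": {}},
    }


dataset = xcom_dataset("my-airflow", "example_dag", "transform", "manual__2023-01-16")
print(dataset["namespace"], dataset["name"])
```

If dynamically mapped tasks need to be supported, the run ID alone may not be enough to keep names unique, so an additional component would probably be required; that is left out of this sketch.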
Describe the solution you'd like
- Pass the lineage values for the input and output facets of dataframes to OpenLineage (OL).
Acceptance Criteria
- [ ] Run the example DAGs for the operator with a dataframe and check the results on Marquez and astro-cloud. Post screenshots of both.
- [ ] All checks and tests in the CI should pass
- [ ] Unit tests (90% code coverage or more, once available)
- [ ] Integration tests (if the feature relates to a new database or external service)
- [ ] Example DAG
- [ ] Docstrings in reStructuredText for each method, class, function, and module-level attribute (including an example DAG showing how it should be used)
- [ ] Exception handling in case of errors
- [ ] Logging (are we exposing useful information to the user? e.g. source and destination)
- [ ] Improve the documentation (README, Sphinx, and any other relevant docs)
- [ ] How-to guide for the feature (example)
@rajaths010494 to update this ticket with the latest discussion on OL
https://astronomer.slack.com/archives/C03RTC9BQ4F/p1668446730868129
@rajaths010494 any update on this?
I asked about this in the channel; no updates yet. https://astronomer.slack.com/archives/C03RTC9BQ4F/p1673870857024399?thread_ts=1668446730.868129&cid=C03RTC9BQ4F