astro-sdk

Add dataframe as dataset for OpenLineage

Open sunank200 opened this issue 2 years ago • 4 comments

Please describe the feature you'd like to see

  • Once the OpenLineage naming spec (https://github.com/OpenLineage/OpenLineage/blob/main/spec/Naming.md) covers dataframes, pass the lineage values for the dataframe's input and output facets to OpenLineage (OL).

  • Define a namespace/name for the datasets that AstroSDK temporarily stores in XCom. The name must be unique and never reused. I'm not sure what's best; something like namespace=xcom://{airflow instance namespace}, name=/{dag.task.runid ???}. Optionally, add a dataset facet indicating that this was a transient/temporary dataset that was deleted after it was read. (This is not in the spec yet, but we should add it IMO.)

```
{
  "namespace": "xcom://{airflow instance namespace}",
  "name": "/{dag.task.runid ???}",
  "facets": {
    "temporary": {
      ...
    }
  }
}
```
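A minimal Python sketch of the naming idea above, under stated assumptions: the helper name `xcom_dataframe_dataset`, the path layout `/{dag_id}/{task_id}/{run_id}/{try_number}/{key}`, and the `temporary` facet's `deletedAfterRead` field are all hypothetical. It calls no OpenLineage client API; it only builds the dict shape shown above. It assumes that `(dag_id, run_id)` is unique in Airflow and that `try_number` increases when a task is cleared and rerun, so the resulting name is not reused.

```python
from urllib.parse import quote


def xcom_dataframe_dataset(instance_namespace: str, dag_id: str, task_id: str,
                           run_id: str, try_number: int,
                           key: str = "return_value") -> dict:
    """Hypothetical helper: describe a dataframe that astro-sdk keeps in
    XCom as an OpenLineage-style dataset (namespace, name, facets)."""
    # Percent-encode every component so characters such as ':' and '+',
    # which appear in Airflow run_ids, cannot be confused with '/'.
    parts = [quote(str(p), safe="")
             for p in (dag_id, task_id, run_id, try_number, key)]
    return {
        "namespace": f"xcom://{instance_namespace}",
        # (dag_id, run_id) identifies one DAG run; try_number keeps a
        # cleared-and-rerun task from reusing the same name (assumption).
        "name": "/" + "/".join(parts),
        "facets": {
            # Hypothetical facet: a "temporary" dataset facet is not in
            # the OpenLineage spec yet.
            "temporary": {"deletedAfterRead": True},
        },
    }


ds = xcom_dataframe_dataset("my-airflow", "example_dag", "transform",
                            "scheduled__2022-12-20T00:00:00+00:00", 1)
print(ds["namespace"])  # xcom://my-airflow
print(ds["name"])
# /example_dag/transform/scheduled__2022-12-20T00%3A00%3A00%2B00%3A00/1/return_value
```

Dropping `try_number` would let a retried task overwrite the earlier name, which may or may not be acceptable; that choice is part of what the `???` in the proposal leaves open.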

Describe the solution you'd like

  • Pass the lineage values for the dataframe's input and output facets to OL.
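To illustrate what passing input and output values could mean for a chain of dataframe tasks, here is a hedged Python sketch. All names (`dataframe_dataset`, `lineage_for_task`) and the `/{dag_id}/{task_id}/{run_id}` layout are hypothetical and call no OpenLineage client API; for brevity it identifies a dataframe by DAG, producing task, and run only. What it shows: the dataset a producing task reports as an output must be identical to the one the consuming task reports as an input, otherwise Marquez cannot join the two jobs into one lineage graph.

```python
from urllib.parse import quote


def dataframe_dataset(instance_ns: str, dag_id: str, producer_task: str,
                      run_id: str) -> dict:
    """Hypothetical: OpenLineage-style dataset for a dataframe that
    `producer_task` pushed to XCom during `run_id`."""
    parts = [quote(p, safe="") for p in (dag_id, producer_task, run_id)]
    return {"namespace": f"xcom://{instance_ns}", "name": "/" + "/".join(parts)}


def lineage_for_task(task_id: str, upstream_df_tasks: list, returns_df: bool,
                     **ctx) -> dict:
    """Hypothetical: input/output datasets for one dataframe task.

    `ctx` carries instance_ns, dag_id and run_id, which are shared by
    every task in the same DAG run.
    """
    # Each upstream dataframe is read back from XCom: report it as an input
    # under the same name its producer used.
    inputs = [dataframe_dataset(producer_task=t, **ctx)
              for t in upstream_df_tasks]
    # A returned dataframe is pushed to XCom: report it as an output.
    outputs = ([dataframe_dataset(producer_task=task_id, **ctx)]
               if returns_df else [])
    return {"inputs": inputs, "outputs": outputs}


ctx = {"instance_ns": "my-airflow", "dag_id": "example_dag", "run_id": "manual__1"}
extract = lineage_for_task("extract", [], True, **ctx)
transform = lineage_for_task("transform", ["extract"], True, **ctx)
# The producer's output and the consumer's input are the same dataset.
assert extract["outputs"] == transform["inputs"]
```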

Acceptance Criteria

  • [ ] Run the example DAGs for the operator with a dataframe, check the lineage in Marquez and astro-cloud, and post screenshots.
  • [ ] All checks and tests in the CI should pass
  • [ ] Unit tests (90% code coverage or more, once available)
  • [ ] Integration tests (if the feature relates to a new database or external service)
  • [ ] Example DAG
  • [ ] Docstrings in reStructuredText for each method, class, function, and module-level attribute (including an example DAG showing how it should be used)
  • [ ] Exception handling in case of errors
  • [ ] Logging (are we exposing useful information to the user? e.g. source and destination)
  • [ ] Improve the documentation (README, Sphinx, and any other relevant)
  • [ ] How to use Guide for the feature (example)

sunank200 • Dec 20 '22 07:12

@rajaths010494, please update this ticket with the latest discussion on OL.

phanikumv • Jan 16 '23 12:01

https://astronomer.slack.com/archives/C03RTC9BQ4F/p1668446730868129

rajaths010494 • Jan 16 '23 12:01

@rajaths010494, any update on this?

sunank200 • Mar 21 '23 13:03

I asked about this in the channel; no updates yet. https://astronomer.slack.com/archives/C03RTC9BQ4F/p1673870857024399?thread_ts=1668446730.868129&cid=C03RTC9BQ4F

rajaths010494 • Mar 27 '23 06:03