OpenLineage Adapter
Issue by skrawcz
Tuesday Jun 21, 2022 at 21:57 GMT
Originally opened as https://github.com/stitchfix/hamilton/issues/137
Is your feature request related to a problem? Please describe.
Hamilton encodes a lot of metadata that lives in code. It also creates some at execution time. Projects such as https://datahubproject.io/ and https://openlineage.io/ capture this kind of metadata across a wide array of tooling to create a central view of a heterogeneous environment. Hamilton should be able to emit metadata and execution information to them.
Describe the solution you'd like
A user should be able to specify whether their Hamilton DAG emits metadata. This should play nicely with graph adapters, e.g. Spark, Ray, Dask.
This should use the post graph execution hook to emit OpenLineage information.
Use case:
- Creating a dataset and writing it somewhere. The adapter can then emit OpenLineage information about what was executed.
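To make the use case above concrete, here is a rough sketch of what a post-execution hook could emit. It is standard-library only and does not use Hamilton's or OpenLineage's actual Python APIs: `LineageEmittingHook`, `post_graph_execute`, `build_run_event`, and the `producer` URL are all hypothetical names made up for illustration. The event fields follow the general shape of an OpenLineage run event as I understand it; a spec-compliant event would also need a `schemaURL`, which is left out here.

```python
import json
import uuid
from datetime import datetime, timezone
from typing import Any, Callable, Dict, List


def build_run_event(
    job_name: str, namespace: str, run_id: str, outputs: List[str], success: bool
) -> Dict[str, Any]:
    """Build a dict shaped like an OpenLineage run event (illustrative, not validated)."""
    return {
        "eventType": "COMPLETE" if success else "FAIL",
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "run": {"runId": run_id},
        "job": {"namespace": namespace, "name": job_name},
        "inputs": [],
        # Each requested DAG output is reported as an output dataset.
        "outputs": [{"namespace": namespace, "name": name} for name in outputs],
        # Hypothetical producer identifier, not a real project URL.
        "producer": "https://example.com/hypothetical-hamilton-adapter",
    }


class LineageEmittingHook:
    """Hypothetical hook: call post_graph_execute() once the DAG has finished running."""

    def __init__(
        self, namespace: str, job_name: str, emit: Callable[[str], Any] = print
    ) -> None:
        self.namespace = namespace
        self.job_name = job_name
        # `emit` stands in for a real transport (HTTP client, Kafka producer, ...).
        self.emit = emit

    def post_graph_execute(
        self, requested_outputs: List[str], success: bool = True
    ) -> Dict[str, Any]:
        event = build_run_event(
            job_name=self.job_name,
            namespace=self.namespace,
            run_id=str(uuid.uuid4()),
            outputs=requested_outputs,
            success=success,
        )
        self.emit(json.dumps(event))
        return event


if __name__ == "__main__":
    hook = LineageEmittingHook(namespace="demo_ns", job_name="demo_dag")
    hook.post_graph_execute(["final_table"])
```

Keeping the transport behind a plain callable means the same hook could be used regardless of which graph adapter ran the DAG, since it only needs the list of requested outputs and a success flag after execution.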
Comment by skrawcz
Wednesday Jun 22, 2022 at 03:53 GMT
~Adding a custom source for Datahub:~
- https://datahubproject.io/docs/metadata-ingestion/developing/
- https://datahubproject.io/docs/metadata-ingestion/adding-source/
Comment by skrawcz
Tuesday Dec 27, 2022 at 21:16 GMT
FYI - @gravesee - seems like I created this a while back for emission of metadata/usage.
Would love any help in providing a motivating use case!
Updated this issue to focus on OpenLineage.
This is done