Michael Collado
The `runs` table and `lineage_events` table get pretty large. Oftentimes, we only want to search for runs within a given time frame (e.g., the last week). We should consider partitioning...
In the `OpenLineageService`, we construct a `DatasetVersionId` using the `uuid` property of the record - that is, the primary key. However, in the `RunDao`, when we construct the `DatasetVersionId` of...
The change in https://github.com/MarquezProject/marquez/pull/1593 made the `marquez-api` jar incompatible with code that had depended on the `LineageEvent` class and its related classes. Any code that depended on those models must...
Some time ago, the OpenLineage spec was changed to include [`outputFacets`](https://github.com/OpenLineage/OpenLineage/blob/main/spec/OpenLineage.json#L160) and [`inputFacets`](https://github.com/OpenLineage/OpenLineage/blob/main/spec/OpenLineage.json#L134) in the `Dataset`s reported by a `RunEvent`. Several of the integrations, including the Spark integration and the...
OpenLineage is going to support adding dataset and (possibly) job metadata outside of the context of a run (e.g., team ownership of a dataset, etc.). Marquez will need to be...
The Marquez client model is badly out of sync with the server-side model. Ideally, both the client and server-side models should be generated from the OpenAPI spec and always...
### Problem

Prior to https://github.com/OpenLineage/OpenLineage/pull/1037, all Airflow OpenLineage events reported their parent runs as `parentRun` rather than the spec-defined `parent` facet. This change adds an alias to support those...
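
The aliasing idea can be illustrated with a minimal, language-neutral sketch in Python. It is not the Marquez change itself (Marquez is a Java service), and the function name `normalize_parent_facet` as well as the nested facet payload are hypothetical. The sketch only assumes what the description states: older events carry the parent run under the key `parentRun`, and the spec-defined key is `parent`.

```python
from typing import Any, Dict

LEGACY_PARENT_KEY = "parentRun"  # key used by older Airflow events, per the description
SPEC_PARENT_KEY = "parent"       # spec-defined facet key


def normalize_parent_facet(run_facets: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the run facets with a legacy `parentRun` entry
    exposed under the spec-defined `parent` key.

    Hypothetical helper: if both keys are present, the spec-defined
    `parent` entry wins and the legacy entry is dropped.
    """
    normalized = dict(run_facets)  # shallow copy; the input dict is not mutated
    legacy = normalized.pop(LEGACY_PARENT_KEY, None)
    if legacy is not None and SPEC_PARENT_KEY not in normalized:
        normalized[SPEC_PARENT_KEY] = legacy
    return normalized


if __name__ == "__main__":
    # Illustrative payload only; the field names inside the facet are assumptions.
    old_event_facets = {"parentRun": {"run": {"runId": "example-run-id"}}}
    print(normalize_parent_facet(old_event_facets))
```

Preferring the spec-defined key when both are present is a design choice made for this sketch; the actual change may resolve that case differently.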