Michael Collado

Results 7 issues of Michael Collado

The `runs` table and `lineage_events` table get pretty large. Oftentimes, we only want to search for runs within a given time frame (e.g., the last week). We should consider partitioning...

In the `OpenLineageService`, we construct a `DatasetVersionId` using the `uuid` property of the record - that is, the primary key. However, in the `RunDao`, when we construct the `DatasetVersionId` of...

bug

The change in https://github.com/MarquezProject/marquez/pull/1593 made the `marquez-api` jar incompatible with code that had depended on the `LineageEvent` class and its related classes. Any code that depended on those models must...

Some time ago, the OpenLineage spec was changed to include [`outputFacets`](https://github.com/OpenLineage/OpenLineage/blob/main/spec/OpenLineage.json#L160) and [`inputFacets`](https://github.com/OpenLineage/OpenLineage/blob/main/spec/OpenLineage.json#L134) in the `Dataset`s reported by a `RunEvent`. Several of the integrations, including the Spark integration and the...

good first issue

OpenLineage is going to support adding dataset and (possibly) job metadata outside of the context of a run (e.g., team ownership of a dataset). Marquez will need to be...

The Marquez client model is pretty out of sync with the server-side model. Ideally, both the client and server-side models should be generated from the OpenAPI spec and always...

client/java
client/python

### Problem

Prior to https://github.com/OpenLineage/OpenLineage/pull/1037, all Airflow OpenLineage events reported their parent runs as `parentRun` rather than the spec-defined `parent` facet. This change adds an alias to support those...

docs
api
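
The `parentRun` alias described in the last issue can be illustrated with a minimal Python sketch. This is not the Marquez implementation: the function name `resolve_parent_facet` and the sample payloads are hypothetical, and the sketch assumes run facets arrive as a plain dictionary keyed by facet name.

```python
from typing import Any, Mapping, Optional


def resolve_parent_facet(run_facets: Mapping[str, Any]) -> Optional[Any]:
    """Return the parent facet of a run, if one was reported.

    Hypothetical helper: prefers the spec-defined `parent` key and falls
    back to the legacy `parentRun` key used by older Airflow events.
    """
    if "parent" in run_facets:
        return run_facets["parent"]
    return run_facets.get("parentRun")


# Illustrative payloads only; the field values are made up.
legacy_facets = {"parentRun": {"run": {"runId": "run-1"}}}
spec_facets = {"parent": {"run": {"runId": "run-2"}}}

legacy_parent = resolve_parent_facet(legacy_facets)
spec_parent = resolve_parent_facet(spec_facets)
missing_parent = resolve_parent_facet({})
```

Checking the spec-defined key first means an event that carries both keys resolves to `parent`, so the legacy name only matters for older events.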