marquez
marquez copied to clipboard
Last DocumentationJobFacet received is stored (even if it is null)
I am currently experimenting with Marquez and OpenLineage. Just wanted to check back, if this behaviour is intended (or it is a bug):
Steps to reproduce
- ✔️ Sending Lineage Job
START
Event with DocumentationJobFacet containing information - ✔️ Marquez displays the documentation contents in the Web UI for the Job
- ✔️ Sending Lineage Job
COMPLETED
Event containing no DocumentationJobFacet - ❔ Marquez deletes the previously provided documentation, displaying no documentation contents
Observations
It looks like the following Upsert
https://github.com/MarquezProject/marquez/blob/b58363314234a092c49328a24896c3aad66984a1/api/src/main/java/marquez/db/OpenLineageDao.java#L125-L136
causes the last sent DocumentationJobFacet
to win, even if it is null
.
With this behaviour one needs to send the DocumentationJobFacet for every request made to the OpenLineage API backend (probably also including OTHER
Events) to ensure the documentation is always visible in Marquez.
Thanks @error418 for the detailed write up on the issue, and for the great question. I think we can view descriptions (not just for jobs, but also for dataset, sources, etc) as a unique case (given that it's optional) and would expect the description to persist and not be overwritten by null
if previously present. That said, how we handle null
s vs a value just not being present is a separate decision.
Given that OpenLineage events are additive, meaning, not all metadata needs to be provided if already sent in an earlier event for a given runID
, we should update how we hand descriptions for ON CONFLICT
in our upset statements as follows:
description = COALESCE(EXCLUDED.description, :description)
The
COALESCE
function returns the first of its arguments that is not null. Null is returned only if all arguments are null. It is often used to substitute a default value for null values when data is retrieved for display...
Is this a fix you'd like to contribute?
Similar issue/PR: https://github.com/MarquezProject/marquez/pull/1718/files