marquez icon indicating copy to clipboard operation
marquez copied to clipboard

Last DocumentationJobFacet received is stored (even if it is null)

Open error418 opened this issue 2 years ago • 2 comments

I am currently experimenting with Marquez and OpenLineage. Just wanted to check back, if this behaviour is intended (or it is a bug):

Steps to reproduce

  1. ✔️ Sending Lineage Job START Event with DocumentationJobFacet containing information
  2. ✔️ Marquez displays the documentation contents in the Web UI for the Job
  3. ✔️ Sending Lineage Job COMPLETED Event containing no DocumentationJobFacet
  4. ❔ Marquez deletes the previously provided documentation, displaying no documentation contents

Observations

It looks like the following Upsert

https://github.com/MarquezProject/marquez/blob/b58363314234a092c49328a24896c3aad66984a1/api/src/main/java/marquez/db/OpenLineageDao.java#L125-L136

causes the last sent DocumentationJobFacet to win, even if it is null.

With this behaviour one needs to send the DocumentationJobFacet for every request made to the OpenLineage API backend (probably also including OTHER Events) to ensure the documentation is always visible in Marquez.

error418 avatar Apr 06 '22 18:04 error418

Thanks @error418 for the detailed write up on the issue, and for the great question. I think we can view descriptions (not just for jobs, but also for dataset, sources, etc) as a unique case (given that it's optional) and would expect the description to persist and not be overwritten by null if previously present. That said, how we handle nulls vs a value just not being present is a separate decision.

Given that OpenLineage events are additive, meaning, not all metadata needs to be provided if already sent in an earlier event for a given runID, we should update how we hand descriptions for ON CONFLICT in our upset statements as follows:

description = COALESCE(EXCLUDED.description, :description)

The COALESCE function returns the first of its arguments that is not null. Null is returned only if all arguments are null. It is often used to substitute a default value for null values when data is retrieved for display...

Is this a fix you'd like to contribute?

wslulciuc avatar Apr 06 '22 20:04 wslulciuc

Similar issue/PR: https://github.com/MarquezProject/marquez/pull/1718/files

mobuchowski avatar Apr 11 '22 09:04 mobuchowski