marquez Job is incorrectly displayed on UI if input and output datasets have equal names but different namespaces


Status: Open. AndriiSushko opened this issue 4 years ago (2 comments).

Hi there,

Let's assume I have:

a dataset 'heroku_pg_work_api_prod.attachments' in namespace 'sds-ds-raw-data';
a dataset with the same name, 'heroku_pg_work_api_prod.attachments', in another namespace, 'sds-ds-raw-data-extracted-copy'.
Finally, I want to create a job that represents copying the data as-is from the first dataset to the second.

I'm using this POST request to create the job:

{{baseUrl}}/namespaces/:namespace/jobs/:job

BODY

{
   "type":"BATCH",
   "inputs":[
      {
         "namespace":"sds-ds-raw-data",
         "name":"heroku_pg_work_api_prod.attachments"
      }
   ],
   "outputs":[
      {
         "namespace":"sds-ds-raw-data-extracted-copy",
         "name":"heroku_pg_work_api_prod.attachments"
      }
   ],
   "location":"https://github.com/my-jobs/blob/124f6089ad4c5fcbb1d7b33cbb5d3a9521c5d32c",
   "context":{
      "SQL":"SELECT * FROM mytable;"
   },
   "description":"My first job!"
}

I'd expect this job and its dataset lineage to be displayed correctly, but the UI result looks confusing:

[Screenshot 2021-02-24 at 13:03:11 showing the UI result]

Environment:

macOS Big Sur 11.2.1
Google Chrome 88.0.4324.182 (Official Build) (x86_64) or Safari 14.0.3 (16610.4.3.1.4)
Docker Engine v20.10.2
marquez-docker-compose.yaml

Feb 24 '21 11:02 AndriiSushko
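For anyone who wants to reproduce the report from a script rather than Postman, here is a minimal Python sketch that builds the same request using only the standard library. It is hedged in several ways. The base URL (`http://localhost:5000/api/v1`), the job namespace, and the job name `copy_attachments` are hypothetical stand-ins for the Postman placeholders `{{baseUrl}}`, `:namespace` and `:job`. The HTTP method defaults to POST, as in the report, but is a parameter, so check the Marquez API reference for the method your version expects.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical stand-ins for the Postman placeholders {{baseUrl}},
# :namespace and :job; none of these values come from the report.
BASE_URL = "http://localhost:5000/api/v1"
JOB_NAMESPACE = "sds-ds-raw-data-extracted-copy"
JOB_NAME = "copy_attachments"


def build_job_request(base_url, job_namespace, job_name, method="POST"):
    """Build, without sending, the job-creation request from the report."""
    body = {
        "type": "BATCH",
        # Input and output share a name and differ only by namespace.
        "inputs": [{"namespace": "sds-ds-raw-data",
                    "name": "heroku_pg_work_api_prod.attachments"}],
        "outputs": [{"namespace": "sds-ds-raw-data-extracted-copy",
                     "name": "heroku_pg_work_api_prod.attachments"}],
        "location": "https://github.com/my-jobs/blob/124f6089ad4c5fcbb1d7b33cbb5d3a9521c5d32c",
        "context": {"SQL": "SELECT * FROM mytable;"},
        "description": "My first job!",
    }
    # Percent-encode path segments so unusual namespace or job names stay valid.
    url = "{}/namespaces/{}/jobs/{}".format(
        base_url,
        urllib.parse.quote(job_namespace, safe=""),
        urllib.parse.quote(job_name, safe=""),
    )
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method=method,
    )


req = build_job_request(BASE_URL, JOB_NAMESPACE, JOB_NAME)
print(req.get_method(), req.full_url)
# Sending it against a running Marquez would be: urllib.request.urlopen(req)
```

The function only builds the request, which makes the payload easy to inspect before it is sent. In particular, you can confirm that the input and output datasets differ only by namespace.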

Thank you @AndriiSushko for the report. I'm adding it to the roadmap.

Jun 18 '21 17:06 julienledem

@AndriiSushko when we display the lineage graph (and also when we render the graph on the backend), we don't take into account the namespace a dataset is associated with, but we do think it makes sense to support this (maybe as a flag). So it's not a bug but rather a feature request. I've updated the issue with our feature label. I've also outlined a possible solution in https://github.com/MarquezProject/marquez/issues/855.

Nov 28 '23 21:11 wslulciuc
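A toy lineage builder shows how ignoring namespaces can yield a confusing graph. This sketch is hypothetical and is not Marquez's implementation. The function `build_lineage_graph` and the job name `copy_attachments` are illustrative, and the `include_namespace` parameter stands in for the flag suggested above. When nodes are keyed by name alone, the two datasets collapse into one node with a cycle through the job. When nodes are keyed by (namespace, name), the datasets stay separate.

```python
from collections import defaultdict


def build_lineage_graph(job_name, inputs, outputs, include_namespace=True):
    """Hypothetical lineage-graph builder (not Marquez's actual code).

    Each dataset is a (namespace, name) pair. With include_namespace=False,
    datasets are keyed by name alone, mirroring the behaviour described above.
    """
    def dataset_key(ds):
        namespace, name = ds
        if include_namespace:
            return ("dataset", namespace, name)
        return ("dataset", name)

    job = ("job", job_name)
    edges = defaultdict(set)
    for ds in inputs:
        edges[dataset_key(ds)].add(job)   # input dataset -> job
    for ds in outputs:
        edges[job].add(dataset_key(ds))   # job -> output dataset
    nodes = set(edges) | {n for targets in edges.values() for n in targets}
    return nodes, edges


src = ("sds-ds-raw-data", "heroku_pg_work_api_prod.attachments")
dst = ("sds-ds-raw-data-extracted-copy", "heroku_pg_work_api_prod.attachments")

by_name, edges_by_name = build_lineage_graph(
    "copy_attachments", [src], [dst], include_namespace=False)
by_ns, _ = build_lineage_graph("copy_attachments", [src], [dst])
print(len(by_name), len(by_ns))  # 2 3
```

With name-only keys, the graph has two nodes, and the single dataset node is both the input and the output of the job. Including the namespace restores the expected three-node chain from the source dataset, through the job, to the copy.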
