marquez
Job is incorrectly displayed on UI if input and output datasets have equal names but different namespaces
Hi there,
Let's assume I have:
- a dataset 'heroku_pg_work_api_prod.attachments' in namespace 'sds-ds-raw-data'.
- a dataset with the same name 'heroku_pg_work_api_prod.attachments' but in another namespace 'sds-ds-raw-data-extracted-copy'
- Finally, I want to create a job that represents the actual copying of data (as is) from the first dataset to the second.
I'm using this POST to create a job:
{{baseUrl}}/namespaces/:namespace/jobs/:job
BODY
{
  "type": "BATCH",
  "inputs": [
    {
      "namespace": "sds-ds-raw-data",
      "name": "heroku_pg_work_api_prod.attachments"
    }
  ],
  "outputs": [
    {
      "namespace": "sds-ds-raw-data-extracted-copy",
      "name": "heroku_pg_work_api_prod.attachments"
    }
  ],
  "location": "https://github.com/my-jobs/blob/124f6089ad4c5fcbb1d7b33cbb5d3a9521c5d32c",
  "context": {
    "SQL": "SELECT * FROM mytable;"
  },
  "description": "My first job!"
}
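For anyone who wants to reproduce this outside of Postman, here is a rough Python sketch that builds the same request without sending it. The base URL, the job's namespace, and the job name `copy_attachments` are assumptions for illustration and are not part of the original report; the HTTP method is kept as a parameter and defaults to the POST described above, so check the API reference for your Marquez version before relying on it.

```python
import json
from urllib.parse import quote
from urllib.request import Request

# Assumptions for illustration only (not from the original report):
BASE_URL = "http://localhost:5000/api/v1"  # assumed local Marquez API address
JOB_NAMESPACE = "sds-ds-raw-data"          # assumed namespace that owns the job
JOB_NAME = "copy_attachments"              # hypothetical job name

DATASET_NAME = "heroku_pg_work_api_prod.attachments"


def build_job_request(method="POST"):
    """Build (but do not send) the job request described in the report."""
    body = {
        "type": "BATCH",
        # Input and output share a name and differ only by namespace.
        "inputs": [{"namespace": "sds-ds-raw-data", "name": DATASET_NAME}],
        "outputs": [
            {"namespace": "sds-ds-raw-data-extracted-copy", "name": DATASET_NAME}
        ],
        "location": "https://github.com/my-jobs/blob/124f6089ad4c5fcbb1d7b33cbb5d3a9521c5d32c",
        "context": {"SQL": "SELECT * FROM mytable;"},
        "description": "My first job!",
    }
    url = (
        f"{BASE_URL}/namespaces/{quote(JOB_NAMESPACE, safe='')}"
        f"/jobs/{quote(JOB_NAME, safe='')}"
    )
    return Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method=method,
    )


request = build_job_request()
print(request.get_method(), request.full_url)
```

Passing `request` to `urllib.request.urlopen` would actually send it; that step is left out so the sketch can be inspected without a running server.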
I'd expect this job and datasets lineage to be displayed properly, but the actual UI result looks confusing:

Environment:
- macOS Big Sur 11.2.1
- Google Chrome 88.0.4324.182 (Official Build) (x86_64) or Safari 14.0.3 (16610.4.3.1.4)
- Docker Engine v20.10.2
- marquez-docker-compose.yaml
Thank you @AndriiSushko for the report. I'm adding it to the roadmap.
@AndriiSushko when we display the lineage graph (and also when we build the graph on the backend), we don't take into account the namespace a dataset is associated with, but I do think this makes sense to support (maybe behind a flag). So it's not a bug, but rather a feature request. I've updated the issue with our feature label. Also, I've outlined a possible solution in https://github.com/MarquezProject/marquez/issues/855.
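To illustrate the behavior described above, here is a small Python sketch of why the two datasets end up looking like one node when the namespace is ignored. This is not Marquez's actual graph code; the functions `node_id_by_name`, `node_id_by_namespace_and_name`, and `graph_nodes` are hypothetical and only model the idea of keying graph nodes by name alone versus by namespace plus name.

```python
# Hypothetical model of lineage-graph node identity; not Marquez's implementation.
job = {
    "inputs": [
        {"namespace": "sds-ds-raw-data",
         "name": "heroku_pg_work_api_prod.attachments"}
    ],
    "outputs": [
        {"namespace": "sds-ds-raw-data-extracted-copy",
         "name": "heroku_pg_work_api_prod.attachments"}
    ],
}


def node_id_by_name(dataset):
    """Identify a dataset node by its name only (namespace ignored)."""
    return dataset["name"]


def node_id_by_namespace_and_name(dataset):
    """Identify a dataset node by the (namespace, name) pair."""
    return (dataset["namespace"], dataset["name"])


def graph_nodes(job, node_id):
    """Collect the distinct dataset nodes of a job under a given identity rule."""
    return {node_id(dataset) for dataset in job["inputs"] + job["outputs"]}


nodes_by_name = graph_nodes(job, node_id_by_name)
nodes_by_pair = graph_nodes(job, node_id_by_namespace_and_name)

print(len(nodes_by_name))  # 1: input and output collapse into a single node
print(len(nodes_by_pair))  # 2: input and output stay distinct
```

With name-only identity the job appears to read from and write to the same dataset, which matches the confusing picture in the report; keying on the pair keeps the two datasets apart.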