Bug: 404 error on retrieving /api/v1/lineage

Open OleksandrDvornik opened this issue 3 years ago • 8 comments

When we manually add a dataset and then fetch data via the lineage endpoint, the UI shows "Something went wrong while fetching lineage". The lineage endpoint expects a job to already exist and tries to fetch its jobId; since there is no job yet (only a dataset), it returns a 404.

Requests to reproduce:

--Create a namespace
PUT http://localhost:5000/api/v1/namespaces/postgres%3A%2F%2Flocalhost%3A6432
Content-Type: application/json

{
  "ownerName": "Me"
}

###
--Create a source
PUT http://localhost:5000/api/v1/sources/postgres%3A%2F%2Flocalhost%3A6432
Content-Type: application/json

{
  "type": "DB_TABLE",
  "connectionUrl": "postgres://localhost:6432"
}

###

--Create a dataset
PUT http://localhost:5000/api/v1/namespaces/postgres%3A%2F%2Flocalhost%3A6432/datasets/dvdrental.public.actor_info
Content-Type: application/json

{
  "type": "DB_TABLE",
  "physicalName": "dvdrental.public.actor_info",
  "sourceName": "postgres://localhost:6432",
  "fields": [
    {
      "name": "value",
      "type": "string",
      "nullable": true,
      "metadata": {}
    }
  ]
}

###
--Retrieve info
GET http://localhost:3000/api/v1/lineage/?nodeId=dataset:postgres://localhost:6432:dvdrental.public.actor_info

404 response 

OleksandrDvornik avatar Sep 29 '21 12:09 OleksandrDvornik

@OleksandrDvornik Nothing to add here. I think you described the issue perfectly! Thank you for raising this.

fbraza avatar Sep 29 '21 19:09 fbraza

It appears that some special characters can cause problems with the way we parse the nodeId out of the query parameters.

We won't be able to transmit this http://localhost:3000/api/v1/lineage/?nodeId=dataset:postgres://localhost:6432:dvdrental.public.actor_info as is because we have no way of knowing what is a delimiter and what is part of the dataset name or namespace.

I will open up an issue to fix this in our web project that will encode the segments of the nodeId that contain names.
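To illustrate the ambiguity described above, here is a minimal Python sketch. It assumes the nodeId has the form `type:namespace:name` and is split on `:`; the helper names `build_node_id` and `parse_node_id` are hypothetical and are not part of Marquez or its web project. Whether the API decodes segments this way is also an assumption.

```python
from urllib.parse import quote, unquote


def build_node_id(kind, namespace, name):
    """Percent-encode the namespace and name so they cannot contain ':'."""
    return ":".join([kind, quote(namespace, safe=""), quote(name, safe="")])


def parse_node_id(node_id):
    """Split an encoded nodeId on ':' and decode each named segment."""
    kind, namespace, name = node_id.split(":")
    return kind, unquote(namespace), unquote(name)


# The raw nodeId from the report: the namespace itself contains ':' characters,
# so a naive split yields five parts instead of three.
raw = "dataset:postgres://localhost:6432:dvdrental.public.actor_info"
print(raw.split(":"))

# With the named segments encoded, the split is unambiguous again.
encoded = build_node_id(
    "dataset", "postgres://localhost:6432", "dvdrental.public.actor_info"
)
print(encoded)
print(parse_node_id(encoded))
```

The point of the sketch is only that the delimiter and the content must not share characters; any escaping scheme that guarantees this would work.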

phixMe avatar Sep 29 '21 19:09 phixMe

I think the core issue is here: https://github.com/MarquezProject/marquez/blob/main/api/src/main/java/marquez/service/LineageService.java#L40-L43 . The LineageService only queries jobs to determine lineage: if the node in the query param is a dataset, it finds the first job connected to that dataset and then determines lineage from that job. If the dataset has no jobs, it throws a NodeIdNotFoundException, which ends up returning a 404.
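The behavior described above can be modeled with a small Python sketch. This is a toy model of the described logic only, not the actual Java code in LineageService; the function `resolve_job_for_node`, the `jobs_by_dataset` mapping, and the Python exception class are all hypothetical.

```python
class NodeIdNotFound(Exception):
    """Stand-in for the NodeIdNotFoundException mentioned above."""


def resolve_job_for_node(node_id, jobs_by_dataset):
    """Return the job that lineage would be computed from.

    jobs_by_dataset maps "namespace:name" of a dataset to a list of
    connected job names (hypothetical in-memory stand-in for the database).
    """
    kind, _, rest = node_id.partition(":")
    if kind == "dataset":
        jobs = jobs_by_dataset.get(rest, [])
        if not jobs:
            # A dataset with no connected job cannot be resolved,
            # which the API surfaces as a 404.
            raise NodeIdNotFound(node_id)
        return jobs[0]
    # For a job node, the job is the starting point itself.
    return rest


dataset = "postgres://localhost:6432:dvdrental.public.actor_info"

try:
    resolve_job_for_node("dataset:" + dataset, {})
except NodeIdNotFound as exc:
    print("404 for", exc)

print(resolve_job_for_node("dataset:" + dataset, {dataset: ["some_job"]}))
```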

collado-mike avatar Sep 29 '21 22:09 collado-mike

@collado-mike Hi, many thanks for your feedback. Following your reasoning and @OleksandrDvornik's advice, I tied a job to the dataset using the Python client and the create_job() function. Unfortunately, I still get the same error, which suggests that adding a job is not enough here.

fbraza avatar Sep 30 '21 09:09 fbraza

I've looked deeper at the workaround; it's more complex than only adding a job. Workaround steps:

  1. Create a job that uses the dataset of interest
  2. Create a run for that job
  3. Mark the run from step 2 as complete

Then you should be able to get a response.

OleksandrDvornik avatar Sep 30 '21 13:09 OleksandrDvornik

Working example:

PUT http://localhost:5000/api/v1/namespaces/postgres%3A%2F%2Flocalhost%3A6432
Content-Type: application/json

{
  "ownerName": "Me"
}

###

PUT http://localhost:5000/api/v1/sources/postgres%3A%2F%2Flocalhost%3A6432
Content-Type: application/json

{
  "type": "DB_TABLE",
  "connectionUrl": "postgres://localhost:6432"
}

###


PUT http://localhost:5000/api/v1/namespaces/postgres%3A%2F%2Flocalhost%3A6432/datasets/dvdrental.public.actor_info
Content-Type: application/json

{
  "type": "DB_TABLE",
  "physicalName": "dvdrental.public.actor_info",
  "sourceName": "postgres://localhost:6432",
  "fields": [
    {
      "name": "value",
      "type": "string",
      "nullable": true,
      "metadata": {}
    }
  ]
}

###

PUT http://localhost:5000/api/v1/namespaces/postgres%3A%2F%2Flocalhost%3A6432/jobs/dvdrental.public.actor_info
Content-Type: application/json

{
  "type": "BATCH",
  "inputs": [{
    "namespace": "postgres://localhost:6432",
    "name": "dvdrental.public.actor_info"
  }],
  "outputs": []
}

###

POST http://localhost:5000/api/v1/namespaces/postgres%3A%2F%2Flocalhost%3A6432/jobs/dvdrental.public.actor_info/runs
Content-Type: application/json

{}

###
POST http://localhost:5000/api/v1/jobs/runs/07556634-05ef-4b98-96ea-fd1aea180dff/complete
Content-Type: application/json

###

GET http://localhost:5000/api/v1/lineage/?nodeId=dataset:postgres://localhost:6432:dvdrental.public.actor_info

###
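The run ID in the completion request above belongs to one specific run, so it will differ on every attempt. The job, run, and completion steps can be sketched in Python with the run ID taken from the run-creation response instead of being hardcoded. This assumes the namespace, source, and dataset already exist, that the API listens on localhost:5000 as in the requests above, and that the run-creation response carries the run ID in an `id` field (an assumption, check your Marquez version). The helper names are hypothetical.

```python
import json
import urllib.request
from urllib.parse import quote

BASE = "http://localhost:5000/api/v1"
NAMESPACE = "postgres://localhost:6432"
NAME = "dvdrental.public.actor_info"  # used as both dataset and job name above


def job_url(base=BASE, namespace=NAMESPACE, job=NAME):
    """Build the job URL with namespace and job name percent-encoded."""
    return f"{base}/namespaces/{quote(namespace, safe='')}/jobs/{quote(job, safe='')}"


def call(method, url, payload=None):
    """Send a JSON request and return the decoded JSON body, if any."""
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    req = urllib.request.Request(
        url, data=data, method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = resp.read()
        return json.loads(body) if body else None


def run_workaround():
    """Create the job, create a run, and mark that run as complete."""
    call("PUT", job_url(), {
        "type": "BATCH",
        "inputs": [{"namespace": NAMESPACE, "name": NAME}],
        "outputs": [],
    })
    run = call("POST", job_url() + "/runs", {})
    run_id = run["id"]  # assumption: the response exposes the run ID as "id"
    call("POST", f"{BASE}/jobs/runs/{run_id}/complete")
    return run_id
```

Calling `run_workaround()` against a running instance should leave the dataset with a completed run, after which the lineage request can be retried.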

OleksandrDvornik avatar Oct 04 '21 12:10 OleksandrDvornik

I also stumbled over this while trying to comply with the OpenLineage Spec naming conventions. This issue might also be related to #1761

error418 avatar Apr 05 '22 13:04 error418

Turns out the problem was located in one of the reverse proxies of the cluster. In this case it was a trailing slash in the NGINX proxy_pass directive, causing NGINX to decode the encoded path params before passing them to Marquez.
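A small Python sketch of why a decoded path breaks the request: once the percent-encoded namespace is decoded, its slashes become path separators and the path no longer has the segments the API route expects. The `segments` helper is hypothetical, and the snippet only models the string handling, not NGINX itself.

```python
from urllib.parse import unquote


def segments(path):
    """Split a URL path into its non-empty segments."""
    return [s for s in path.split("/") if s]


encoded_path = (
    "/api/v1/namespaces/postgres%3A%2F%2Flocalhost%3A6432"
    "/datasets/dvdrental.public.actor_info"
)

# As sent by the client: the namespace is a single path segment.
print(segments(encoded_path))

# After a proxy decodes the path: the namespace is split across segments.
decoded_path = unquote(encoded_path)
print(segments(decoded_path))
```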

error418 avatar Apr 06 '22 18:04 error418

Thanks for updating us on the resolution, @error418!

wslulciuc avatar Jan 17 '23 11:01 wslulciuc