
Dataset `currentVersion` not in Dataset Versions Listing

Open KevinMellott91 opened this issue 3 years ago • 3 comments

When retrieving the details of a Dataset, an attribute is included to specify the current version of that Dataset.

curl -s {baseUrl}/api/v1/namespaces/food_delivery/datasets/public.delivery_7_days | jq '.currentVersion'
"54e17e0b-b9da-43f2-8a7a-33d139625a18"

However, if you try to use this version identifier to retrieve a specific Dataset version, no records can be located.

curl -s {baseUrl}/api/v1/namespaces/food_delivery/datasets/public.delivery_7_days/versions/54e17e0b-b9da-43f2-8a7a-33d139625a18
{"code":404,"message":"Dataset version '54e17e0b-b9da-43f2-8a7a-33d139625a18' not found."}

The API call to display all versions for this Dataset does not include the currentVersion value obtained in the first step.

curl -s {baseUrl}/api/v1/namespaces/food_delivery/datasets/public.delivery_7_days/versions/ | jq '.versions | map(.version)'
[
  "337f23ac-e1b5-3bac-ade4-5bc5dd5312fb",
  "52afbdc4-d397-3208-8a22-e6cb0aaaaddc",
  "b1e5c54b-ed0b-37d9-b3fc-258d82d3fd1b",
  "246ba3cd-9920-3681-a488-9ae0c46fed4c",
  "48eb39b2-07b8-3ae0-a275-c56c4d460334",
  "0529933a-6a85-3090-b91f-db9f4fc19a07",
  "5623b1b7-02d3-3aed-81b0-ced013aa0d76",
  "bdeca942-a66e-3956-aea5-096cdcbe1705",
  "00ae6d5f-bdb9-3691-9849-15fdc9079622",
  "3e84ed80-ed8a-3a31-aa77-bd027fb72ac3",
  "8e46b91c-7cf6-3341-9842-830ed3c39918",
  "1181b17f-222c-3f97-9baf-1a6dce5f840a",
  "574ce8d4-6f26-34bd-a9ad-c0ea0abeea0a",
  "e9e55522-5501-3bc8-823d-72d839091850",
  "a30772dc-ed05-3f5e-baa7-66112b2caf96",
  "6a1c9760-3fba-3ea9-9379-0b5928af5302",
  "59cecc28-f97a-3123-8db1-79ee9f0f20f0",
  "2341448d-b186-34b0-abbd-7b78018096e8",
  "44372ad7-3216-3541-8422-9ac303ea89e6",
  "4b152261-1f35-3ae8-8ae0-e67c4f294abf",
  "de2fb8f9-7358-3a20-8656-f3e88ccca78f",
  "df242c88-7bf9-3b17-bfb3-f6fbc8aa6c59",
  "1947fe61-45e4-3866-90d0-1ca09d4b5339",
  "2f07307b-1b72-36d1-88ef-34764e9852e2",
  "7f4da3d1-1e63-3c35-b241-ca8dcce33a9c"
]
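One way to make the mismatch concrete is to compare the identifiers themselves. The following is only a sketch using values copied from the output above (the first three entries of the listing, to keep it short); the helper name `uuid_version` is made up for illustration. It checks whether `currentVersion` appears in the listing and reads the version nibble of each UUID with Python's standard `uuid` module.

```python
import uuid
from typing import Optional

# Values copied from the curl output in this issue.
current_version = "54e17e0b-b9da-43f2-8a7a-33d139625a18"
listed_versions = [
    "337f23ac-e1b5-3bac-ade4-5bc5dd5312fb",
    "52afbdc4-d397-3208-8a22-e6cb0aaaaddc",
    "b1e5c54b-ed0b-37d9-b3fc-258d82d3fd1b",
]  # only the first three entries of the listing


def uuid_version(value: str) -> Optional[int]:
    """Return the RFC 4122 version number encoded in a UUID string."""
    return uuid.UUID(value).version


# Is the value from the Dataset endpoint present in the versions listing?
is_listed = current_version in listed_versions

# Which UUID version does each identifier claim to be?
current_kind = uuid_version(current_version)
listed_kinds = {uuid_version(v) for v in listed_versions}

print(is_listed, current_kind, listed_kinds)  # False 4 {3}
```

In this sample, `currentVersion` parses as a version 4 (random) UUID, while the listed values parse as version 3 (name-based) UUIDs. That is an observation about these values only, but it is consistent with the two endpoints returning two different kinds of identifier rather than the listing simply missing a row.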

This can be reproduced by running a fresh Marquez install and attempting to look up any of the seed data. On a related note, the recently added Gitpod functionality made reproducing this issue extremely easy!

KevinMellott91, Feb 18 '22 22:02

I'm able to reproduce. I'd be happy to take this one.

RNHTTR, May 25 '22 02:05

Related: https://github.com/MarquezProject/marquez/issues/1977

collado-mike, May 25 '22 02:05

There are a couple of things going on that are causing some confusion:

  1. Currently, the API for getting a DatasetVersion reads version data through `DatasetVersionDao`, whose queries match on the field `version`. As discussed in the commentary on #1977, this field is supposed to be an internal identifier, so that any logic around the DatasetVersion UUID can change as needed without affecting the client. This is the source of the bug itself.
  2. Because of the confusion around `version`, the name `currentVersion` is misleading. The API for getting a Dataset correctly returns the UUID that is intended to be public-facing, but the name `currentVersion` remains confusing, and the bug from (1) makes this API look like the culprit at first.

Closing #1977 as described in my comment there and renaming the field in `Dataset.java` to something clearer (e.g. `currentVersionUuid`) would resolve this issue.
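To illustrate the mismatch described in (1) and (2), here is a toy in-memory model. It is not Marquez code: the class, the field names `public_uuid` and `internal_version`, and both lookup functions are hypothetical, and deriving the internal value with `uuid3` is an assumption made only so the two identifiers differ.

```python
import uuid
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class DatasetVersionRow:
    """Hypothetical row carrying both identifiers discussed above."""
    public_uuid: uuid.UUID       # what the Dataset endpoint exposes as currentVersion
    internal_version: uuid.UUID  # internal identifier (the `version` field)


def find_by_internal_version(
    rows: List[DatasetVersionRow], value: uuid.UUID
) -> Optional[DatasetVersionRow]:
    """Lookup that matches on the internal field, as the comment describes."""
    return next((r for r in rows if r.internal_version == value), None)


def find_by_public_uuid(
    rows: List[DatasetVersionRow], value: uuid.UUID
) -> Optional[DatasetVersionRow]:
    """Lookup that matches on the public-facing UUID instead."""
    return next((r for r in rows if r.public_uuid == value), None)


# One illustrative row; both values are generated here, not taken from Marquez.
rows = [
    DatasetVersionRow(
        public_uuid=uuid.uuid4(),
        internal_version=uuid.uuid3(uuid.NAMESPACE_URL, "public.delivery_7_days"),
    )
]

# A client takes currentVersion from the Dataset response...
current_version = rows[0].public_uuid

# ...and the lookup keyed on the internal field cannot find it (the 404 case),
miss = find_by_internal_version(rows, current_version)
# while a lookup keyed on the public UUID can.
hit = find_by_public_uuid(rows, current_version)

print(miss is None, hit is rows[0])  # True True
```

Under these assumptions, the 404 follows from the client and the lookup using different keys, which is why aligning the lookup with the public-facing UUID (or clearly naming which identifier each field holds) removes the confusion.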

If this makes sense, I'll try and close this by the end of the week.

RNHTTR, Jun 07 '22 01:06