marquez icon indicating copy to clipboard operation
marquez copied to clipboard

Merge query issues for tables `datasets`, `dataset_versions`, and `job_versions`

Open ddave09 opened this issue 2 years ago • 2 comments

Problem

Table: dataset_versions

https://github.com/MarquezProject/marquez/blob/0.42.0/api/src/main/java/marquez/db/DatasetVersionDao.java#L295

ERROR: duplicate key value violates unique constraint "dataset_versions_dataset_uuid_version_key"
DETAIL: Key (dataset_uuid, version)=(<uuid>, <uuid>) already exists.
STATEMENT: /* DatasetVersionDao.upsert */
INSERT INTO dataset_versions (
uuid, 
created_at, 
dataset_uuid, 
version, 
run_uuid, 
fields, 
namespace_name, 
dataset_name, 
lifecycle_state
) VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9) ON **CONFLICT(version)** DO UPDATE SET run_uuid = EXCLUDED.run_uuid RETURNING *

Broken since version 0.12.0 (It was introduced then and ON CONFLICT only checks for (version) where it should be checking for version and uuid) Based on the schema which exists since V1__intial_schema.sql

CREATE TABLE dataset_versions (
  uuid         UUID PRIMARY KEY,
  created_at   TIMESTAMP NOT NULL,
  dataset_uuid UUID REFERENCES datasets(uuid),
  version      UUID NOT NULL,
  run_uuid     UUID,
  **UNIQUE (dataset_uuid, version)**
);

Table: datasets

https://github.com/MarquezProject/marquez/blob/0.42.0/api/src/main/java/marquez/db/DatasetDao.java#L191

ERROR: duplicate key value violates unique constraint "datasets_namespace_uuid_name_key"
DETAIL: Key (namespace_uuid, name)=(<uuid>, <name>) already exists.
STATEMENT: /* DatasetDao.upsert */ INSERT INTO datasets (
uuid,
type,
created_at,
updated_at,
namespace_uuid,
namespace_name,
source_uuid,
source_name,
name,
physical_name,
description,
is_deleted,
is_hidden
) VALUES (
$1,
$2,
$3,
$4,
$5,
$6,
$7,
$8,
$9,
$10,
$11,
$12,
false
)
**ON CONFLICT (uuid)**
DO UPDATE SET
type = EXCLUDED.type,
updated_at = EXCLUDED.updated_at,
physical_name = EXCLUDED.physical_name,
description = EXCLUDED.description,
is_deleted = EXCLUDED.is_deleted,
is_hidden = EXCLUDED.is_hidden
RETURNING *

Broken since version 0.27.0 PR: https://github.com/MarquezProject/marquez/pull/2087 PR DIFF: https://github.com/MarquezProject/marquez/pull/2087/files#diff-c687ca258e023847591c9b8c044adddaae1e7c1f6f9483928bf2871050cf6a63R232

Current constraint marquez.public.datasets.datasets_namespace_uuid_name_key (namespace_uuid, name), but INSERT statement ON CONFLICT only checks for (uuid)

Table: job_versions

https://github.com/MarquezProject/marquez/blob/0.42.0/api/src/main/java/marquez/db/JobVersionDao.java#L169

ERROR: duplicate key value violates unique constraint "job_versions_job_uuid_version_key"
DETAIL: Key (job_uuid, version)=(<uuid>, <uuid>) already exists.
STATEMENT: /* JobVersionDao.upsertJobVersion */ INSERT INTO job_versions (
uuid,
created_at,
updated_at,
job_uuid,
location,
version,
job_name,
namespace_uuid,
namespace_name
) VALUES (
$1,
$2,
$3,
$4,
$5,
$6,
$7,
$8,
$9)
**ON CONFLICT(version)**
DO
UPDATE SET updated_at = EXCLUDED.updated_at
RETURNING *

Broken since version 0.13.0 (It was introduced then and ON CONFLICT only checks for (version) where it should be checking for version and uuid) Based on the schema which exists since V1__intial_schema.sql

CREATE TABLE job_versions (
  uuid            UUID PRIMARY KEY,
  created_at      TIMESTAMP NOT NULL,
  updated_at      TIMESTAMP NOT NULL,
  job_uuid        UUID REFERENCES jobs(uuid),
  version         UUID NOT NULL,
  location        VARCHAR(255) NOT NULL,
  latest_run_uuid UUID,
  **UNIQUE (job_uuid, version)**
);

Proposal

For the three tables discussed above, either code needs to be adjusted to account for missing columns when looking for conflicts or migration needs to be applied to modify the unique constraint to adjust to the code.

ddave09 avatar Oct 30 '23 03:10 ddave09

Thanks for opening your first issue in the Marquez project! Please be sure to follow the issue template!

boring-cyborg[bot] avatar Oct 30 '23 03:10 boring-cyborg[bot]

Please direct me to the template, I couldn't find one. I looked that REAME.md or Contributing.md. At the moment I have followed a template used by one of the issues.

ddave09 avatar Oct 30 '23 03:10 ddave09