Merge query issues for tables `datasets`, `dataset_versions`, and `job_versions`
Problem
Table: dataset_versions
https://github.com/MarquezProject/marquez/blob/0.42.0/api/src/main/java/marquez/db/DatasetVersionDao.java#L295
ERROR: duplicate key value violates unique constraint "dataset_versions_dataset_uuid_version_key"
DETAIL: Key (dataset_uuid, version)=(<uuid>, <uuid>) already exists.
STATEMENT: /* DatasetVersionDao.upsert */
INSERT INTO dataset_versions (
uuid,
created_at,
dataset_uuid,
version,
run_uuid,
fields,
namespace_name,
dataset_name,
lifecycle_state
) VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9) ON **CONFLICT(version)** DO UPDATE SET run_uuid = EXCLUDED.run_uuid RETURNING *
Broken since version 0.12.0 (It was introduced then and ON CONFLICT only checks for (version) where it should be checking for version and uuid) Based on the schema which exists since V1__intial_schema.sql
CREATE TABLE dataset_versions (
uuid UUID PRIMARY KEY,
created_at TIMESTAMP NOT NULL,
dataset_uuid UUID REFERENCES datasets(uuid),
version UUID NOT NULL,
run_uuid UUID,
**UNIQUE (dataset_uuid, version)**
);
Table: datasets
https://github.com/MarquezProject/marquez/blob/0.42.0/api/src/main/java/marquez/db/DatasetDao.java#L191
ERROR: duplicate key value violates unique constraint "datasets_namespace_uuid_name_key"
DETAIL: Key (namespace_uuid, name)=(<uuid>, <name>) already exists.
STATEMENT: /* DatasetDao.upsert */ INSERT INTO datasets (
uuid,
type,
created_at,
updated_at,
namespace_uuid,
namespace_name,
source_uuid,
source_name,
name,
physical_name,
description,
is_deleted,
is_hidden
) VALUES (
$1,
$2,
$3,
$4,
$5,
$6,
$7,
$8,
$9,
$10,
$11,
$12,
false
)
**ON CONFLICT (uuid)**
DO UPDATE SET
type = EXCLUDED.type,
updated_at = EXCLUDED.updated_at,
physical_name = EXCLUDED.physical_name,
description = EXCLUDED.description,
is_deleted = EXCLUDED.is_deleted,
is_hidden = EXCLUDED.is_hidden
RETURNING *
Broken since version 0.27.0 PR: https://github.com/MarquezProject/marquez/pull/2087 PR DIFF: https://github.com/MarquezProject/marquez/pull/2087/files#diff-c687ca258e023847591c9b8c044adddaae1e7c1f6f9483928bf2871050cf6a63R232
Current constraint marquez.public.datasets.datasets_namespace_uuid_name_key (namespace_uuid, name), but INSERT statement ON CONFLICT only checks for (uuid)
Table: job_versions
https://github.com/MarquezProject/marquez/blob/0.42.0/api/src/main/java/marquez/db/JobVersionDao.java#L169
ERROR: duplicate key value violates unique constraint "job_versions_job_uuid_version_key"
DETAIL: Key (job_uuid, version)=(<uuid>, <uuid>) already exists.
STATEMENT: /* JobVersionDao.upsertJobVersion */ INSERT INTO job_versions (
uuid,
created_at,
updated_at,
job_uuid,
location,
version,
job_name,
namespace_uuid,
namespace_name
) VALUES (
$1,
$2,
$3,
$4,
$5,
$6,
$7,
$8,
$9)
**ON CONFLICT(version)**
DO
UPDATE SET updated_at = EXCLUDED.updated_at
RETURNING *
Broken since version 0.13.0 (It was introduced then and ON CONFLICT only checks for (version) where it should be checking for version and uuid) Based on the schema which exists since V1__intial_schema.sql
CREATE TABLE job_versions (
uuid UUID PRIMARY KEY,
created_at TIMESTAMP NOT NULL,
updated_at TIMESTAMP NOT NULL,
job_uuid UUID REFERENCES jobs(uuid),
version UUID NOT NULL,
location VARCHAR(255) NOT NULL,
latest_run_uuid UUID,
**UNIQUE (job_uuid, version)**
);
Proposal
For the three tables discussed above, either code needs to be adjusted to account for missing columns when looking for conflicts or migration needs to be applied to modify the unique constraint to adjust to the code.
Thanks for opening your first issue in the Marquez project! Please be sure to follow the issue template!
Please direct me to the template, I couldn't find one. I looked that REAME.md or Contributing.md. At the moment I have followed a template used by one of the issues.