duckdb icon indicating copy to clipboard operation
duckdb copied to clipboard

Does duckdb_constraints() contain all relevant information?

Open krlmlr opened this issue 3 years ago • 16 comments

What happens?

In the example below, it seems that only the constraint_text column contains all information necessary to recreate a foreign key, in particular if it points to a UNIQUE index and not to a primary key. I'm missing:

  • the table being referenced
  • the constraint in the referenced tabled used as PK or unique constraint

I also wonder if you'd like to support information_schema.table_constraints, information_schema.key_column_usage and information_schema.constraint_column_usage . Especially the last view is different in all DBMS I've looked at, but I think there's a "correct" version very close to the Postgres implementation. (I believe the latter is broken in some corner cases, we can do better here.)

To Reproduce

CREATE TEMPORARY TABLE tf_1 (
  a integer, "b c" integer,
  PRIMARY KEY (a),
  UNIQUE ("b c")
);
CREATE TEMPORARY TABLE tf_2 (
  c integer, d integer, e integer,
  PRIMARY KEY (c),
  FOREIGN KEY (d) REFERENCES tf_1 (a),
  FOREIGN KEY (e) REFERENCES tf_1 ("b c")
);

SELECT * FROM duckdb_constraints();
┌─────────────┬────────────┬────────────┬───────────┬──────────────────┬─────────────────┬────────────────────────────────────────┬────────────┬───────────────────────────┬─────────────────────────┐
│ schema_name │ schema_oid │ table_name │ table_oid │ constraint_index │ constraint_type │            constraint_text             │ expression │ constraint_column_indexes │ constraint_column_names │
├─────────────┼────────────┼────────────┼───────────┼──────────────────┼─────────────────┼────────────────────────────────────────┼────────────┼───────────────────────────┼─────────────────────────┤
│ temp        │ 1207       │ tf_2       │ 1262      │ 0                │ PRIMARY KEY     │ PRIMARY KEY(c)                         │            │ [0]                       │ [c]                     │
│ temp        │ 1207       │ tf_2       │ 1262      │ 1                │ FOREIGN KEY     │ FOREIGN KEY (d) REFERENCES tf_1(a)     │            │ [1]                       │ [d]                     │
│ temp        │ 1207       │ tf_2       │ 1262      │ 2                │ FOREIGN KEY     │ FOREIGN KEY (e) REFERENCES tf_1("b c") │            │ [2]                       │ [e]                     │
│ temp        │ 1207       │ tf_2       │ 1262      │ 3                │ NOT NULL        │ NOT NULL                               │            │ [0]                       │ [c]                     │
│ temp        │ 1207       │ tf_1       │ 1267      │ 0                │ PRIMARY KEY     │ PRIMARY KEY(a)                         │            │ [0]                       │ [a]                     │
│ temp        │ 1207       │ tf_1       │ 1267      │ 1                │ UNIQUE          │ UNIQUE("b c")                          │            │ [1]                       │ [b c]                   │
│ temp        │ 1207       │ tf_1       │ 1267      │ 2                │ NOT NULL        │ NOT NULL                               │            │ [0]                       │ [a]                     │
│ temp        │ 1207       │ tf_1       │ 1267      │ 3                │ NOT NULL        │ NOT NULL                               │            │ [0]                       │ [a]                     │
│ temp        │ 1207       │ tf_1       │ 1267      │ 4                │ FOREIGN KEY     │                                        │            │ [2]                       │ []                      │
│ temp        │ 1207       │ tf_1       │ 1267      │ 5                │ NOT NULL        │ NOT NULL                               │            │ [0]                       │ [a]                     │
└─────────────┴────────────┴────────────┴───────────┴──────────────────┴─────────────────┴────────────────────────────────────────┴────────────┴───────────────────────────┴─────────────────────────┘

Steps to reproduce the behavior. Bonus points if those are only SQL queries.

Environment (please complete the following information):

  • OS: macOS
  • DuckDB Version: 4d39f040c115177d7d00b5dd26b3306193f90149
  • DuckDB Client: CLI

Before Submitting

  • [x] Have you tried this on the latest master branch?
  • Python: pip install duckdb --upgrade --pre
  • R: install.packages("https://github.com/duckdb/duckdb/releases/download/master-builds/duckdb_r_src.tar.gz", repos = NULL)
  • Other Platforms: You can find binaries here or compile from source.
  • [x] Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?

krlmlr avatar Jul 04 '22 04:07 krlmlr

CC @hannes.

krlmlr avatar Jul 11 '22 11:07 krlmlr

Expanding the duckdb_constraints to include more information seems fine to me. Could you submit a PR?

Mytherin avatar Jul 11 '22 13:07 Mytherin

Still in c0af3bf87c45f1f918adee75004376eaf2041fb9. Taking a look now.

krlmlr avatar Aug 04 '22 06:08 krlmlr

Proposing new column names: reference_table_name, reference_table_oid, reference_column_indexes, reference_column_names .

krlmlr avatar Aug 05 '22 12:08 krlmlr

Sounds good to me

Mytherin avatar Aug 05 '22 12:08 Mytherin

I only now noticed the second-to-last row in the example: constraint_index = 4 . This row doesn't seem quite right.

An alternative approach to adding more columns is to add more rows, I think this particular row be expanded to contain the necessary information. From my experiments with various INFORMATION_SCHEMA variants, the "longer" form here (with more rows) is preferable because it's easier to analyze. (Only MySQL goes "wide" here, it's a bit of a pain.)

I'll experiment with slightly different examples to see how this affects this one bogus row.

krlmlr avatar Aug 06 '22 07:08 krlmlr

Is it a requirement that constraint_index values are dense and start at 0 for each tables? Or could they be unique within the returned table?

krlmlr avatar Aug 06 '22 07:08 krlmlr

With krlmlr/duckdb@14d5652e4e3c398008946f9edcddef28d073f1f1 and the following modified input file, I'm seeing:

CREATE TABLE tf_1 (
  a integer, "b c" integer, "d e" integer,
  PRIMARY KEY (a),
  UNIQUE ("b c"),
  UNIQUE ("d e")
);
CREATE TABLE tf_3 (
  g integer, h integer,
  PRIMARY KEY (g),
  UNIQUE (h)
);
CREATE TABLE tf_2 (
  c integer, d integer, e integer, f integer, g integer,
  PRIMARY KEY (c),
  FOREIGN KEY (d) REFERENCES tf_1 (a),
  FOREIGN KEY (e) REFERENCES tf_1 ("b c"),
  FOREIGN KEY (f) REFERENCES tf_1 ("d e"),
  FOREIGN KEY (g) REFERENCES tf_3 (g),
);
CREATE TABLE tf_4 (
  h integer,
  FOREIGN KEY (h) REFERENCES tf_3 (h),
);

SELECT * FROM duckdb_constraints() WHERE constraint_type IN ('FOREIGN KEY', 'REFERENCED KEY');
┌─────────────┬────────────┬────────────┬───────────┬──────────────────┬─────────────────┬────────────────────────────────────────┬────────────┬───────────────────────────┬─────────────────────────┐
│ schema_name │ schema_oid │ table_name │ table_oid │ constraint_index │ constraint_type │            constraint_text             │ expression │ constraint_column_indexes │ constraint_column_names │
├─────────────┼────────────┼────────────┼───────────┼──────────────────┼─────────────────┼────────────────────────────────────────┼────────────┼───────────────────────────┼─────────────────────────┤
│ main        │ 1          │ tf_4       │ 1294      │ 0                │ FOREIGN KEY     │ FOREIGN KEY (h) REFERENCES tf_3(h)     │            │ [0]                       │ [h]                     │
│ main        │ 1          │ tf_2       │ 1284      │ 1                │ FOREIGN KEY     │ FOREIGN KEY (d) REFERENCES tf_1(a)     │            │ [1]                       │ [d]                     │
│ main        │ 1          │ tf_2       │ 1284      │ 2                │ FOREIGN KEY     │ FOREIGN KEY (e) REFERENCES tf_1("b c") │            │ [2]                       │ [e]                     │
│ main        │ 1          │ tf_2       │ 1284      │ 3                │ FOREIGN KEY     │ FOREIGN KEY (f) REFERENCES tf_1("d e") │            │ [3]                       │ [f]                     │
│ main        │ 1          │ tf_2       │ 1284      │ 4                │ FOREIGN KEY     │ FOREIGN KEY (g) REFERENCES tf_3(g)     │            │ [4]                       │ [g]                     │
│ main        │ 1          │ tf_3       │ 1297      │ 3                │ REFERENCED KEY  │                                        │            │ [0]                       │ [g]                     │
│ main        │ 1          │ tf_3       │ 1297      │ 5                │ REFERENCED KEY  │                                        │            │ [1]                       │ [h]                     │
│ main        │ 1          │ tf_1       │ 1291      │ 6                │ REFERENCED KEY  │                                        │            │ [2]                       │ [d e]                   │
└─────────────┴────────────┴────────────┴───────────┴──────────────────┴─────────────────┴────────────────────────────────────────┴────────────┴───────────────────────────┴─────────────────────────┘

The "REFERENCED KEY" constraint type is new. I think this now has almost all data, except:

  • each "FOREIGN KEY" row should have a corresponding "REFERENCED KEY" row, from what I can tell this is an error in the underlying catalog entries
  • we would need to link corresponding "FOREIGN KEY" and "REFERENCED KEY" rows, either through identical values in constraint_index (which would forfeit the current denseness and zero-indexing, see my most recent question), or through a new column

Should I file a new issue for the first bullet point? Has this come up already?

Putting this aside for now, awaiting input. CC @hannes.

krlmlr avatar Aug 06 '22 08:08 krlmlr

each "FOREIGN KEY" row should have a corresponding "REFERENCED KEY" row, from what I can tell this is an error in the underlying catalog entries

If this is an issue in the underlying table catalog entries the foreign key itself should not function, since those catalog entries are used directly to check for the presence of foreign key constraints. Could you verify that the missing entry works as intended?

Mytherin avatar Aug 06 '22 09:08 Mytherin

I filed #4309.

I can continue working on this issue here if we decide that we want "wide form" (without new columns or with just one new column). I still think that "long form" is the better pick, that would require waiting for #4309.

krlmlr avatar Aug 06 '22 10:08 krlmlr

Are we okay with the following modification that affects the constraint_index column, krlmlr/duckdb@009ac70820a12868c65bf953bf2d6990e50fa1e3?

Moving forward, a FOREIGN KEY row would have a corresponding REFERENCED KEY row with the same constraint_index, for all other constraint types that column would be unique. The alternative is a new column for only this purpose, leaving constraint_index unchanged. All this is for "long" form; for "wide" form, we would add four new columns and always leave constraint_index unchanged, and we would perhaps remove or edit the REFERENCED KEY entries.

Awaiting your input.

CREATE TABLE tf_1 (
  a integer, "b c" integer, "d e" integer,
  PRIMARY KEY (a),
  UNIQUE ("b c"),
  UNIQUE ("d e")
);
CREATE TABLE tf_3 (
  g integer, h integer,
  PRIMARY KEY (g),
  UNIQUE (h)
);
CREATE TABLE tf_2 (
  c integer, d integer, e integer, f integer, g integer,
  PRIMARY KEY (c),
  FOREIGN KEY (d) REFERENCES tf_1 (a),
  FOREIGN KEY (e) REFERENCES tf_1 ("b c"),
  FOREIGN KEY (f) REFERENCES tf_1 ("d e"),
  FOREIGN KEY (g) REFERENCES tf_3 (g),
);
CREATE TABLE tf_4 (
  h integer,
  FOREIGN KEY (h) REFERENCES tf_3 (h),
);

SELECT * FROM duckdb_constraints();
┌─────────────┬────────────┬────────────┬───────────┬──────────────────┬─────────────────┬────────────────────────────────────────┬────────────┬───────────────────────────┬─────────────────────────┐
│ schema_name │ schema_oid │ table_name │ table_oid │ constraint_index │ constraint_type │            constraint_text             │ expression │ constraint_column_indexes │ constraint_column_names │
├─────────────┼────────────┼────────────┼───────────┼──────────────────┼─────────────────┼────────────────────────────────────────┼────────────┼───────────────────────────┼─────────────────────────┤
│ main        │ 1          │ tf_4       │ 1294      │ 0                │ FOREIGN KEY     │ FOREIGN KEY (h) REFERENCES tf_3(h)     │            │ [0]                       │ [h]                     │
│ main        │ 1          │ tf_2       │ 1284      │ 1                │ PRIMARY KEY     │ PRIMARY KEY(c)                         │            │ [0]                       │ [c]                     │
│ main        │ 1          │ tf_2       │ 1284      │ 2                │ FOREIGN KEY     │ FOREIGN KEY (d) REFERENCES tf_1(a)     │            │ [1]                       │ [d]                     │
│ main        │ 1          │ tf_2       │ 1284      │ 3                │ FOREIGN KEY     │ FOREIGN KEY (e) REFERENCES tf_1("b c") │            │ [2]                       │ [e]                     │
│ main        │ 1          │ tf_2       │ 1284      │ 4                │ FOREIGN KEY     │ FOREIGN KEY (f) REFERENCES tf_1("d e") │            │ [3]                       │ [f]                     │
│ main        │ 1          │ tf_2       │ 1284      │ 5                │ FOREIGN KEY     │ FOREIGN KEY (g) REFERENCES tf_3(g)     │            │ [4]                       │ [g]                     │
│ main        │ 1          │ tf_2       │ 1284      │ 6                │ NOT NULL        │ NOT NULL                               │            │ [0]                       │ [c]                     │
│ main        │ 1          │ tf_3       │ 1297      │ 7                │ PRIMARY KEY     │ PRIMARY KEY(g)                         │            │ [0]                       │ [g]                     │
│ main        │ 1          │ tf_3       │ 1297      │ 8                │ UNIQUE          │ UNIQUE(h)                              │            │ [1]                       │ [h]                     │
│ main        │ 1          │ tf_3       │ 1297      │ 9                │ NOT NULL        │ NOT NULL                               │            │ [0]                       │ [g]                     │
│ main        │ 1          │ tf_3       │ 1297      │ 10               │ REFERENCED KEY  │                                        │            │ [0]                       │ [g]                     │
│ main        │ 1          │ tf_3       │ 1297      │ 11               │ NOT NULL        │ NOT NULL                               │            │ [0]                       │ [g]                     │
│ main        │ 1          │ tf_3       │ 1297      │ 12               │ REFERENCED KEY  │                                        │            │ [1]                       │ [h]                     │
│ main        │ 1          │ tf_3       │ 1297      │ 13               │ NOT NULL        │ NOT NULL                               │            │ [0]                       │ [g]                     │
│ main        │ 1          │ tf_1       │ 1291      │ 14               │ PRIMARY KEY     │ PRIMARY KEY(a)                         │            │ [0]                       │ [a]                     │
│ main        │ 1          │ tf_1       │ 1291      │ 15               │ UNIQUE          │ UNIQUE("b c")                          │            │ [1]                       │ [b c]                   │
│ main        │ 1          │ tf_1       │ 1291      │ 16               │ UNIQUE          │ UNIQUE("d e")                          │            │ [2]                       │ [d e]                   │
│ main        │ 1          │ tf_1       │ 1291      │ 17               │ NOT NULL        │ NOT NULL                               │            │ [0]                       │ [a]                     │
│ main        │ 1          │ tf_1       │ 1291      │ 18               │ REFERENCED KEY  │                                        │            │ [0]                       │ [a]                     │
│ main        │ 1          │ tf_1       │ 1291      │ 19               │ NOT NULL        │ NOT NULL                               │            │ [0]                       │ [a]                     │
│ main        │ 1          │ tf_1       │ 1291      │ 20               │ REFERENCED KEY  │                                        │            │ [1]                       │ [b c]                   │
│ main        │ 1          │ tf_1       │ 1291      │ 21               │ NOT NULL        │ NOT NULL                               │            │ [0]                       │ [a]                     │
│ main        │ 1          │ tf_1       │ 1291      │ 22               │ REFERENCED KEY  │                                        │            │ [2]                       │ [d e]                   │
│ main        │ 1          │ tf_1       │ 1291      │ 23               │ NOT NULL        │ NOT NULL                               │            │ [0]                       │ [a]                     │
└─────────────┴────────────┴────────────┴───────────┴──────────────────┴─────────────────┴────────────────────────────────────────┴────────────┴───────────────────────────┴─────────────────────────┘

krlmlr avatar Aug 09 '22 07:08 krlmlr

I'm fine with that

Mytherin avatar Aug 09 '22 07:08 Mytherin

Can you please confirm: we are not introducing any new column names, we are representing the constraints in long form, and all that remains to do is to link the two corresponding rows in the returned table?

Am 09.08.2022 um 09:55 schrieb Mark @.***>:

 I'm fine with that

— Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you were assigned.

krlmlr avatar Aug 09 '22 08:08 krlmlr

I don't mind either way - the wide form might be a bit easier to use perhaps? Adding extra columns to this table is not a problem.

Mytherin avatar Aug 09 '22 08:08 Mytherin

Fine, I'll think about it a bit more.

Are we ever going to support foreign key constraints across schemas?

create schema a;
create schema b;
create table a.x (c1 integer, primary key (c1));
create table b.y (c1 integer, foreign key (c1) references a.x (c1));
Error: near line 4: Binder Error: Creating foreign keys across different schemas is not supported

krlmlr avatar Aug 09 '22 09:08 krlmlr

Yes, likely we will support this in the future

Mytherin avatar Aug 09 '22 09:08 Mytherin

I'm making good progress, but struggling with default schemas.

Ideally, if the assertion introduced in krlmlr/duckdb@e7c0e134e5411acaea5fbca82ae46e01046d6bd3 was always fulfilled, we could rely on schema and table name to be always accurate. It's a bit of a rabbit hole, though.

Does BoundForeignKeyConstraint always have accurate schema information?

krlmlr avatar Aug 10 '22 14:08 krlmlr

Currently, BoundForeignKeyConstraint doesn't contain schema information. I wonder if we should move towards fulfilling the assertion in https://github.com/krlmlr/duckdb/commit/e7c0e134e5411acaea5fbca82ae46e01046d6bd3 .

krlmlr avatar Aug 10 '22 16:08 krlmlr