condenser icon indicating copy to clipboard operation
condenser copied to clipboard

subset_downstream generates incorrect SELECT queries if target table has multiple fk references

Open mjvotaw opened this issue 3 years ago • 3 comments

I noticed this just now while testing this tool against a client's database. They have several tables that each contain 2 columns that are referenced in foreign key relationships. Consider the following tables:

CREATE TABLE IF NOT EXISTS public."user"
(
    id integer NOT NULL,
    other_key uuid NOT NULL,
    CONSTRAINT user_pkey PRIMARY KEY (id),
    CONSTRAINT user_unique_key UNIQUE (other_key)
)

CREATE TABLE IF NOT EXISTS public.some_table
(
    id integer NOT NULL,
    user_id integer,
    CONSTRAINT some_entry_pkey PRIMARY KEY (id),
    CONSTRAINT "some_entry-user-pk" FOREIGN KEY (user_id)
        REFERENCES public."user" (id) MATCH SIMPLE
        ON UPDATE NO ACTION
        ON DELETE NO ACTION
)

CREATE TABLE IF NOT EXISTS public.another_table
(
    id integer NOT NULL,
    user_uuid uuid,
    CONSTRAINT another_table_pkey PRIMARY KEY (id),
    CONSTRAINT another_table_user_uuid_fkey FOREIGN KEY (user_uuid)
        REFERENCES public."user" (other_key) MATCH SIMPLE
        ON UPDATE NO ACTION
        ON DELETE NO ACTION
)

subset_downstream generates the following query:

SELECT "user_id" FROM "public"."some_table" WHERE ("user_id") NOT IN (SELECT "other_key" FROM "public"."user")

Which then throws the following error:

psycopg2.errors.UndefinedFunction: operator does not exist: integer = uuid
LINE 1: ..._id" FROM "public"."some_table" WHERE ("user_id") NOT IN (SE...
                                                             ^
HINT:  No operator matches the given name and argument types. You might need to add explicit type casts.

It looks like the issue stems from the fact that pk_columns is only set once, instead of being set for each object in referencing_tables. This might be a naive solution, but simply adding pk_columns = r['target_columns'] before the query is built results in the data being properly copied to the destination db.

mjvotaw avatar Oct 21 '21 21:10 mjvotaw

Thanks for sending this our way — it's actually a little bit more complicated than however. Notably, you could have a world in which different foreign key constraints have different ways of uniquely identifying a row in a target table. For example:

  • target table: id1, id2
  • fk_1 table: fk1 -> id1
  • fk_2 table: (fk1, fk2) -> id1, id2
  • fk_3 table fk2 -> id2

Unfortunately condenser is not built for these more complicated schemas, unlike Tonic's built-in subsetter which handles your case, the above case, and an additional variety of others. We'd happily look at a pull request however!

theaeolianmachine avatar Nov 04 '21 18:11 theaeolianmachine

Been assessing the tool for use with our stack and also ran into this issue. Just wondering if this issue also occurs with the premium database subsetter? Thanks!

keenanwells-tatari avatar Feb 15 '23 23:02 keenanwells-tatari

Yes! The premium database subsetter does indeed handle this case, as well as many other more complicated situations with compound keys, cycles, etc.

If you're interested in exploring our premium tool, you can test it out in a free trial at this link.

bricct avatar Mar 07 '23 18:03 bricct