pgloader icon indicating copy to clipboard operation
pgloader copied to clipboard

pgloader initiates command to create duplicate value on enum type

Open frbimo-adv opened this issue 3 years ago • 6 comments

I was trying to migrate database using command:

./build/bin/pgloader --debug pgsql://user:[email protected]:1921/mydb pgsql://user:[email protected]:1922/dbtest

but the process stopped with error:

...
2020-08-07T14:03:25.586000+08:00 DEBUG SET client_encoding TO 'utf8'
2020-08-07T14:03:26.068000+08:00 DEBUG SET application_name TO 'pgloader'
2020-08-07T14:03:26.871000+08:00 DEBUG BEGIN
2020-08-07T14:03:29.237000+08:00 SQL DROP TYPE IF EXISTS "public"."resource_type_enum" CASCADE;
2020-08-07T14:03:30.640000+08:00 SQL CREATE TYPE "public"."resource_type_enum" AS ENUM ('Unknown-Resource-Type', 'Unknown-Resource-Type', 'Dedicated-Single-Minimum', 'Dedicated-Single-Minimum', 'Dedicated-Single-Mini', 'Dedicated-Single-Mini', 'Dedicated-Single-Small', 'Dedicated-Single-Small', 'Dedicated-Single-Medium', 'Dedicated-Single-Medium', 'Dedicated-Single-Large', 'Dedicated-Single-Large', 'Dedicated-HA-Minimum', 'Dedicated-HA-Minimum', 'Dedicated-HA-Small', 'Dedicated-HA-Small', 'Dedicated-HA-Medium', 'Dedicated-HA-Medium', 'Dedicated-HA-Large', 'Dedicated-HA-Large');
2020-08-07T14:03:30.842000+08:00 ERROR Database error 23505: duplicate key value violates unique constraint "pg_enum_typid_label_index"
DETAIL: Key (enumtypid, enumlabel)=(17012, Unknown-Resource-Type) already exists.
QUERY: CREATE TYPE "public"."resource_type_enum" AS ENUM ('Unknown-Resource-Type', 'Unknown-Resource-Type', 'Dedicated-Single-Minimum', 'Dedicated-Single-Minimum', 'Dedicated-Single-Mini', 'Dedicated-Single-Mini', 'Dedicated-Single-Small', 'Dedicated-Single-Small', 'Dedicated-Single-Medium', 'Dedicated-Single-Medium', 'Dedicated-Single-Large', 'Dedicated-Single-Large', 'Dedicated-HA-Minimum', 'Dedicated-HA-Minimum', 'Dedicated-HA-Small', 'Dedicated-HA-Small', 'Dedicated-HA-Medium', 'Dedicated-HA-Medium', 'Dedicated-HA-Large', 'Dedicated-HA-Large');
2020-08-07T14:03:31.019000+08:00 FATAL Failed to create the schema, see above.
2020-08-07T14:03:31.020000+08:00 LOG report summary reset
       table name     errors       read   imported      bytes      total time       read      write
-----------------  ---------  ---------  ---------  ---------  --------------  ---------  ---------
  fetch meta data          0         19         19                     0.299s    
   Create Schemas          0          0          0                     0.476s    
 Create SQL Types          0          0          0                     0.000s    
-----------------  ---------  ---------  ---------  ---------  --------------  ---------  ---------
-----------------  ---------  ---------  ---------  ---------  --------------  ---------  ---------
2020-08-07T14:03:31.060000+08:00 INFO Stopping monitor

Seems pgloader initiates command to create enum type with duplicate values for each enum element.

I have checked the source database and confirmed that related enum type doesn't have duplicate values.

  • [X] pgloader --version
pgloader version 3.6.e388909
compiled with SBCL 1.4.5.debian
sb-impl::*default-external-format* :UTF-8
tmpdir: #P"/tmp/pgloader/"
  • [X] did you test a fresh compile from the source tree?

binary compiled from master branch

frbimo-adv avatar Aug 07 '20 06:08 frbimo-adv

Hi @frbimo-adv ; thanks for the bug report. Can you run the following SQL query on your source Postgres database and paste the result of it in this issue, please:

--
-- get user defined SQL types
--
  select nt.nspname,
         extname,
         typname,
         case when enum.enumtypid is not null
              then array_agg(enum.enumlabel order by enumsortorder)
          end as enumvalues

    from pg_class c
         join pg_namespace n on n.oid = c.relnamespace
         left join pg_attribute a on c.oid = a.attrelid and a.attnum > 0
         join pg_type t on t.oid = a.atttypid
         left join pg_namespace nt on nt.oid = t.typnamespace
         left join pg_depend d on d.classid = 'pg_type'::regclass
                              and d.refclassid = 'pg_extension'::regclass
                              and d.objid = t.oid
         left join pg_extension e on refobjid = e.oid
         left join pg_enum enum on enum.enumtypid = t.oid

   where nt.nspname !~ '^pg_' and nt.nspname <> 'information_schema'
         and n.nspname !~ '^pg_' and n.nspname <> 'information_schema'
         and c.relkind in ('r', 'f', 'p')
         and
           (   t.typrelid = 0
            or
               (select c.relkind = 'c'
                 from pg_class c
                where c.oid = t.typrelid)
           )
           and not exists
             (
                select 1
                  from pg_type el
                 where el.oid = t.typelem
                   and el.typarray = t.oid
              )

group by nt.nspname, extname, typname, enumtypid
order by nt.nspname, extname, typname, enumtypid;

dimitri avatar Aug 16 '20 18:08 dimitri

I have this same issue, running:

pgloader --version
pgloader version "3.6.2"
compiled with SBCL 2.0.11

I looked at the schema dump from pg_dump, and the enum is not duplicated, as far as I can tell:

--
-- Name: crop_classification; Type: TYPE; Schema: grain; Owner: type_editor
--

CREATE TYPE grain.crop_classification AS ENUM (
    '6RSF(H)',
    'DURUM',
    'SRS',
    'HRS',
    'HWS',
    'HRW',
    'TRITICALE',
    'SWW',
    'SWS',
    '2R2M',
    '2RSM',
    '2RSF',
    '2RSF(H)',
    '6RSF',
    '6RSM',
    '6RSN',
    'HWW'
);

However, pgloader gives a similar error as the OP:

2021-03-18T02:05:59.888436Z NOTICE Prepare PostgreSQL database.
2021-03-18T02:06:02.415649Z ERROR Database error 23505: duplicate key value violates unique constraint "pg_enum_typid_label_index"
DETAIL: Key (enumtypid, enumlabel)=(320944, 6RSF(H)) already exists.
QUERY: CREATE TYPE "grain"."crop_classification" AS ENUM ('6RSF(H)', '6RSF(H)', 'DURUM', 'DURUM', 'SRS', 'SRS', 'HRS', 'HRS', 'HWS', 'HWS', 'HRW', 'HRW', 'TRITICALE', 'TRITICALE', 'SWW', 'SWW', 'SWS', 'SWS', '2R2M', '2R2M', '2RSM', '2RSM', '2RSF', '2RSF', '2RSF(H)', '2RSF(H)', '6RSF', '6RSF', '6RSM', '6RSM', '6RSN', '6RSN', 'HWW', 'HWW');
2021-03-18T02:06:02.415800Z FATAL Failed to create the schema, see above.
2021-03-18T02:06:02.415848Z LOG report summary reset
       table name     errors       read   imported      bytes      total time       read      write
-----------------  ---------  ---------  ---------  ---------  --------------  ---------  ---------
      before load          0          1          1                     0.020s
  fetch meta data          0        176        176                     0.745s
   Create Schemas          0          0          0                     0.004s
 Create SQL Types          0          0          0                     0.000s
-----------------  ---------  ---------  ---------  ---------  --------------  ---------  ---------
-----------------  ---------  ---------  ---------  ---------  --------------  ---------  ---------

Here's the output of running the query suggested above:

 nspname | extname |       typname       |                                                                                      enumvalues
---------+---------+---------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 grain   |         | crop_classification | {6RSF(H),6RSF(H),DURUM,DURUM,SRS,SRS,HRS,HRS,HWS,HWS,HRW,HRW,TRITICALE,TRITICALE,SWW,SWW,SWS,SWS,2R2M,2R2M,2RSM,2RSM,2RSF,2RSF,2RSF(H),2RSF(H),6RSF,6RSF,6RSM,6RSM,6RSN,6RSN,HWW,HWW}
 grain   |         | crop_sub_type       | {COMMON,BARLEY,DURUM,TRITICALE,SPRINGWHEAT,WINTERWHEAT,OAT}
 grain   |         | prism_quality       | {stable,provisional,early,forecast}
 grain   |         | region              | {IR,IR,NonIR,NonIR}
 grain   |         | release_status      | {Released,Advanced,Available}
 grain   |         | trial_group         | {BARLEY,BARLEY,WHEAT,WHEAT,DURUM,DURUM,COMMON,COMMON,OAT,OAT}

I notice the duplication problem seems to be repeated for other enums, but not all. Thoughts?

grosa avatar Mar 18 '21 02:03 grosa

Hi, I solved this by removing order by enumsortorder from the array_agg function. Might be better to review the joins and find where it is duplicating, however I stopped researching since this solved the issue for what I needed.

Full code:

--
-- get user defined SQL types
--
  select nt.nspname,
         extname,
         typname,
         case when enum.enumtypid is not null
              then array_agg(enum.enumlabel)
          end as enumvalues

    from pg_class c
         join pg_namespace n on n.oid = c.relnamespace
         left join pg_attribute a on c.oid = a.attrelid and a.attnum > 0
         join pg_type t on t.oid = a.atttypid
         left join pg_namespace nt on nt.oid = t.typnamespace
         left join pg_depend d on d.classid = 'pg_type'::regclass
                              and d.refclassid = 'pg_extension'::regclass
                              and d.objid = t.oid
         left join pg_extension e on refobjid = e.oid
         left join pg_enum enum on enum.enumtypid = t.oid

   where nt.nspname !~ '^pg_' and nt.nspname <> 'information_schema'
         and n.nspname !~ '^pg_' and n.nspname <> 'information_schema'
         and c.relkind in ('r', 'f', 'p')
         and
           (   t.typrelid = 0
            or
               (select c.relkind = 'c'
                 from pg_class c
                where c.oid = t.typrelid)
           )
           and not exists
             (
                select 1
                  from pg_type el
                 where el.oid = t.typelem
                   and el.typarray = t.oid
              )

group by nt.nspname, extname, typname, enumtypid
order by nt.nspname, extname, typname, enumtypid;

marcelothomaz avatar May 19 '22 16:05 marcelothomaz

@dimitri Ran into this issue as well. Running the query you mentioned above I get:

Screenshot 2023-02-10 at 15 02 39

I have no idea what the purpose of the query is, but it appears the 3 values of the sentiment enum are each repeated 3 times.

If I inspect the datatype with \dT+ I only get 3 elements: Screenshot 2023-02-10 at 15 02 16

dbackeus avatar Feb 10 '23 14:02 dbackeus

@marcelothomaz 's change did not work for me to filter out the duplicate values, but using distinct did: https://github.com/ns-mkusper/pgloader/blob/master/src/pgsql/sql/list-all-sqltypes.sql#L8

Would we be interested in getting this change into the project or does this have some unwanted side effect that I'm missing?

ns-mkusper avatar May 10 '23 18:05 ns-mkusper

@marcelothomaz 's change did not work for me to filter out the duplicate values, but using distinct did: https://github.com/ns-mkusper/pgloader/blob/master/src/pgsql/sql/list-all-sqltypes.sql#L8

Would we be interested in getting this change into the project or does this have some unwanted side effect that I'm missing?

Hi @ns-mkusper, I can't remember why actually I didn't use distinct, but there was a reason, if my memory is not playing with me, I believe it was due to the order of enums or values attributed to each, I'm not sure to be honest. I don't have access to that database anymore to re-run my tryouts again and confirm, I'm sorry.

marcelothomaz avatar May 11 '23 20:05 marcelothomaz