pgloader
pgloader copied to clipboard
pgloader initiates command to create duplicate value on enum type
I was trying to migrate database using command:
./build/bin/pgloader --debug pgsql://user:[email protected]:1921/mydb pgsql://user:[email protected]:1922/dbtest
but the process stopped with error:
...
2020-08-07T14:03:25.586000+08:00 DEBUG SET client_encoding TO 'utf8'
2020-08-07T14:03:26.068000+08:00 DEBUG SET application_name TO 'pgloader'
2020-08-07T14:03:26.871000+08:00 DEBUG BEGIN
2020-08-07T14:03:29.237000+08:00 SQL DROP TYPE IF EXISTS "public"."resource_type_enum" CASCADE;
2020-08-07T14:03:30.640000+08:00 SQL CREATE TYPE "public"."resource_type_enum" AS ENUM ('Unknown-Resource-Type', 'Unknown-Resource-Type', 'Dedicated-Single-Minimum', 'Dedicated-Single-Minimum', 'Dedicated-Single-Mini', 'Dedicated-Single-Mini', 'Dedicated-Single-Small', 'Dedicated-Single-Small', 'Dedicated-Single-Medium', 'Dedicated-Single-Medium', 'Dedicated-Single-Large', 'Dedicated-Single-Large', 'Dedicated-HA-Minimum', 'Dedicated-HA-Minimum', 'Dedicated-HA-Small', 'Dedicated-HA-Small', 'Dedicated-HA-Medium', 'Dedicated-HA-Medium', 'Dedicated-HA-Large', 'Dedicated-HA-Large');
2020-08-07T14:03:30.842000+08:00 ERROR Database error 23505: duplicate key value violates unique constraint "pg_enum_typid_label_index"
DETAIL: Key (enumtypid, enumlabel)=(17012, Unknown-Resource-Type) already exists.
QUERY: CREATE TYPE "public"."resource_type_enum" AS ENUM ('Unknown-Resource-Type', 'Unknown-Resource-Type', 'Dedicated-Single-Minimum', 'Dedicated-Single-Minimum', 'Dedicated-Single-Mini', 'Dedicated-Single-Mini', 'Dedicated-Single-Small', 'Dedicated-Single-Small', 'Dedicated-Single-Medium', 'Dedicated-Single-Medium', 'Dedicated-Single-Large', 'Dedicated-Single-Large', 'Dedicated-HA-Minimum', 'Dedicated-HA-Minimum', 'Dedicated-HA-Small', 'Dedicated-HA-Small', 'Dedicated-HA-Medium', 'Dedicated-HA-Medium', 'Dedicated-HA-Large', 'Dedicated-HA-Large');
2020-08-07T14:03:31.019000+08:00 FATAL Failed to create the schema, see above.
2020-08-07T14:03:31.020000+08:00 LOG report summary reset
table name errors read imported bytes total time read write
----------------- --------- --------- --------- --------- -------------- --------- ---------
fetch meta data 0 19 19 0.299s
Create Schemas 0 0 0 0.476s
Create SQL Types 0 0 0 0.000s
----------------- --------- --------- --------- --------- -------------- --------- ---------
----------------- --------- --------- --------- --------- -------------- --------- ---------
2020-08-07T14:03:31.060000+08:00 INFO Stopping monitor
Seems pgloader initiates command to create enum type with duplicate values for each enum element.
I have checked the source database and confirmed that related enum type doesn't have duplicate values.
- [X] pgloader --version
pgloader version 3.6.e388909
compiled with SBCL 1.4.5.debian
sb-impl::*default-external-format* :UTF-8
tmpdir: #P"/tmp/pgloader/"
- [X] did you test a fresh compile from the source tree?
binary compiled from master branch
Hi @frbimo-adv ; thanks for the bug report. Can you run the following SQL query on your source Postgres database and paste the result of it in this issue, please:
--
-- get user defined SQL types
--
select nt.nspname,
extname,
typname,
case when enum.enumtypid is not null
then array_agg(enum.enumlabel order by enumsortorder)
end as enumvalues
from pg_class c
join pg_namespace n on n.oid = c.relnamespace
left join pg_attribute a on c.oid = a.attrelid and a.attnum > 0
join pg_type t on t.oid = a.atttypid
left join pg_namespace nt on nt.oid = t.typnamespace
left join pg_depend d on d.classid = 'pg_type'::regclass
and d.refclassid = 'pg_extension'::regclass
and d.objid = t.oid
left join pg_extension e on refobjid = e.oid
left join pg_enum enum on enum.enumtypid = t.oid
where nt.nspname !~ '^pg_' and nt.nspname <> 'information_schema'
and n.nspname !~ '^pg_' and n.nspname <> 'information_schema'
and c.relkind in ('r', 'f', 'p')
and
( t.typrelid = 0
or
(select c.relkind = 'c'
from pg_class c
where c.oid = t.typrelid)
)
and not exists
(
select 1
from pg_type el
where el.oid = t.typelem
and el.typarray = t.oid
)
group by nt.nspname, extname, typname, enumtypid
order by nt.nspname, extname, typname, enumtypid;
I have this same issue, running:
pgloader --version
pgloader version "3.6.2"
compiled with SBCL 2.0.11
I looked at the schema dump from pg_dump, and the enum is not duplicated, as far as I can tell:
--
-- Name: crop_classification; Type: TYPE; Schema: grain; Owner: type_editor
--
CREATE TYPE grain.crop_classification AS ENUM (
'6RSF(H)',
'DURUM',
'SRS',
'HRS',
'HWS',
'HRW',
'TRITICALE',
'SWW',
'SWS',
'2R2M',
'2RSM',
'2RSF',
'2RSF(H)',
'6RSF',
'6RSM',
'6RSN',
'HWW'
);
However, pgloader gives a similar error as the OP:
2021-03-18T02:05:59.888436Z NOTICE Prepare PostgreSQL database.
2021-03-18T02:06:02.415649Z ERROR Database error 23505: duplicate key value violates unique constraint "pg_enum_typid_label_index"
DETAIL: Key (enumtypid, enumlabel)=(320944, 6RSF(H)) already exists.
QUERY: CREATE TYPE "grain"."crop_classification" AS ENUM ('6RSF(H)', '6RSF(H)', 'DURUM', 'DURUM', 'SRS', 'SRS', 'HRS', 'HRS', 'HWS', 'HWS', 'HRW', 'HRW', 'TRITICALE', 'TRITICALE', 'SWW', 'SWW', 'SWS', 'SWS', '2R2M', '2R2M', '2RSM', '2RSM', '2RSF', '2RSF', '2RSF(H)', '2RSF(H)', '6RSF', '6RSF', '6RSM', '6RSM', '6RSN', '6RSN', 'HWW', 'HWW');
2021-03-18T02:06:02.415800Z FATAL Failed to create the schema, see above.
2021-03-18T02:06:02.415848Z LOG report summary reset
table name errors read imported bytes total time read write
----------------- --------- --------- --------- --------- -------------- --------- ---------
before load 0 1 1 0.020s
fetch meta data 0 176 176 0.745s
Create Schemas 0 0 0 0.004s
Create SQL Types 0 0 0 0.000s
----------------- --------- --------- --------- --------- -------------- --------- ---------
----------------- --------- --------- --------- --------- -------------- --------- ---------
Here's the output of running the query suggested above:
nspname | extname | typname | enumvalues
---------+---------+---------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
grain | | crop_classification | {6RSF(H),6RSF(H),DURUM,DURUM,SRS,SRS,HRS,HRS,HWS,HWS,HRW,HRW,TRITICALE,TRITICALE,SWW,SWW,SWS,SWS,2R2M,2R2M,2RSM,2RSM,2RSF,2RSF,2RSF(H),2RSF(H),6RSF,6RSF,6RSM,6RSM,6RSN,6RSN,HWW,HWW}
grain | | crop_sub_type | {COMMON,BARLEY,DURUM,TRITICALE,SPRINGWHEAT,WINTERWHEAT,OAT}
grain | | prism_quality | {stable,provisional,early,forecast}
grain | | region | {IR,IR,NonIR,NonIR}
grain | | release_status | {Released,Advanced,Available}
grain | | trial_group | {BARLEY,BARLEY,WHEAT,WHEAT,DURUM,DURUM,COMMON,COMMON,OAT,OAT}
I notice the duplication problem seems to be repeated for other enums, but not all. Thoughts?
Hi,
I solved this by removing order by enumsortorder
from the array_agg function.
Might be better to review the joins and find where it is duplicating, however I stopped researching since this solved the issue for what I needed.
Full code:
--
-- get user defined SQL types
--
select nt.nspname,
extname,
typname,
case when enum.enumtypid is not null
then array_agg(enum.enumlabel)
end as enumvalues
from pg_class c
join pg_namespace n on n.oid = c.relnamespace
left join pg_attribute a on c.oid = a.attrelid and a.attnum > 0
join pg_type t on t.oid = a.atttypid
left join pg_namespace nt on nt.oid = t.typnamespace
left join pg_depend d on d.classid = 'pg_type'::regclass
and d.refclassid = 'pg_extension'::regclass
and d.objid = t.oid
left join pg_extension e on refobjid = e.oid
left join pg_enum enum on enum.enumtypid = t.oid
where nt.nspname !~ '^pg_' and nt.nspname <> 'information_schema'
and n.nspname !~ '^pg_' and n.nspname <> 'information_schema'
and c.relkind in ('r', 'f', 'p')
and
( t.typrelid = 0
or
(select c.relkind = 'c'
from pg_class c
where c.oid = t.typrelid)
)
and not exists
(
select 1
from pg_type el
where el.oid = t.typelem
and el.typarray = t.oid
)
group by nt.nspname, extname, typname, enumtypid
order by nt.nspname, extname, typname, enumtypid;
@dimitri Ran into this issue as well. Running the query you mentioned above I get:
![Screenshot 2023-02-10 at 15 02 39](https://user-images.githubusercontent.com/3461/218110804-3a36ac7f-5964-4a18-90c8-1dc6c92ce3e4.png)
I have no idea what the purpose of the query is, but it appears the 3 values of the sentiment
enum are each repeated 3 times.
If I inspect the datatype with \dT+
I only get 3 elements:
@marcelothomaz 's change did not work for me to filter out the duplicate values, but using distinct did: https://github.com/ns-mkusper/pgloader/blob/master/src/pgsql/sql/list-all-sqltypes.sql#L8
Would we be interested in getting this change into the project or does this have some unwanted side effect that I'm missing?
@marcelothomaz 's change did not work for me to filter out the duplicate values, but using distinct did: https://github.com/ns-mkusper/pgloader/blob/master/src/pgsql/sql/list-all-sqltypes.sql#L8
Would we be interested in getting this change into the project or does this have some unwanted side effect that I'm missing?
Hi @ns-mkusper, I can't remember why actually I didn't use distinct, but there was a reason, if my memory is not playing with me, I believe it was due to the order of enums or values attributed to each, I'm not sure to be honest. I don't have access to that database anymore to re-run my tryouts again and confirm, I'm sorry.