pgcopydb v0.14: ERRORS with pgcopydb clone --follow --plugin test_decoding
Hi, great job!
But... I get ERRORS when testing pgcopydb clone --follow --plugin test_decoding. By the way: pgcopydb clone --follow --plugin wal2json WORKS.
Environment:
Source: pg11 (database pgbench, populated with the pgbench tool)
Target: pg15
$ pgcopydb --version
10:53:56 520220 INFO Running pgcopydb version 0.14 from "/usr/pgsql-15/bin/pgcopydb"
pgcopydb version 0.14
compiled with PostgreSQL 15.5 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 11.4.1 20230605 (Red Hat 11.4.1-2), 64-bit
compatible with Postgres 11, 12, 13, 14, 15, and 16
pgcopydb compiled from source https://github.com/dimitri/pgcopydb/archive/refs/tags/v0.14.tar.gz
Executed commands:
$ export PGCOPYDB_SOURCE_PGURI="postgres://repuser@drtest-pg1/pgbench"
$ export PGCOPYDB_TARGET_PGURI="postgres://repuser@pg4-test/pgbench"
$ export PGCOPYDB_DIR=/transfer/pg11/pgbench
$ export PGCOPYDB_LOG_DIR=/transfer/pg11/logs
$ pgcopydb clone \
--follow \
--dir "$PGCOPYDB_DIR" \
--fail-fast \
--slot-name pgcopydb_pgbench_test_encoding \
--plugin test_decoding \
2>&1 | tee $PGCOPYDB_LOG_DIR/pgcopydb_clone_follow_test_decoding_pgbench_pks_and_replica.log
Log file:
2024-01-08 09:57:24 516359 INFO main.c:138 Running pgcopydb version 0.14 from "/usr/pgsql-15/bin/pgcopydb"
2024-01-08 09:57:24 516359 INFO cli_common.c:1239 [SOURCE] Copying database from "postgres://repuser@drtest-pg1/pgbench?keepalives=1&keepalives_idle=10&keepalives_interval=10&keepalives_count=60"
2024-01-08 09:57:24 516359 INFO cli_common.c:1240 [TARGET] Copying database into "postgres://repuser@pg4-test/pgbench?keepalives=1&keepalives_idle=10&keepalives_interval=10&keepalives_count=60"
2024-01-08 09:57:24 516359 INFO pgsql.c:3683 Created logical replication slot "pgcopydb_pgbench_test_encoding" with plugin "test_decoding" at D/A09F2748 and exported snapshot 00000008-00008862-1
2024-01-08 09:57:24 516359 INFO ld_stream.c:2544 create schema if not exists pgcopydb
2024-01-08 09:57:24 516359 INFO ld_stream.c:2544 drop table if exists pgcopydb.sentinel
2024-01-08 09:57:24 516359 WARN pgsql.c:826 NOTICE: table "sentinel" does not exist, skipping
2024-01-08 09:57:24 516359 INFO ld_stream.c:2544 create table pgcopydb.sentinel(startpos pg_lsn, endpos pg_lsn, apply bool, write_lsn pg_lsn, flush_lsn pg_lsn, replay_lsn pg_lsn)
2024-01-08 09:57:24 516359 INFO ld_stream.c:2453 Created logical replication origin "pgcopydb" at LSN D/A09F2748
2024-01-08 09:57:24 516363 INFO cli_clone_follow.c:530 STEP 1: fetch source database tables, indexes, and sequences
2024-01-08 09:57:24 516367 INFO ld_apply.c:355 Waiting until the pgcopydb sentinel apply is enabled
2024-01-08 09:57:24 516363 INFO copydb_schema.c:143 Fetched information for 1 extensions
2024-01-08 09:57:24 516365 INFO ld_stream.c:649 Resuming streaming at LSN D/A09F2748 from replication slot "pgcopydb_pgbench_test_encoding"
2024-01-08 09:57:24 516363 INFO copydb_schema.c:440 Fetched information for 4 tables, with an estimated total of 1020 tuples and 130 MB
2024-01-08 09:57:24 516363 INFO copydb_schema.c:466 Fetched information for 3 indexes
2024-01-08 09:57:24 516363 INFO sequences.c:41 Fetching information for 0 sequences
2024-01-08 09:57:24 516363 INFO cli_clone_follow.c:547 STEP 2: dump the source database schema (pre/post data)
2024-01-08 09:57:24 516363 INFO pgcmd.c:473 /usr/pgsql-15/bin/pg_dump -Fc --snapshot 00000008-00008862-1 --section pre-data --file /transfer/pg11/pgbench/schema/pre.dump 'postgres://repuser@drtest-pg1/pgbench?keepalives=1&keepalives_idle=10&keepalives_interval=10&keepalives_count=60'
2024-01-08 09:57:24 516365 INFO pgsql.c:4400 Reported write_lsn D/A09F2748, flush_lsn D/A09F2748, replay_lsn 0/0
2024-01-08 09:57:25 516363 INFO pgcmd.c:473 /usr/pgsql-15/bin/pg_dump -Fc --snapshot 00000008-00008862-1 --section post-data --file /transfer/pg11/pgbench/schema/post.dump 'postgres://repuser@drtest-pg1/pgbench?keepalives=1&keepalives_idle=10&keepalives_interval=10&keepalives_count=60'
2024-01-08 09:57:25 516363 INFO cli_clone_follow.c:559 STEP 3: restore the pre-data section to the target database
2024-01-08 09:57:25 516363 INFO pgcmd.c:901 /usr/pgsql-15/bin/pg_restore --dbname 'postgres://repuser@pg4-test/pgbench?keepalives=1&keepalives_idle=10&keepalives_interval=10&keepalives_count=60' --single-transaction --use-list /transfer/pg11/pgbench/schema/pre-filtered.list /transfer/pg11/pgbench/schema/pre.dump
2024-01-08 09:57:25 516363 INFO indexes.c:30 STEP 6: starting 4 CREATE INDEX processes
2024-01-08 09:57:25 516363 INFO indexes.c:31 STEP 7: constraints are built by the CREATE INDEX processes
2024-01-08 09:57:25 516376 INFO table-data.c:420 STEP 4: starting 4 table data COPY processes
2024-01-08 09:57:25 516363 INFO vacuum.c:32 STEP 8: starting 4 VACUUM processes
2024-01-08 09:57:25 516363 INFO sequences.c:104 STEP 9: reset sequences values
2024-01-08 09:57:25 516389 INFO blobs.c:189 STEP 5: starting 4 Large Objects workers
2024-01-08 09:57:25 516390 INFO sequences.c:172 Reset sequences values on the target database
2024-01-08 09:57:25 516389 INFO blobs.c:501 Added 0 large objects to the queue
2024-01-08 09:57:27 516363 INFO cli_clone_follow.c:581 STEP 10: restore the post-data section to the target database
2024-01-08 09:57:27 516363 INFO pgcmd.c:901 /usr/pgsql-15/bin/pg_restore --dbname 'postgres://repuser@pg4-test/pgbench?keepalives=1&keepalives_idle=10&keepalives_interval=10&keepalives_count=60' --single-transaction --use-list /transfer/pg11/pgbench/schema/post-filtered.list /transfer/pg11/pgbench/schema/post.dump
2024-01-08 09:57:27 516363 INFO cli_clone_follow.c:599 Updating the pgcopydb.sentinel to enable applying changes
  OID | Schema |             Name | copy duration | indexes | create index duration
------+--------+------------------+---------------+---------+----------------------
39023 | public | pgbench_accounts |         1s493 |       1 |                 269ms
39017 | public |  pgbench_history |         173ms |       0 |                   0ms
39026 | public | pgbench_branches |          16ms |       1 |                  24ms
39020 | public |  pgbench_tellers |          37ms |       1 |                  16ms
Step                                                 Connection    Duration   Concurrency
--------------------------------------------------   ----------   ---------   -----------
Dump Schema                                              source         39ms             1
Catalog Queries (table ordering, filtering, etc)         source        314ms             1
Prepare Schema                                           target         43ms             1
COPY, INDEX, CONSTRAINTS, VACUUM (wall clock)              both        1s907         4 + 8
COPY (cumulative)                                          both        1s719             4
Large Objects (cumulative)                                 both        156ms             4
CREATE INDEX, CONSTRAINTS (cumulative)                   target        309ms             4
Finalize Schema                                          target        450ms             1
--------------------------------------------------   ----------   ---------   -----------
Total Wall Clock Duration                                  both        2s783         4 + 8
--------------------------------------------------   ----------   ---------   -----------
2024-01-08 09:57:27 516367 INFO ld_apply.c:365 The pgcopydb sentinel has enabled applying changes
2024-01-08 09:57:27 516367 INFO ld_apply.c:1277 Setting up previous LSN from replication origin "pgcopydb" progress at D/A09F2748
2024-01-08 09:57:27 516367 INFO ld_apply.c:254 Catchup-up with changes from LSN D/A09F2748
2024-01-08 09:57:27 516367 INFO ld_apply.c:93 File "/transfer/pg11/pgbench/cdc/000000030000000D000000A0.sql" does not exists yet, exit
2024-01-08 09:57:28 516365 ERROR pgsql.c:1511 Postgres query was interrupted: update pgcopydb.sentinel set startpos = $2, write_lsn = $1, flush_lsn = $2 returning startpos, endpos, apply, write_lsn, flush_lsn, replay_lsn
2024-01-08 09:57:28 516365 ERROR pgsql.c:5336 Failed to fetch pgcopydb.sentinel current values
2024-01-08 09:57:28 516365 INFO pgsql.c:4400 Reported write_lsn D/A65E0FC8, flush_lsn D/A65E0FC8, replay_lsn 0/0
2024-01-08 09:57:28 516365 INFO ld_stream.c:522 Streamed up to write_lsn D/A65E0FC8, flush_lsn D/A65E0FC8, stopping: endpos is 0/0
2024-01-08 09:57:28 516365 INFO follow.c:674 Prefetch process has terminated
2024-01-08 09:57:28 516364 INFO follow.c:616 Subprocesses for prefetch, transform, and catchup have now all exited
2024-01-08 09:57:28 516364 INFO follow.c:426 Catching-up from existing on-disk files
2024-01-08 09:57:28 516428 INFO ld_transform.c:983 Transformed 2 JSON messages into SQL file "/transfer/pg11/pgbench/cdc/000000030000000D000000A0.sql"
2024-01-08 09:57:28 516428 INFO ld_transform.c:983 Transformed 1 JSON messages into SQL file "/transfer/pg11/pgbench/cdc/000000030000000D000000A6.sql"
2024-01-08 09:57:28 516430 INFO ld_apply.c:1277 Setting up previous LSN from replication origin "pgcopydb" progress at D/A09F2748
2024-01-08 09:57:28 516430 INFO ld_apply.c:254 Catchup-up with changes from LSN D/A09F2748
2024-01-08 09:57:28 516430 INFO ld_apply.c:540 Replaying changes from file "/transfer/pg11/pgbench/cdc/000000030000000D000000A0.sql"
2024-01-08 09:57:28 516430 INFO ld_apply.c:131 Apply reached D/A0A192A8 in "/transfer/pg11/pgbench/cdc/000000030000000D000000A0.sql"
2024-01-08 09:57:28 516430 INFO ld_apply.c:540 Replaying changes from file "/transfer/pg11/pgbench/cdc/000000030000000D000000A6.sql"
2024-01-08 09:57:28 516430 INFO ld_apply.c:131 Apply reached D/A65E0FC8 in "/transfer/pg11/pgbench/cdc/000000030000000D000000A6.sql"
2024-01-08 09:57:28 516430 INFO ld_apply.c:149 Reached end of file "/transfer/pg11/pgbench/cdc/000000030000000D000000A6.sql" at D/A65E0FC8.
2024-01-08 09:57:29 516364 INFO follow.c:230 Current sentinel replay_lsn is D/A65E0FC8
2024-01-08 09:57:29 516364 INFO follow.c:369 Restarting logical decoding follower in replay mode
2024-01-08 09:57:29 516432 INFO ld_stream.c:2014 Resuming streaming from latest file "/transfer/pg11/pgbench/cdc/000000030000000D000000A6.json"
2024-01-08 09:57:29 516433 INFO ld_transform.c:983 Transformed 1 JSON messages into SQL file "/transfer/pg11/pgbench/cdc/000000030000000D000000A6.sql"
2024-01-08 09:57:29 516432 INFO ld_stream.c:694 Resuming streaming at LSN D/A65E0FC8 from first message with that LSN read in JSON file "/transfer/pg11/pgbench/cdc/000000030000000D000000A6.json", line 0
2024-01-08 09:57:29 516434 INFO ld_apply.c:1286 Setting up previous LSN from replication origin "pgcopydb" progress at D/A09F2748, overriding previous value D/A65E0FC8
2024-01-08 09:57:29 516434 INFO ld_apply.c:254 Replaying changes from LSN D/A09F2748
2024-01-08 09:57:29 516432 INFO pgsql.c:4400 Reported write_lsn D/A65E0FC8, flush_lsn D/A65E0FC8, replay_lsn D/A65E0FC8
2024-01-08 09:57:40 516432 INFO pgsql.c:4400 Reported write_lsn D/A65E17D0, flush_lsn D/A65E0FC8, replay_lsn D/A65E0FC8
2024-01-08 09:57:50 516432 INFO pgsql.c:4400 Reported write_lsn D/A65E1DA8, flush_lsn D/A65E13F8, replay_lsn D/A65E13F8
Compare data:
$ pgcopydb compare data --notice --dir "$PGCOPYDB_DIR"
10:01:30 516728 INFO Running pgcopydb version 0.14 from "/usr/pgsql-15/bin/pgcopydb"
10:01:30 516728 NOTICE Using work dir "/transfer/pg11/pgbench"
10:01:30 516728 NOTICE Work directory "/transfer/pg11/pgbench" already exists
10:01:30 516728 INFO A previous run has run through completion
10:01:30 516728 INFO SOURCE: Connecting to "postgres://repuser@drtest-pg1/pgbench?keepalives=1&keepalives_idle=10&keepalives_interval=10&keepalives_count=60"
10:01:30 516728 INFO Fetched information for 4 tables, with an estimated total of 1020 tuples and 130 MB
10:01:30 516728 NOTICE Skipping target catalog preparation
10:01:30 516728 INFO starting 4 table compare processes
10:01:30 516730 NOTICE Started table worker 516730 [516728]
10:01:30 516731 NOTICE Started table worker 516731 [516728]
10:01:30 516732 NOTICE Started table worker 516732 [516728]
10:01:30 516733 NOTICE Started table worker 516733 [516728]
10:01:30 516730 NOTICE public.pgbench_branches: 10 rows, checksum 7c520291-568e-c529-5e92-6aa7ff0de93
10:01:30 516733 NOTICE public.pgbench_history: 20000 rows, checksum 5229ca74-fa99-7982-0c41-45d50865d37
10:01:30 516731 NOTICE public.pgbench_tellers: 100 rows, checksum 05ce2173-65e2-6dd6-d7cc-c81d6e10bfb
10:01:30 516732 NOTICE public.pgbench_accounts: 1000000 rows, checksum ebf5d608-41d7-0821-f68e-85c41f201e8
10:01:30 516728 NOTICE Storing migration schema in JSON file "/transfer/pg11/pgbench/schema.json"
Table Name                      | ! |                       Source Checksum |                      Target Checksum
--------------------------------+---+---------------------------------------+--------------------------------------
public.pgbench_accounts         |   |   ebf5d608-41d7-0821-f68e-85c41f201e8 |  ebf5d608-41d7-0821-f68e-85c41f201e8
public.pgbench_history          |   |   5229ca74-fa99-7982-0c41-45d50865d37 |  5229ca74-fa99-7982-0c41-45d50865d37
public.pgbench_branches         |   |   7c520291-568e-c529-5e92-6aa7ff0de93 |  7c520291-568e-c529-5e92-6aa7ff0de93
public.pgbench_tellers          |   |   05ce2173-65e2-6dd6-d7cc-c81d6e10bfb |  05ce2173-65e2-6dd6-d7cc-c81d6e10bfb
Insert new data into pg11:
$ pgbench -c 1 -t 20000 --no-vacuum pgbench
Logs:
$ cat $PGCOPYDB_LOG_DIR/pgcopydb_clone_follow_test_decoding_pgbench_pks_and_replica.log
...
2024-01-08 10:07:52 516432 INFO pgsql.c:4400 Reported write_lsn D/A662A7E0, flush_lsn D/A66291C8, replay_lsn D/A6628BB8
2024-01-08 10:08:02 516432 INFO pgsql.c:4400 Reported write_lsn D/A662ADB8, flush_lsn D/A662A408, replay_lsn D/A662A408
2024-01-08 10:08:12 516433 ERROR ld_test_decoding.c:979 Failed to parse decoding message for UPDATE on table public.pgbench_accounts which is not in our catalogs
2024-01-08 10:08:12 516433 ERROR ld_test_decoding.c:501 Failed to parse UPDATE new-tuple columns for logical message table public.pgbench_accounts: UPDATE: aid[integer]:352867 bid[integer]:4 abalance[integer]:-3280 filler[character]:' '
2024-01-08 10:08:12 516433 ERROR ld_test_decoding.c:272 Failed to parse test_decoding UPDATE message: table public.pgbench_accounts: UPDATE: aid[integer]:352867 bid[integer]:4 abalance[integer]:-3280 filler[character]:' '
2024-01-08 10:08:12 516433 ERROR ld_transform.c:1236 Failed to parse test_decoding message, see above for details
2024-01-08 10:08:12 516433 ERROR ld_transform.c:472 Failed to parse JSON message: {"action":"U","xid":"0","lsn":"D/A662B408","timestamp":"2024-01-08 10:08:12.672143+0000","message":"table public.pgbench_accounts: UPDATE: aid[integer]:352867 bid[integer]:4 abalance[integer]:-3280 filler[character]:' '"}
2024-01-08 10:08:12 516433 ERROR ld_transform.c:100 Failed to transform JSON messages from input stream, see above for details
2024-01-08 10:08:12 516364 ERROR follow.c:964 Subprocess transform with pid 516433 has exited with error code 12
Compare data:
$ pgcopydb compare data --notice --dir "$PGCOPYDB_DIR"
10:12:04 517455 INFO Running pgcopydb version 0.14 from "/usr/pgsql-15/bin/pgcopydb"
10:12:04 517455 NOTICE Using work dir "/transfer/pg11/pgbench"
10:12:04 517455 NOTICE Work directory "/transfer/pg11/pgbench" already exists
10:12:04 517455 INFO A previous run has run through completion
10:12:04 517455 INFO SOURCE: Connecting to "postgres://repuser@drtest-pg1/pgbench?keepalives=1&keepalives_idle=10&keepalives_interval=10&keepalives_count=60"
10:12:04 517455 INFO Fetched information for 4 tables, with an estimated total of 1040 tuples and 132 MB
10:12:04 517455 NOTICE Skipping target catalog preparation
10:12:04 517455 INFO starting 4 table compare processes
10:12:04 517457 NOTICE Started table worker 517457 [517455]
10:12:04 517458 NOTICE Started table worker 517458 [517455]
10:12:04 517459 NOTICE Started table worker 517459 [517455]
10:12:04 517460 NOTICE Started table worker 517460 [517455]
10:12:04 517459 ERROR Table public.pgbench_tellers has checksum a07af71f-0ed1-fdee-17a6-fdfb1134ca7 on source, 05ce2173-65e2-6dd6-d7cc-c81d6e10bfb on target
10:12:04 517459 NOTICE public.pgbench_tellers: 100 rows, checksum a07af71f-0ed1-fdee-17a6-fdfb1134ca7
10:12:04 517457 ERROR Table public.pgbench_history has 40000 rows on source, 20000 rows on target
10:12:04 517457 NOTICE public.pgbench_history: 40000 rows, checksum a7b60678-cd1b-1000-e4f3-4509feb7f6b
10:12:04 517458 ERROR Table public.pgbench_branches has checksum 9ef9752b-5fef-195c-9546-9a18d8c3e72 on source, 7c520291-568e-c529-5e92-6aa7ff0de93 on target
10:12:04 517458 NOTICE public.pgbench_branches: 10 rows, checksum 9ef9752b-5fef-195c-9546-9a18d8c3e72
10:12:05 517460 ERROR Table public.pgbench_accounts has checksum 19d9562c-545e-1a99-c7df-d718361eee8 on source, ebf5d608-41d7-0821-f68e-85c41f201e8 on target
10:12:05 517460 NOTICE public.pgbench_accounts: 1000000 rows, checksum 19d9562c-545e-1a99-c7df-d718361eee8
10:12:05 517455 NOTICE Storing migration schema in JSON file "/transfer/pg11/pgbench/schema.json"
Table Name                      | ! |                       Source Checksum |                      Target Checksum
--------------------------------+---+---------------------------------------+--------------------------------------
public.pgbench_accounts         | ! |   19d9562c-545e-1a99-c7df-d718361eee8 |  ebf5d608-41d7-0821-f68e-85c41f201e8
public.pgbench_history          | ! |   a7b60678-cd1b-1000-e4f3-4509feb7f6b |  5229ca74-fa99-7982-0c41-45d50865d37
public.pgbench_branches         | ! |   9ef9752b-5fef-195c-9546-9a18d8c3e72 |  7c520291-568e-c529-5e92-6aa7ff0de93
public.pgbench_tellers          | ! |   a07af71f-0ed1-fdee-17a6-fdfb1134ca7 |  05ce2173-65e2-6dd6-d7cc-c81d6e10bfb
Hi @branecc; what happens is that the wal2json output plugin includes enough information (the pkey) with each message for the client (pgcopydb here) to rebuild the SQL command, whereas test_decoding does not, so the client program has to maintain that information itself.
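To make that concrete, here is a minimal shell sketch (not pgcopydb's actual parser; the message text is copied from the error log above) showing that a test_decoding UPDATE message carries column names, types, and values, but nothing that marks which column is the key:

```shell
# A test_decoding UPDATE message, as seen in the log above.
msg="table public.pgbench_accounts: UPDATE: aid[integer]:352867 bid[integer]:4 abalance[integer]:-3280 filler[character]:'    '"

# Extract the column names (each name is followed by a [type] marker);
# nothing in the message itself says that "aid" is the primary key.
cols=$(printf '%s\n' "$msg" | grep -oE '[a-z_]+\[' | tr -d '[')
echo "$cols"
```

wal2json, by contrast, ships the identity (pkey) columns alongside each change, which is what lets the client rebuild the WHERE clause of the replayed UPDATE without consulting catalogs of its own.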
Please try building from the current sources: thanks to a local on-disk cache of the source table catalogs (maintained with SQLite), I believe we should not see that problem anymore.
Advice please!
Which output plugin is recommended: test_decoding or wal2json?
I am trying to migrate databases from pg11 to pg15. I tried native logical replication and ran into errors like "ERROR: all replication slots are in use" with the hint "Free one or increase max_replication_slots". Increasing max_replication_slots requires a server restart, which I can't afford, especially since you cannot know in advance how many replication slots Postgres will create: a new replication slot is created for each table being synchronized, and with parallelism turned on we end up with one main slot and N temporary slots. It seems to me that Postgres is not releasing replication slots correctly. From what I could find on the internet, others have hit similar errors; the bug seems to be fixed in Postgres 14 (not tested by me), which is of no use to me.
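As a back-of-the-envelope sketch of why slots run out (the value 2 is the documented default for max_sync_workers_per_subscription; your settings may differ):

```shell
# With native logical replication, each subscription holds one main
# replication slot, plus one temporary slot per concurrent table-sync
# worker during the initial data copy.
main_slots=1
sync_workers=2   # max_sync_workers_per_subscription (Postgres default)
peak_per_subscription=$((main_slots + sync_workers))
echo "$peak_per_subscription"
```

Several subscriptions migrating in parallel multiply that number, which is how max_replication_slots can be exhausted without any obvious leak.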
I will try pgcopydb instead. My database migration solution with native logical replication involves a lot of manual work; pgcopydb is much more sophisticated and comes with good documentation.
So: test_decoding or wal2json?
At the moment, unfortunately, neither test_decoding nor wal2json has the wanted feature where "it just works". Consider building wal2json from sources to benefit from https://github.com/eulerto/wal2json/pull/255, and then use pgcopydb clone --follow --plugin wal2json --wal2json-numeric-as-string ... to avoid pitfalls with the representation of numbers in JSON. I believe that's the best way to do it at this time.
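For completeness, a sketch of that invocation in the shape of the command used earlier in this thread (the slot name here is hypothetical); the block only prints the command, so it can be reviewed before being pointed at real databases:

```shell
# Assemble the suggested clone command with a source-built wal2json.
cmd="pgcopydb clone --follow \
--dir ${PGCOPYDB_DIR:-/transfer/pg11/pgbench} \
--fail-fast \
--slot-name pgcopydb_pgbench_wal2json \
--plugin wal2json \
--wal2json-numeric-as-string"
echo "$cmd"
```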