wal2json icon indicating copy to clipboard operation
wal2json copied to clipboard

Getting an overwhelming amount of transactions with empty change sets despite using add-tables

Open bahaa opened this issue 5 years ago • 9 comments

We are using Replication Streaming and wal2json to replicate some tables from AWS RDS PostgreSQL 10.6 to other application. We selectively selecting some tables using add-tables parameter. Everything works great except that sometimes we receive huge amount of transactions with empty changeset that overwhelm the server and slow down syncing real changes we care about.

We don't know what causes these huge amount of empty changesets, but I can replicate getting a single empty changeset using either REFRESH MATERIALIZED VIEW or TRUNCATE TABLE on any table even it it's not specified on add-tables filter.

bahaa avatar Feb 14 '19 10:02 bahaa

I can confirm that we're also seeing empty change sets when using add-tables and filter-tables.

pluginOpts := "\"include-xids\" '1', \"include-timestamp\" '1', \"filter-tables\" '*.test'"
err = d.conn.StartReplication(d.config.SlotName, startLSN, -1, pluginOpts)

Inserting records into test triggers a WAL change message in my consumer with {"change": [], "timestamp": "...", "xid": 123}.

johanmickos avatar Feb 19 '19 17:02 johanmickos

Please provide a test case.

eulerto avatar Feb 19 '19 17:02 eulerto

I'll see if I can draft one up.

It seems like this is expected behavior, though. The callbacks for BEGIN and COMMIT write empty changesets regardless of the internal filtering happening in pg_decode_change.

johanmickos avatar Feb 19 '19 18:02 johanmickos

No, it doesn't.

euler=# SELECT 'init' FROM pg_create_logical_replication_slot('test_slot', 'wal2json');
 ?column? 
----------
 init
(1 registro)

euler=# begin;
BEGIN
euler=# commit;
COMMIT
euler=# SELECT data FROM pg_logical_slot_get_changes('test_slot', NULL, NULL, 'pretty-print', '1');
 data 
------
(0 registro)
euler=# SELECT 'stop' FROM pg_drop_replication_slot('test_slot');
 ?column? 
----------
 stop
(1 registro)

It seems you have some DDL such as:

euler=# SELECT 'init' FROM pg_create_logical_replication_slot('test_slot', 'wal2json');
 ?column? 
----------
 init
(1 registro)

euler=# begin;
BEGIN
euler=# create table foo (a int);
CREATE TABLE
euler=# commit;
COMMIT
euler=# SELECT data FROM pg_logical_slot_get_changes('test_slot', NULL, NULL);
     data      
---------------
 {"change":[]}
(1 registro)

euler=# SELECT 'stop' FROM pg_drop_replication_slot('test_slot');
 ?column? 
----------
 stop
(1 registro)

eulerto avatar Feb 19 '19 18:02 eulerto

I meant that the following is expected behavior:

CREATE DATABASE tests;
\connect tests;
CREATE TABLE users(
    id integer primary key,
    name varchar
);
CREATE TABLE something(
    id integer primary key,
    name varchar
);
SELECT 'init' FROM pg_create_logical_replication_slot('test_slot', 'wal2json');

INSERT INTO users(id, name) VALUES(1, 'John Doe');
INSERT INTO something(id, name) VALUES(99, 'something');

SELECT  data FROM pg_logical_slot_peek_changes('test_slot',  NULL, NULL, 'pretty-print', '1', 'add-tables', 'public.something');
                                   data
--------------------------------------------------------------------------
 {                                                                       +
         "change": [                                                     +
         ]                                                               +
 }
 {                                                                       +
         "change": [                                                     +
                 {                                                       +
                         "kind": "insert",                               +
                         "schema": "public",                             +
                         "table": "something",                           +
                         "columnnames": ["id", "name"],                  +
                         "columntypes": ["integer", "character varying"],+
                         "columnvalues": [99, "something"]               +
                 }                                                       +
         ]                                                               +
 }
(2 rows)

The first row of the peek changes command is an empty change set, which is the expected result.

pg_decode_begin_txn and pg_decode_commit_txn (i.e. the callbacks for beginning and committing transactions) are responsible for outputting {"change": []} regardless of the filters.

johanmickos avatar Feb 19 '19 18:02 johanmickos

So far, here are the cases that generate empty changesets:

  • TRUNCATE TABLE.
  • REFRESH MATERIALIZED VIEW.
  • DDL statements.

Yet, I don't know what causes huge amount of these empty changesets. The operations above generate only one or two events.

bahaa avatar Feb 20 '19 10:02 bahaa

Running VACUUM FULL generates a huge amount of empty change sets. This happens also with the test_decoding plugin, but the plugin has an option to exclude-empty-xacts that works just fine.

I think wal2json should have a similar option, or at least when using add-tables, the plugin should skip empty transactions since they are not related to the mentioned tables.

Here's a test case:

SELECT * FROM pg_create_logical_replication_slot('wal2json_slot', 'wal2json');

SELECT * FROM pg_logical_slot_get_changes('wal2json_slot', NULL, NULL, 'add-tables', '*.users');
lsn | xid | data
-----+-----+------
(0 rows)

VACUUM FULL;

SELECT * FROM pg_logical_slot_get_changes('wal2json_slot', NULL, NULL, 'add-tables', '*.users');
    lsn     |  xid  |     data
-------------+-------+---------------
 11/E81E3860 | 79420 | {"change":[]}
 11/E81F9AE0 | 79421 | {"change":[]}
 11/E82113D0 | 79422 | {"change":[]}
-- thousands of rows.

bahaa avatar Feb 26 '19 15:02 bahaa

Here is another case with table partitioning:

CREATE TABLE normal_table (id integer);
CREATE TABLE partitioned_table (id integer, type text, PRIMARY KEY (id,type)) PARTITION BY LIST(type);
CREATE TABLE partitioned_table_part PARTITION OF partitioned_table FOR VALUES IN ('test');

SELECT * FROM pg_create_logical_replication_slot('wal2json_slot', 'wal2json');
SELECT * FROM pg_logical_slot_get_changes('wal2json_slot', NULL, NULL, 'add-tables', '*.normal_table');
lsn | xid | data
-----+-----+------
(0 rows)
INSERT INTO partitioned_table VALUES (1,'test');
SELECT * FROM pg_logical_slot_get_changes('wal2json_slot', NULL, NULL, 'add-tables', '*.normal_table');
lsn | xid | data
-----+-----+------
0/503A7768	8396	{"change":[]}

joekohlsdorf avatar Aug 02 '19 16:08 joekohlsdorf

PostgreSQL 15 has a fix to this. If a transaction doesn't produce any records, then begin and commit messages will be skipped. See https://www.postgresql.fastware.com/blog/how-postgresql-15-improved-communication-in-logical-replication

holvianssi avatar Apr 25 '23 17:04 holvianssi