
Postgres-cdc: the size of WAL for a replication slot accumulated for a long time

Open StrikeW opened this issue 1 year ago • 9 comments

Describe the bug

A customer reported that the replication slot on the upstream PG gradually grows in size even though there has been no activity on the source tables for a long time.

// pg_replication_slots
-[ RECORD 1 ]-------+-----------------------
slot_name           | risingwave_slot
plugin              | pgoutput
slot_type           | logical
datoid              | 16399
database            | test_db
temporary           | f
active              | t
active_pid          | 25233
xmin                | 
catalog_xmin        | 789484419
restart_lsn         | 62A/ABCB4068
confirmed_flush_lsn | 62A/F2E66B38
wal_status          | extended
safe_wal_size       | 
two_phase           | f

-[ RECORD 1 ]------+-------------
pg_current_wal_lsn | 62F/2C000248

SELECT slot_name, pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) AS raw, pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(),restart_lsn)) AS replicationSlotLag, active FROM pg_replication_slots;
-[ RECORD 1 ]------+---------------
slot_name          | risingwave_slot
raw                | 20203225568
replicationslotlag | 19 GB
active             | t

// pg_stat_replication
-[ RECORD 1 ]----+----------------------------------
pid              | 25233
usesysid         | 16446
usename          | myuser
application_name | Debezium Streaming
client_addr      | 0.0.0.0
client_hostname  | 
client_port      | 42455
backend_start    | 2024-05-07 14:42:41.812152+00
backend_xmin     | 
state            | streaming
sent_lsn         | 62F/2C000248
write_lsn        | 62F/2C000248
flush_lsn        | 62A/F2E66B38
replay_lsn       | 62A/F2E66B38
write_lag        | 00:00:05.10841
flush_lag        | 00:05:09.198305
replay_lag       | 00:05:09.198305
sync_priority    | 0
sync_state       | async
reply_time       | 1999-12-21 01:31:29.294857+00

We can see that the difference between pg_current_wal_lsn and restart_lsn is about 19 GB. The strange thing is that the offset emitted in the Debezium heartbeat message doesn't increase at all. We rely on the heartbeat message to acknowledge the source offset to the upstream PG so that the WAL can be reclaimed (https://github.com/risingwavelabs/risingwave/pull/16058).

2024-05-08 13:43:19.051 {"timestamp":"2024-05-08T05:43:19.051443187Z","level":"INFO","fields":{"message":"identity cache miss occurred; added new identity (took Ok(4.478µs))"},"target":"aws_smithy_runtime::client::identity::cache::lazy","spans":[{"name":"lazy_load_identity"},{"name":"lazy_load_identity"}]}
2024-05-08 13:43:19.113 {"timestamp":"2024-05-08T05:43:19.113843166Z","level":"INFO","fields":{"message":"identity cache miss occurred; added new identity (took Ok(62.534918ms))"},"target":"aws_smithy_runtime::client::identity::cache::lazy","spans":[{"name":"lazy_load_identity"}]}
2024-05-08 13:43:19.168 {"timestamp":"2024-05-08T05:43:19.168124103Z","level":"INFO","fields":{"message":"Engine#11: committed offset DebeziumOffset{isHeartbeat=true, sourcePartition={server=RW_CDC_11}, sourceOffset={transaction_id=null, lsn_proc=6781533580088, messageType=COMMIT, lsn_commit=6781533580088, lsn=6781533580088, txId=790011769, ts_usec=1715100747672533}}","thread":"Thread-55573","class":"com.risingwave.connector.source.core.JniDbzSourceHandler"},"target":"risingwave_connector_node"}
2024-05-08 13:48:20.102 {"timestamp":"2024-05-08T05:48:20.102139457Z","level":"INFO","fields":{"message":"Engine#11: committed offset DebeziumOffset{isHeartbeat=true, sourcePartition={server=RW_CDC_11}, sourceOffset={transaction_id=null, lsn_proc=6781533580088, messageType=COMMIT, lsn_commit=6781533580088, lsn=6781533580088, txId=790011769, ts_usec=1715100747672533}}","thread":"Thread-55875","class":"com.risingwave.connector.source.core.JniDbzSourceHandler"},"target":"risingwave_connector_node"}

P.S. 6781533580088 corresponds to LSN 62A/F2E66B38, i.e. the confirmed_flush_lsn shown above.
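
For reference, the decimal offset in the heartbeat message can be converted to the usual hex LSN notation directly in Postgres (a quick sanity check, not part of the original report):

-- distance from LSN 0/0 gives the decimal form used in the Debezium offset
SELECT '62A/F2E66B38'::pg_lsn - '0/0'::pg_lsn AS decimal_lsn;  -- 6781533580088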

The Debezium documentation has some guidelines related to WAL disk space consumption. The following scenario is the most likely one:

The PostgreSQL instance contains multiple databases and one of them is a high-traffic database. Debezium captures changes in another database that is low-traffic in comparison to the other database.

But the customer has confirmed that there is only one database in their Postgres instance and all tables are located in that database, so there is no separate high-traffic database.

Fortunately, when there is activity on the source tables, the WAL can be reclaimed, since we then acknowledge a newer offset to the upstream PG.
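
To illustrate, any committed write to one of the captured tables is enough to move the acknowledged offset forward (a sketch only; orders, id and updated_at are placeholder names for whichever captured table is safe to touch):

-- a committed change on a captured table gives the connector a newer LSN to
-- acknowledge; restart_lsn then advances and the retained WAL is released
UPDATE orders SET updated_at = now() WHERE id = 1;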

Error message/log

No response

To Reproduce

No response

Expected behavior

No response

How did you deploy RisingWave?

No response

The version of RisingWave

No response

Additional context

No response

StrikeW avatar May 10 '24 11:05 StrikeW

The PostgreSQL instance contains multiple databases and one of them is a high-traffic database. Debezium captures changes in another database that is low-traffic in comparison to the other database.

Isn't it already solved by the heartbeat message?

yuhao-su avatar May 10 '24 16:05 yuhao-su

The PostgreSQL instance contains multiple databases and one of them is a high-traffic database. Debezium captures changes in another database that is low-traffic in comparison to the other database.

This sounds like the most likely cause. I am suspicious that the customer may have made a mistake somewhere.

fuyufjh avatar May 13 '24 03:05 fuyufjh

I am trying to reproduce the issue; it seems AWS RDS generates WAL behind the scenes:

For users on AWS RDS with PostgreSQL, a situation similar to the high traffic/low traffic scenario can occur in an idle environment. AWS RDS causes writes to its own system tables to be invisible to clients on a frequent basis (5 minutes). Again, regularly emitting events solves the problem.
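
This is easy to observe on an otherwise idle RDS instance by sampling the write position a few minutes apart (an illustrative check, not from the customer's environment):

-- run twice, a few minutes apart: the position keeps advancing even with no
-- user writes, and that is the WAL an idle replication slot ends up retaining
SELECT now() AS observed_at, pg_current_wal_lsn();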

StrikeW avatar May 13 '24 04:05 StrikeW

I have reproduced the issue and confirmed it is caused by AWS RDS generating writes to its hidden database. More details can be found in this blog.

Solution: users can configure heartbeat.action.query via the WITH options:

create source rds_pg with (
 connector = 'postgres-cdc',
 ...
 debezium.heartbeat.action.query = 'SELECT pg_logical_emit_message(false, ''heartbeat'', now()::varchar)'
);
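
To verify, the lag query from the report can be re-run; with the heartbeat query in place, confirmed_flush_lsn should keep advancing and the retained WAL should stay bounded even while the source tables are idle (assuming the slot is still named risingwave_slot as above):

SELECT slot_name,
       confirmed_flush_lsn,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained_wal
FROM pg_replication_slots
WHERE slot_name = 'risingwave_slot';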

StrikeW avatar May 13 '24 09:05 StrikeW

@StrikeW according to this post, it looks like recent versions (v42.7.0+) of the PgJDBC driver may have resolved this issue (pgjdbc/pgjdbc#2941). Is it possible the version RW depends on needs to be updated?

agrobbin avatar May 13 '24 13:05 agrobbin

@StrikeW according to this post, it looks like recent versions (v42.7.0+) of the PgJDBC driver may have resolved this issue (pgjdbc/pgjdbc#2941). Is it possible the version RW depends on needs to be updated?

Upgrading the JDBC driver may introduce other issues, as pointed out in https://github.com/pgjdbc/pgjdbc/issues/3138, so I suggest we leverage the heartbeat.action.query option provided by Debezium to periodically emit traffic to the captured database.

StrikeW avatar May 13 '24 13:05 StrikeW

Ohhh good to know! @StrikeW is that something we should do in our own source configuration, or do you plan to do it globally across all PG CDC source connectors?

agrobbin avatar May 13 '24 14:05 agrobbin

Ohhh good to know! @StrikeW is that something we should do in our own source configuration, or do you plan to do it globally across all PG CDC source connectors?

I think we can address the problem inside the PG CDC connector (via the heartbeat.action.query mechanism), so that users don't need to worry about their disk space. And it does no harm to periodically emit a tiny message to the WAL. cc @hzxa21
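
For context on why the periodic message is harmless: pg_logical_emit_message writes only a small logical-message record to the WAL (no table changes) and returns the LSN at which it was written, so emitting it periodically costs next to nothing:

-- non-transactional logical message: a tiny WAL record the connector can use
-- to advance its acknowledged offset while the source tables stay idle
SELECT pg_logical_emit_message(false, 'heartbeat', now()::varchar);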

StrikeW avatar May 13 '24 15:05 StrikeW

Fantastic! Will patiently await a fix upstream. ❤️

agrobbin avatar May 13 '24 15:05 agrobbin