Postgres-cdc: the WAL retained by a replication slot keeps accumulating over a long time
Describe the bug
A customer reported that the replication slot on the upstream PG gradually grows in size even though there has been no activity on the source tables for a long time.
// pg_replication_slots
-[ RECORD 1 ]-------+-----------------------
slot_name | risingwave_slot
plugin | pgoutput
slot_type | logical
datoid | 16399
database | test_db
temporary | f
active | t
active_pid | 25233
xmin |
catalog_xmin | 789484419
restart_lsn | 62A/ABCB4068
confirmed_flush_lsn | 62A/F2E66B38
wal_status | extended
safe_wal_size |
two_phase | f
-[ RECORD 1 ]------+-------------
pg_current_wal_lsn | 62F/2C000248
SELECT slot_name,
       pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) AS raw,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS replicationSlotLag,
       active
FROM pg_replication_slots;
-[ RECORD 1 ]------+---------------
slot_name | risingwave_slot
raw | 20203225568
replicationslotlag | 19 GB
active | t
// pg_stat_replication
-[ RECORD 1 ]----+----------------------------------
pid | 25233
usesysid | 16446
usename | myuser
application_name | Debezium Streaming
client_addr | 0.0.0.0
client_hostname |
client_port | 42455
backend_start | 2024-05-07 14:42:41.812152+00
backend_xmin |
state | streaming
sent_lsn | 62F/2C000248
write_lsn | 62F/2C000248
flush_lsn | 62A/F2E66B38
replay_lsn | 62A/F2E66B38
write_lag | 00:00:05.10841
flush_lag | 00:05:09.198305
replay_lag | 00:05:09.198305
sync_priority | 0
sync_state | async
reply_time | 1999-12-21 01:31:29.294857+00
We can see that the difference between pg_current_wal_lsn() and restart_lsn is about 19 GB.
The strange thing is that the offset emitted in the Debezium heartbeat message doesn't increase at all. We rely on the heartbeat message to acknowledge the source offset to the upstream PG so that the WAL can be reclaimed (https://github.com/risingwavelabs/risingwave/pull/16058).
2024-05-08 13:43:19.051 {"timestamp":"2024-05-08T05:43:19.051443187Z","level":"INFO","fields":{"message":"identity cache miss occurred; added new identity (took Ok(4.478µs))"},"target":"aws_smithy_runtime::client::identity::cache::lazy","spans":[{"name":"lazy_load_identity"},{"name":"lazy_load_identity"}]}
2024-05-08 13:43:19.113 {"timestamp":"2024-05-08T05:43:19.113843166Z","level":"INFO","fields":{"message":"identity cache miss occurred; added new identity (took Ok(62.534918ms))"},"target":"aws_smithy_runtime::client::identity::cache::lazy","spans":[{"name":"lazy_load_identity"}]}
2024-05-08 13:43:19.168 {"timestamp":"2024-05-08T05:43:19.168124103Z","level":"INFO","fields":{"message":"Engine#11: committed offset DebeziumOffset{isHeartbeat=true, sourcePartition={server=RW_CDC_11}, sourceOffset={transaction_id=null, lsn_proc=6781533580088, messageType=COMMIT, lsn_commit=6781533580088, lsn=6781533580088, txId=790011769, ts_usec=1715100747672533}}","thread":"Thread-55573","class":"com.risingwave.connector.source.core.JniDbzSourceHandler"},"target":"risingwave_connector_node"}
2024-05-08 13:48:20.102 {"timestamp":"2024-05-08T05:48:20.102139457Z","level":"INFO","fields":{"message":"Engine#11: committed offset DebeziumOffset{isHeartbeat=true, sourcePartition={server=RW_CDC_11}, sourceOffset={transaction_id=null, lsn_proc=6781533580088, messageType=COMMIT, lsn_commit=6781533580088, lsn=6781533580088, txId=790011769, ts_usec=1715100747672533}}","thread":"Thread-55875","class":"com.risingwave.connector.source.core.JniDbzSourceHandler"},"target":"risingwave_connector_node"}
P.S. 6781533580088 is the decimal form of LSN 62A/F2E66B38, i.e. the confirmed_flush_lsn shown above.
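As a sanity check, the decimal offset reported in the heartbeat log can be converted to its hex LSN (or vice versa) directly in Postgres; a minimal sketch, using the literal values from the records above:
-- Decimal byte offset of the hex LSN; '0/0' is the zero LSN
SELECT pg_wal_lsn_diff('62A/F2E66B38', '0/0') AS lsn_decimal;
-- => 6781533580088, i.e. exactly the lsn_commit/lsn_proc value in the heartbeat messages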
The Debezium docs list several scenarios related to WAL disk consumption. The following one looked the most likely:
The PostgreSQL instance contains multiple databases and one of them is a high-traffic database. Debezium captures changes in another database that is low-traffic in comparison to the other database.
But the customer has confirmed that there is only one database in their Postgres instance and all tables are located in that database, so a separate high-traffic database doesn't exist.
Fortunately, when there is activity on the source tables, the WAL can be reclaimed since we acknowledge a newer offset to PG.
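For illustration, any small write to a captured table will do; a hedged sketch, assuming a hypothetical captured table my_source_table with an integer primary key id:
-- No-op update on one row: it still produces WAL and a change event,
-- so the connector acknowledges a newer offset and the retained WAL can be released.
UPDATE my_source_table SET id = id WHERE id = (SELECT min(id) FROM my_source_table);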
Error message/log
No response
To Reproduce
No response
Expected behavior
No response
How did you deploy RisingWave?
No response
The version of RisingWave
No response
Additional context
No response
The PostgreSQL instance contains multiple databases and one of them is a high-traffic database. Debezium captures changes in another database that is low-traffic in comparison to the other database.
Isn't it already solved by the heartbeat message?
The PostgreSQL instance contains multiple databases and one of them is a high-traffic database. Debezium captures changes in another database that is low-traffic in comparison to the other database.
This sounds like the most likely cause. I suspect the customer may have missed something.
I am trying to reproduce the issue. It seems AWS RDS generates WAL behind the scenes:
For users on AWS RDS with PostgreSQL, a situation similar to the high traffic/low traffic scenario can occur in an idle environment. AWS RDS causes writes to its own system tables to be invisible to clients on a frequent basis (5 minutes). Again, regularly emitting events solves the problem.
I have reproduced the issue and confirmed that it is caused by AWS RDS generating writes into its hidden database. More details can be found in this blog.
Solution: users can configure heartbeat.action.query via the WITH options:
create source rds_pg with (
connector = 'postgres-cdc',
...
debezium.heartbeat.action.query = 'SELECT pg_logical_emit_message(false, ''heartbeat'', now()::varchar)'
);
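After applying the option, one way to verify it (a sketch reusing the slot name from the records above) is to watch the slot advance and the retained WAL stay bounded:
SELECT slot_name,
       restart_lsn,
       confirmed_flush_lsn,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained_wal
FROM pg_replication_slots
WHERE slot_name = 'risingwave_slot';
-- retained_wal should stop growing once heartbeats start acknowledging newer offsets.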
@StrikeW according to this post, it looks like recent versions (v42.7.0+) of the PgJDBC driver may have resolved this issue (pgjdbc/pgjdbc#2941). Is it possible the version RW depends on needs to be updated?
Upgrading the JDBC driver may introduce other issues, as pointed out in https://github.com/pgjdbc/pgjdbc/issues/3138. I suggest we leverage the heartbeat.action.query provided by Debezium to periodically emit traffic to the captured database.
Ohhh good to know! @StrikeW is that something we should do in our own source configuration, or do you plan to do it globally across all PG CDC source connectors?
I think we can address the problem inside the PG CDC connector (via the heartbeat.action.query mechanism), so that users don't need to worry about their disk space. There is no harm in periodically emitting a tiny message to the WAL. cc @hzxa21
Fantastic! Will patiently await a fix upstream. ❤️