pgcopydb
pgcopydb copied to clipboard
Implement filtering logic for logical messages
This pull request adds the implementation to filter out logical messages based on source filters for --follow
operations.
It introduces the ShouldFilterOutMessage(..)
function in ld_stream.c
to determine whether a message needs to be filtered out based on the provided source filters. When a logical message is received, the logical message parsers (for both test_decoding and wal2json) can set the message metadata.filterout
field by calling the introduced function to prevent further processing.
Fixes #510
@gokhangulbiz Thanks for your contribution. Could you please rebase against latest main?
@gokhangulbiz Is this PR on your radar still?
Hi, I'm interested in knowing if this MR is going to see through? We want to run an online migration with CDC to copy over only a subset of tables from a database. Right now, after doing some testing, it seems that the SQL files generated from the WAL consumption contain statements that belong to tables excluded by the filter I'm using. Since these excluded tables won't be recreated on the target instance, replaying these SQL files will result in errors.
@dcupif As an workaround, we use [exclude-table-data], which avoids copying initial data and avoids errors during CDC. Once the migration is over, the unwanted table can be dropped. This might be useful when the candidate table has a lots of exists data, but fewer DMLs during migration.
Of course, this won't be useful if the excluded table attracts lots of DML during CDC.
Of course, this won't be useful if the excluded table attracts lots of DML during CDC.
Yes, that's my concern. We have mutualized Postgres servers (e.g. dozens of databases hosted on the same server) with a lot of activity. Not filtering the DML statements means that pgcopydb can store a lot of changes, which could ultimately incur storage issues. I believe we could use an infinite storage mount like S3 or something, but this seems extremely inefficient.
Closing this PR as after discussions, if I remember correctly, the solution looks like adding support to the pgoutput logical decoding plugin where it is possible to handle the filtering as part as the CREATE PUBLICATION and then ALTER PUBLICATION ... ADD ... for each table.