pgcopydb icon indicating copy to clipboard operation
pgcopydb copied to clipboard

Implement filtering logic for logical messages

Open gokhangulbiz opened this issue 1 year ago • 2 comments

This pull request adds the implementation to filter out logical messages based on source filters for --follow operations.

It introduces the ShouldFilterOutMessage(..) function in ld_stream.c to determine whether a message needs to be filtered out based on the provided source filters. When a logical message is received, the logical message parsers (for both test_decoding and wal2json) can set the message metadata.filterout field by calling the introduced function to prevent further processing.

Fixes #510

gokhangulbiz avatar Jan 04 '24 12:01 gokhangulbiz

@gokhangulbiz Thanks for your contribution. Could you please rebase against latest main?

arajkumar avatar Jan 04 '24 13:01 arajkumar

@gokhangulbiz Is this PR on your radar still?

arajkumar avatar Mar 15 '24 04:03 arajkumar

Hi, I'm interested in knowing if this MR is going to see through? We want to run an online migration with CDC to copy over only a subset of tables from a database. Right now, after doing some testing, it seems that the SQL files generated from the WAL consumption contain statements that belong to tables excluded by the filter I'm using. Since these excluded tables won't be recreated on the target instance, replaying these SQL files will result in errors.

dcupif avatar Aug 13 '24 13:08 dcupif

@dcupif As an workaround, we use [exclude-table-data], which avoids copying initial data and avoids errors during CDC. Once the migration is over, the unwanted table can be dropped. This might be useful when the candidate table has a lots of exists data, but fewer DMLs during migration.

Of course, this won't be useful if the excluded table attracts lots of DML during CDC.

arajkumar avatar Aug 13 '24 13:08 arajkumar

Of course, this won't be useful if the excluded table attracts lots of DML during CDC.

Yes, that's my concern. We have mutualized Postgres servers (e.g. dozens of databases hosted on the same server) with a lot of activity. Not filtering the DML statements means that pgcopydb can store a lot of changes, which could ultimately incur storage issues. I believe we could use an infinite storage mount like S3 or something, but this seems extremely inefficient.

dcupif avatar Aug 13 '24 13:08 dcupif

Closing this PR as after discussions, if I remember correctly, the solution looks like adding support to the pgoutput logical decoding plugin where it is possible to handle the filtering as part as the CREATE PUBLICATION and then ALTER PUBLICATION ... ADD ... for each table.

dimitri avatar Sep 02 '24 09:09 dimitri