seatunnel
[feature][CDC] The basic implementation of the CDC source Reader in the incremental phase
Search before asking
- [X] I have searched the existing feature requests and found no similar feature requirement.
Description
This is a subtask of #3175 to track completion.
Incremental phase

Once all snapshot splits have reported their watermarks, the incremental phase starts.
All snapshot splits and their watermark information are combined to build the LogSplits.
We want to minimize the number of log connections:
- In the incremental phase, only one reader works by default. The user can configure a different number, which cannot exceed the number of readers.
- A reader holds at most one connection.
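The reader-count rule above can be sketched as follows. This is only an illustration under assumptions: `LogReaderPlanner`, `groupTables`, and the use of plain strings for table ids are hypothetical names that do not come from SeaTunnel, and round-robin grouping is just one possible way to spread tables over log readers.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of SeaTunnel: groups captured tables per log reader.
final class LogReaderPlanner {

    /**
     * Returns one table group per log reader. Each group maps to one LogSplit
     * and therefore to one log connection. The reader count defaults to 1 and
     * must not exceed the number of readers (parallelism).
     */
    static List<List<String>> groupTables(
            List<String> tableIds, Integer configuredReaders, int parallelism) {
        int requested = (configuredReaders == null) ? 1 : configuredReaders;
        if (requested < 1 || requested > parallelism) {
            throw new IllegalArgumentException(
                    "log reader count must be in [1, " + parallelism + "]");
        }
        // Assumption: never create more groups than tables, so no reader
        // opens a connection without having a table to capture.
        int groupCount = Math.max(1, Math.min(requested, tableIds.size()));
        List<List<String>> groups = new ArrayList<>();
        for (int i = 0; i < groupCount; i++) {
            groups.add(new ArrayList<>());
        }
        // Round-robin assignment of tables to groups.
        for (int i = 0; i < tableIds.size(); i++) {
            groups.get(i % groupCount).add(tableIds.get(i));
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> tables = Arrays.asList("db.a", "db.b", "db.c");
        System.out.println(groupTables(tables, null, 4)); // default: one log reader
        System.out.println(groupTables(tables, 2, 4));    // user asked for two
    }
}
```

Rejecting an out-of-range value with an exception is a design choice of this sketch; silently capping it at the parallelism would also satisfy the rule.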
// pseudo-code.
public class LogSplit implements SourceSplit {
    private final String splitId;

    /**
     * All the tables that this log split needs to capture.
     */
    private final List<TableId> tableIds;

    /**
     * Minimum watermark over the SnapshotSplits of all tables in this LogSplit.
     */
    private final Offset startingOffset;

    /**
     * Obtained from configuration; the log split may be unbounded (never end).
     */
    private final Offset endingOffset;

    /**
     * SnapshotSplit information for all tables in this LogSplit.
     * <br/> Used to support Exactly-Once.
     */
    private final List<CompletedSnapshotSplitInfo> completedSnapshotSplitInfos;

    /**
     * Maximum watermark among the SnapshotSplits of each table.
     * <br/> Used to decide when a table's entries can be deleted from
     * completedSnapshotSplitInfos, reducing state size.
     * <br/> Used to support Exactly-Once.
     */
    private final Map<TableId, Offset> tableWatermarks;
}
// pseudo-code.
public class CompletedSnapshotSplitInfo implements Serializable {
    private final String splitId;
    private final TableId tableId;
    private final SeaTunnelRowType splitKeyType;
    private final Object splitStart;
    private final Object splitEnd;
    private final Offset watermark;
}
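
As an illustration of how the two offset fields of a LogSplit could be derived from the completed snapshot splits, here is a minimal sketch. It assumes, purely for the example, that an offset can be modeled as a `long` and a table id as a `String`; `LogSplitOffsets` and its nested `SplitInfo` are hypothetical stand-ins for the pseudo-code classes above, not SeaTunnel types.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: derives LogSplit offsets from completed snapshot splits.
final class LogSplitOffsets {

    // Simplified stand-in for CompletedSnapshotSplitInfo (offset modeled as long).
    static final class SplitInfo {
        final String tableId;
        final long watermark;

        SplitInfo(String tableId, long watermark) {
            this.tableId = tableId;
            this.watermark = watermark;
        }
    }

    // startingOffset: the minimum watermark over all snapshot splits.
    static long startingOffset(List<SplitInfo> infos) {
        if (infos.isEmpty()) {
            throw new IllegalArgumentException("no completed snapshot splits");
        }
        long min = Long.MAX_VALUE;
        for (SplitInfo info : infos) {
            min = Math.min(min, info.watermark);
        }
        return min;
    }

    // tableWatermarks: the maximum watermark among the splits of each table.
    static Map<String, Long> tableWatermarks(List<SplitInfo> infos) {
        Map<String, Long> result = new HashMap<>();
        for (SplitInfo info : infos) {
            result.merge(info.tableId, info.watermark, Long::max);
        }
        return result;
    }

    public static void main(String[] args) {
        List<SplitInfo> infos = Arrays.asList(
                new SplitInfo("db.a", 100L),
                new SplitInfo("db.a", 140L),
                new SplitInfo("db.b", 120L));
        System.out.println(startingOffset(infos));
        System.out.println(tableWatermarks(infos));
    }
}
```

Starting from the minimum watermark means no change recorded after any snapshot split is skipped; the per-table maximum marks the point after which that table needs no further filtering.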

Exactly-Once:
- phase 1: Use completedSnapshotSplitInfos to filter out data that lies before the watermark.
- phase 2: Once a table no longer needs to be filtered, delete that table's entries from completedSnapshotSplitInfos, because all subsequent data for the table needs to be processed.
At-Least-Once: Data is not filtered, and completedSnapshotSplitInfos does not need to hold any data.
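
The two Exactly-Once phases could look roughly like the sketch below. It rests on several assumptions that the issue does not state: offsets are modeled as `long`, split keys as `int` with half-open ranges `[keyStart, keyEnd)`, a record whose offset equals the watermark counts as already captured by the snapshot, and a key outside every snapshot range is emitted. `ExactlyOnceLogFilter` and its members are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the Exactly-Once filter applied by a log reader.
final class ExactlyOnceLogFilter {

    // Simplified stand-in for CompletedSnapshotSplitInfo.
    static final class SplitInfo {
        final String tableId;
        final int keyStart;   // inclusive (assumption)
        final int keyEnd;     // exclusive (assumption)
        final long watermark;

        SplitInfo(String tableId, int keyStart, int keyEnd, long watermark) {
            this.tableId = tableId;
            this.keyStart = keyStart;
            this.keyEnd = keyEnd;
            this.watermark = watermark;
        }
    }

    private final List<SplitInfo> completedSnapshotSplitInfos;
    private final Map<String, Long> tableWatermarks;

    ExactlyOnceLogFilter(List<SplitInfo> infos, Map<String, Long> tableWatermarks) {
        this.completedSnapshotSplitInfos = new ArrayList<>(infos);
        this.tableWatermarks = new HashMap<>(tableWatermarks);
    }

    /** Returns true if the log record should be sent downstream. */
    boolean shouldEmit(String tableId, int key, long offset) {
        Long maxWatermark = tableWatermarks.get(tableId);
        if (maxWatermark == null) {
            return true; // phase 2: the table is no longer filtered
        }
        if (offset > maxWatermark) {
            // Enter phase 2: drop the table's state, everything after this is new.
            tableWatermarks.remove(tableId);
            completedSnapshotSplitInfos.removeIf(info -> info.tableId.equals(tableId));
            return true;
        }
        // Phase 1: skip changes the owning snapshot split has already captured.
        for (SplitInfo info : completedSnapshotSplitInfos) {
            if (info.tableId.equals(tableId) && key >= info.keyStart && key < info.keyEnd) {
                return offset > info.watermark;
            }
        }
        return true; // assumption: keys outside all snapshot ranges are emitted
    }

    int remainingInfoCount() {
        return completedSnapshotSplitInfos.size();
    }

    public static void main(String[] args) {
        Map<String, Long> marks = new HashMap<>();
        marks.put("db.a", 80L);
        ExactlyOnceLogFilter filter = new ExactlyOnceLogFilter(
                Arrays.asList(new SplitInfo("db.a", 0, 100, 50L),
                              new SplitInfo("db.a", 100, 200, 80L)),
                marks);
        System.out.println(filter.shouldEmit("db.a", 10, 40L));
        System.out.println(filter.shouldEmit("db.a", 150, 90L));
    }
}
```

Under At-Least-Once the reader would skip this filter entirely and emit every record, so neither collection needs to be populated.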
Are you willing to submit a PR?
- [ ] Yes I am willing to submit a PR!
Code of Conduct
- [X] I agree to follow this project's Code of Conduct