airbyte
airbyte copied to clipboard
:sparkles: low-code: Add Incremental Parent State Handling to SubstreamPartitionRouter
What
This PR resolves the issue with incremental substream reads in the low-code. It ensures that the state is passed to the parent stream when generating partitions to avoid reading the parent stream in full refresh mode. Resolves: https://github.com/airbytehq/airbyte-internal-issues/issues/7369
Connector's changes:
How
The code changes introduce a new field incremental_dependency
in the ParentStreamConfig
class. The SubstreamPartitionRouter
was updated to read parent state incrementally and save the parent state when the stream slice is processed. Additionally, new default methods were added to StreamSlicer for cases when a stream slicer does not have parent streams.
Review guide
User Impact
- The end result is an optimized incremental substream reading process, which improves sync performance by reducing the need to read the parent stream in full refresh mode.
- There are no negative side effects anticipated.
Can this PR be safely reverted and rolled back?
- [ ] YES 💚
- [x] NO ❌
The latest updates on your projects. Learn more about Vercel for Git ↗︎
1 Ignored Deployment
Name | Status | Preview | Comments | Updated (UTC) |
---|---|---|---|---|
airbyte-docs | ⬜️ Ignored (Inspect) | Visit Preview | Jun 13, 2024 3:39pm |
Please split CDK and connectors changes into different PRs
@artem1205 I will create separate PRs for connectors; I just added these changes here for now for easier testing. This way we can execute regression tests from this branch.
@brianjlai, I have linked the connectors' PRs in the description. I added these changes to this PR so it can be easily tested using the regression tool from this branch. I will delete these changes before merging. However, regarding the connectors PR, it would be better to use a global parent state for them as it will help to avoid breaking changes and concerns about the new state size.
@tolik0, let me know when you separate out connector changes and rebase — I'm happy to get this merged this week if @artem1205 is still away, I have the gawd-tier merge button privileges.
@natikgadzhi @brianjlai I've attached an example of the regression test results. The target version shows more records due to the addition of lookback windows, yet it reproduces all records from the control version.
Given that state compatibility affects the regression tests for the connectors in this PR, the best way to validate these changes is through manual verification. Specifically, ensure the parent stream is using the correct cursor, as demonstrated in the new test I've added.
Regarding Artem's request, he only asked for the test that I have already added.