airbyte icon indicating copy to clipboard operation
airbyte copied to clipboard

:sparkles: low-code: Add Incremental Parent State Handling to SubstreamPartitionRouter

Open tolik0 opened this issue 9 months ago • 3 comments

What

This PR resolves the issue with incremental substream reads in the low-code. It ensures that the state is passed to the parent stream when generating partitions to avoid reading the parent stream in full refresh mode. Resolves: https://github.com/airbytehq/airbyte-internal-issues/issues/7369

Connector's changes:

How

The code changes introduce a new field incremental_dependency in the ParentStreamConfig class. The SubstreamPartitionRouter was updated to read parent state incrementally and save the parent state when the stream slice is processed. Additionally, new default methods were added to StreamSlicer for cases when a stream slicer does not have parent streams.

Review guide

User Impact

  • The end result is an optimized incremental substream reading process, which improves sync performance by reducing the need to read the parent stream in full refresh mode.
  • There are no negative side effects anticipated.

Can this PR be safely reverted and rolled back?

  • [ ] YES 💚
  • [x] NO ❌

tolik0 avatar May 15 '24 12:05 tolik0

The latest updates on your projects. Learn more about Vercel for Git ↗︎

1 Ignored Deployment
Name Status Preview Comments Updated (UTC)
airbyte-docs ⬜️ Ignored (Inspect) Visit Preview Jun 13, 2024 3:39pm

vercel[bot] avatar May 15 '24 12:05 vercel[bot]

Please split CDK and connectors changes into different PRs

artem1205 avatar May 23 '24 13:05 artem1205

@artem1205 I will create separate PRs for connectors; I just added these changes here for now for easier testing. This way we can execute regression tests from this branch.

tolik0 avatar May 23 '24 13:05 tolik0

@brianjlai, I have linked the connectors' PRs in the description. I added these changes to this PR so it can be easily tested using the regression tool from this branch. I will delete these changes before merging. However, regarding the connectors PR, it would be better to use a global parent state for them as it will help to avoid breaking changes and concerns about the new state size.

tolik0 avatar Jun 11 '24 11:06 tolik0

@tolik0, let me know when you separate out connector changes and rebase — I'm happy to get this merged this week if @artem1205 is still away, I have the gawd-tier merge button privileges.

natikgadzhi avatar Jun 12 '24 22:06 natikgadzhi

@natikgadzhi @brianjlai I've attached an example of the regression test results. The target version shows more records due to the addition of lookback windows, yet it reproduces all records from the control version.

Given that state compatibility affects the regression tests for the connectors in this PR, the best way to validate these changes is through manual verification. Specifically, ensure the parent stream is using the correct cursor, as demonstrated in the new test I've added.

Regarding Artem's request, he only asked for the test that I have already added.

Screenshot from 2024-06-13 13-20-18 Screenshot from 2024-06-13 13-20-35

tolik0 avatar Jun 13 '24 10:06 tolik0