[airbyte-cdk] 🐛 Allow usage of GlobalStateCursor for RFR substreams that support incremental_dependency

Open brianjlai opened this issue 4 months ago • 5 comments

https://github.com/airbytehq/oncall/issues/6464

What

The affected connection was failing due to a heartbeat timeout. After investigating, my hypothesis is that three factors combined to stop the sync from making progress: a very large number of parent records, very long gaps between parent records that have children, and the growing size of the state slowing down the sync.

How

We can unblock certain types of RFR substreams that have a high volume of parent records if the parent is incremental. Because an incremental parent's records are updated whenever a child changes, the substream can use a global state cursor instead of per-parent partition success tracking. That significantly reduces the size of the state message, which would otherwise keep growing as more parents are processed, and allows the sync to progress before the heartbeat times out.
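To make the size difference concrete, here is a minimal sketch comparing the two state shapes. The field names (`states`, `partition`, `cursor`, `parent_id`, `updated_at`) and the helper functions are hypothetical and chosen for illustration only; they are not taken from the CDK code in this PR.

```python
import json


def per_partition_state(parent_ids, cursor_value):
    """Hypothetical per-parent tracking: one state entry per parent partition."""
    return {
        "states": [
            {"partition": {"parent_id": pid}, "cursor": {"updated_at": cursor_value}}
            for pid in parent_ids
        ]
    }


def global_state(cursor_value):
    """Hypothetical global cursor: a single cursor value shared by all parents."""
    return {"state": {"updated_at": cursor_value}}


def serialized_size(state):
    """Length of the JSON-serialized state, as a rough proxy for message size."""
    return len(json.dumps(state))


if __name__ == "__main__":
    cursor = "2024-10-02T00:00:00Z"
    for parent_count in (10, 1_000, 100_000):
        per_partition = serialized_size(per_partition_state(range(parent_count), cursor))
        single = serialized_size(global_state(cursor))
        print(f"{parent_count} parents: per-partition={per_partition} chars, global={single} chars")
```

In this sketch the per-partition state grows roughly linearly with the number of parents, while the global state stays the same size no matter how many parents are read.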

For this custom API endpoint, we can't control the large gaps between parents, since they depend on the API.

Review guide

  1. model_to_component_factory.py
  2. substream_partition_router.py

User Impact

Should be none. incremental_dependency had no usage in our repos when I last checked.

Can this PR be safely reverted and rolled back?

  • [x] YES 💚
  • [ ] NO ❌

brianjlai • Oct 02 '24 08:10