Substream cursor

Open maxi297 opened this issue 2 years ago • 9 comments

What area does the feature impact?

Connectors

Relevant Information

As requested in Slack: For example, the first time I call the API /v1/deals I get 3 ids, which I pass to the API /v1/deals/{id}/flow. The second time I call the API /v1/deals I get 2 new ids, which I then pass to the API /v1/deals/{id}/flow. How to do this?

As of 2023-07-13, there is no way to do this because it requires a whole new way of managing the state (it's not incremental as in "the ids are incremental" but incremental as in "we have never fetched the information for those ids").

Proposed solution

Have a new component (name to be reworked) to allow the cursor to manage a substream like this:

incremental_sync:
  type: SubstreamAlreadyFetchedCursor
  parent_stream: "#/definitions/parent_stream"
  parent_key: id
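
A minimal sketch of the state model this proposal implies, in plain Python rather than against the Airbyte CDK. The class `AlreadyFetchedState`, its methods, and the `sync` helper are hypothetical names used only for illustration; the idea is that the state records which parent ids have already been processed, so each run only requests /v1/deals/{id}/flow for ids it has not seen before.

```python
from typing import Iterable, List, Set


class AlreadyFetchedState:
    """Hypothetical state holder: remembers parent ids whose child data was fetched."""

    def __init__(self, fetched_ids: Iterable[str] = ()) -> None:
        self._fetched: Set[str] = set(fetched_ids)

    def new_ids(self, parent_ids: Iterable[str]) -> List[str]:
        """Return ids not fetched in earlier syncs, keeping order and dropping repeats."""
        seen = set(self._fetched)
        result: List[str] = []
        for parent_id in parent_ids:
            if parent_id not in seen:
                seen.add(parent_id)
                result.append(parent_id)
        return result

    def mark_fetched(self, parent_id: str) -> None:
        self._fetched.add(parent_id)

    def to_dict(self) -> dict:
        """Serializable form that could be persisted between syncs."""
        return {"fetched_ids": sorted(self._fetched)}


def sync(state: AlreadyFetchedState, parent_ids: Iterable[str]) -> List[str]:
    """Return the child paths that would be requested for this run."""
    paths: List[str] = []
    for parent_id in state.new_ids(parent_ids):
        paths.append(f"/v1/deals/{parent_id}/flow")
        # Only mark the id once its child request has been issued.
        state.mark_fetched(parent_id)
    return paths


state = AlreadyFetchedState()
first_run = sync(state, ["1", "2", "3"])
second_run = sync(state, ["1", "2", "3", "4", "5"])
```

Under these assumptions the second run only produces paths for the two new ids. One design consequence is that the stored state grows with the number of parent ids, unlike a single cursor value.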

maxi297 avatar Jul 13 '23 13:07 maxi297

@maxi297 any updates here? This seems to be a blocker for me using Airbyte for Chorus's API. I can incrementally pull conversations, but they don't include the transcript, so I need to use a stream partition to query another endpoint for each conversation. However, I haven't found a way to do that without a full refresh on the substream.

tjhiggins avatar Oct 20 '23 16:10 tjhiggins

@tjhiggins I see that some people have shown interest in this issue. Let me bring this back to the team and see if it's enough to prioritize it.

In the meantime, if this is blocking for you and you are up for the challenge, you could implement your own version of HttpStream that would:

  • have a parent stream field which would be a conversations stream
  • re-implement state getter and setter to forward this to the conversations stream
  • re-implement stream_slices to fetch the conversations stream records
  • re-implement path in order to consider the information of the slices
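
The four steps above can be sketched as follows. This is a rough illustration in plain Python, not code against the Airbyte CDK: `ConversationsStream` and `TranscriptsStream` are hypothetical stand-ins, the `updated_at` cursor field and the `conversations/{id}/transcript` path are assumptions, and the method signatures are simplified compared with what a real HttpStream subclass would need.

```python
from typing import Any, Dict, Iterator, List, Mapping


class ConversationsStream:
    """Stand-in parent stream; a real one would call the API incrementally."""

    cursor_field = "updated_at"  # assumed cursor field

    def __init__(self, records: List[Dict[str, Any]]) -> None:
        self._records = records
        self._state: Dict[str, Any] = {}

    @property
    def state(self) -> Dict[str, Any]:
        return self._state

    @state.setter
    def state(self, value: Mapping[str, Any]) -> None:
        self._state = dict(value)

    def read_records(self) -> Iterator[Dict[str, Any]]:
        # Only emit records newer than the cursor saved by the previous sync.
        start = self._state.get(self.cursor_field, "")
        for record in self._records:
            if record[self.cursor_field] > start:
                yield record
                # Advance the cursor after the consumer has handled the record.
                latest = max(self._state.get(self.cursor_field, ""), record[self.cursor_field])
                self._state = {self.cursor_field: latest}


class TranscriptsStream:
    """Child stream whose state is simply the parent's state."""

    def __init__(self, parent: ConversationsStream) -> None:
        self.parent = parent  # parent stream field

    @property
    def state(self) -> Dict[str, Any]:
        return self.parent.state  # forward the getter to the parent

    @state.setter
    def state(self, value: Mapping[str, Any]) -> None:
        self.parent.state = value  # forward the setter to the parent

    def stream_slices(self) -> Iterator[Dict[str, Any]]:
        # One slice per parent record read incrementally in this run.
        for record in self.parent.read_records():
            yield {"conversation_id": record["id"]}

    def path(self, stream_slice: Mapping[str, Any]) -> str:
        # Hypothetical endpoint built from the slice.
        return f"conversations/{stream_slice['conversation_id']}/transcript"


records = [
    {"id": "a", "updated_at": "2023-10-01"},
    {"id": "b", "updated_at": "2023-10-02"},
]
child = TranscriptsStream(ConversationsStream(records))
first_paths = [child.path(s) for s in child.stream_slices()]

# Second sync: restore the saved state; only the new conversation is sliced.
records_later = records + [{"id": "c", "updated_at": "2023-10-03"}]
child_later = TranscriptsStream(ConversationsStream(records_later))
child_later.state = child.state
second_paths = [child_later.path(s) for s in child_later.stream_slices()]
```

Because the child has no state of its own, its checkpoint is whatever the parent's cursor is, which is why conversations already covered by the cursor are skipped on the second sync.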

I'll keep you posted on this issue!

maxi297 avatar Oct 23 '23 12:10 maxi297

Grooming:

  • in terms of YAML manifest, we could avoid having a new type of cursor by having a field "forward_to_parent". The implementer can see if this makes sense

maxi297 avatar Oct 24 '23 16:10 maxi297

@tjhiggins This has been deemed not aligned with our team's current goals. We will re-evaluate before the next cycle which is around mid-November

maxi297 avatar Oct 24 '23 16:10 maxi297

@tjhiggins This has been deemed not aligned with our team's current goals. We will re-evaluate before the next cycle which is around mid-November

Thanks for the update.

tjhiggins avatar Oct 24 '23 16:10 tjhiggins

Another request for this feature: https://airbytehq-team.slack.com/archives/C027KKE4BCZ/p1705077322351399

lmossman avatar Jan 17 '24 22:01 lmossman

And another one as well from Slack: https://airbytehq.slack.com/archives/C027KKE4BCZ/p1715089626142729

I have one API that would align with this as well. Since the child object only changes when the parent changes, a feature like this would prevent about 100K unneeded requests per run, which would also help with their strict API limits.

NAjustin avatar May 07 '24 14:05 NAjustin

Same here with the Jiminny API. We have a parent stream, activities, which would take 17h+ for a full refresh, with a low chance of running through without timeouts. A child stream, 'summary', requests the summary of one of these activities. While the parent stream is incremental, the child stream tries to run the parent again for the whole year, which is a) data we don't want to pull again and b) doomed to fail, as it's too much for the API.

TorstenFraust avatar Jun 11 '24 07:06 TorstenFraust

Any update on this?

mariana-s-fernandes avatar Aug 22 '24 16:08 mariana-s-fernandes

@mariana-s-fernandes @TorstenFraust @NAjustin This is now supported in the Connector Builder if the substream is also configured to be incremental (screenshot not preserved).

Can you confirm if this solves your use cases?

lmossman avatar Aug 28 '24 17:08 lmossman

@lmossman Unfortunately not, as the substream has to be incremental. I don't understand this limitation, as I was expecting this feature to just pass the parent ids from the current run to the child stream. My substream accepts just one input, the id of the parent stream, so I cannot make it incremental. https://arc.net/l/quote/dvqztooa

What is the behaviour if the substream is not incremental? Right now it looks to me like the substream runs before the parent stream.

Screenshot 2024-09-05 at 18 28 38

TorstenFraust avatar Sep 05 '24 16:09 TorstenFraust

@TorstenFraust have you found a solution or workaround, specifically related to Jiminny?

htkapiche avatar Dec 05 '24 23:12 htkapiche

Is there any update on the above? @lmossman it has not solved the use case that @TorstenFraust mentioned. I'm having the exact same problem.

Regardless of whether that option is selected, the child still runs a full refresh every time.

@htkapiche were you able to solve this at all?

adeolaemmanuelmorren avatar Mar 07 '25 21:03 adeolaemmanuelmorren

I have raised the request to support this on non-incremental child streams with the team. I can't guarantee when we will get to it, but they are hoping to tackle it soon.

lmossman avatar Mar 11 '25 17:03 lmossman

@lmossman are you able to share any update regarding non-incremental child streams?

hai-ld avatar Sep 14 '25 14:09 hai-ld