Substream cursor
What area does the feature impact?
Connectors
Relevant Information
As requested in Slack: For example, the first time I call the API /v1/deals I get 3 ids, which I pass to the API /v1/deals/{id}/flow. The second time I call /v1/deals I get 2 new ids, which I then pass to /v1/deals/{id}/flow. How can I do this?
As of 2023-07-13, there is no way to do this, because it requires a whole new way of managing the state: it's not incremental in the sense that "the ids are incremental", but in the sense that "we have never fetched the information for those ids".
Proposed solution
Have a new component (name to be reworked) to allow the cursor to manage a substream like this:
```yaml
incremental_sync:
  type: SubstreamAlreadyFetchedCursor
  parent_stream: "#/definitions/parent_stream"
  parent_key: id
```
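To make the proposed "never fetched before" semantics concrete, here is a minimal Python sketch that replays the /v1/deals example. It assumes the cursor keeps the set of parent ids whose child data was already fetched and persists that set as its state. `AlreadyFetchedIdsCursor`, `get_state`/`set_state`, the `fetched_ids` state key and `child_paths` are hypothetical names for illustration only; this is not the Airbyte CDK or declarative framework API.

```python
from typing import Any, Dict, Iterable, Iterator, List, Mapping, Set


class AlreadyFetchedIdsCursor:
    """Hypothetical cursor: only emits slices for parent ids never fetched before.

    Not part of the Airbyte CDK; it only illustrates the proposed semantics.
    """

    def __init__(self, parent_key: str = "id") -> None:
        self.parent_key = parent_key
        self.fetched_ids: Set[Any] = set()

    def get_state(self) -> Dict[str, List[Any]]:
        # Persist the set as a sorted list so the state is JSON-serializable
        return {"fetched_ids": sorted(self.fetched_ids)}

    def set_state(self, state: Mapping[str, Any]) -> None:
        self.fetched_ids = set(state.get("fetched_ids", []))

    def stream_slices(
        self, parent_records: Iterable[Mapping[str, Any]]
    ) -> Iterator[Dict[str, Any]]:
        for record in parent_records:
            parent_id = record[self.parent_key]
            if parent_id in self.fetched_ids:
                continue  # child data for this id was fetched in an earlier sync
            yield {"parent_id": parent_id}
            # Mark the id once the consumer asks for the next slice
            self.fetched_ids.add(parent_id)


def child_paths(
    cursor: AlreadyFetchedIdsCursor, deals: Iterable[Mapping[str, Any]]
) -> List[str]:
    # Build one /v1/deals/{id}/flow request path per new slice
    return [f"/v1/deals/{s['parent_id']}/flow" for s in cursor.stream_slices(deals)]


# First sync: /v1/deals returns 3 ids, all of them are new
cursor = AlreadyFetchedIdsCursor(parent_key="id")
first_run = child_paths(cursor, [{"id": 1}, {"id": 2}, {"id": 3}])
saved_state = cursor.get_state()

# Second sync: restore the saved state; /v1/deals now also returns 2 new ids
cursor = AlreadyFetchedIdsCursor(parent_key="id")
cursor.set_state(saved_state)
second_run = child_paths(
    cursor, [{"id": 1}, {"id": 2}, {"id": 3}, {"id": 4}, {"id": 5}]
)
```

With this design the parent (/v1/deals) is still listed in full on every sync; only the child requests for already-seen ids are skipped. The trade-offs are that the state grows by one entry per parent id, and a parent whose child data changes later is never re-fetched.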
@maxi297 any updates here? This seems to be a blocker for me to use Airbyte with Chorus's API. I can incrementally pull conversations, but they don't include the transcript, so I need to use a stream partition to query another endpoint for each conversation. However, I haven't found a way to do that without a full refresh of the substream.
@tjhiggins I see that some people have shown interest in this issue. Let me bring this back to the team and see if it's enough to prioritize it.
In the meantime, if this is blocking for you and you are up for the challenge, you could implement your own version of `HttpStream` that would:
- have a parent stream field, which would be a conversations stream
- re-implement the `state` getter and setter to forward the state to the conversations stream
- re-implement `stream_slices` to fetch the conversations stream records
- re-implement `path` in order to take the information of the slices into account
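A minimal sketch of this workaround in plain Python. It deliberately does not import `airbyte_cdk`: `ConversationsStream`, `TranscriptsStream`, `read_records`, the `updated_at` cursor field, the transcript path and the in-memory `api` list are hypothetical stand-ins, so only the shape (a parent stream field, forwarded `state`, parent-driven `stream_slices`, slice-aware `path`) mirrors the suggestion. A real implementation would subclass the CDK's `HttpStream`, and its exact method signatures should be taken from the CDK documentation.

```python
from typing import Any, Dict, Iterator, List, Mapping, Optional


class ConversationsStream:
    """Hypothetical incremental parent stream (stands in for an HttpStream subclass)."""

    cursor_field = "updated_at"

    def __init__(self, api_records: List[Dict[str, Any]]) -> None:
        # api_records is an in-memory stand-in for the conversations endpoint
        self._api_records = api_records
        self._state: Dict[str, Any] = {}

    @property
    def state(self) -> Dict[str, Any]:
        return dict(self._state)

    @state.setter
    def state(self, value: Mapping[str, Any]) -> None:
        self._state = dict(value)

    def read_records(self) -> Iterator[Dict[str, Any]]:
        since: Optional[int] = self._state.get(self.cursor_field)
        new = [r for r in self._api_records if since is None or r[self.cursor_field] > since]
        for record in sorted(new, key=lambda r: r[self.cursor_field]):
            yield record
            # Advance the cursor once the consumer has moved past this record
            self._state[self.cursor_field] = record[self.cursor_field]


class TranscriptsStream:
    """Hypothetical child stream with no cursor of its own."""

    def __init__(self, parent: ConversationsStream) -> None:
        self.parent = parent  # the parent stream field

    @property
    def state(self) -> Dict[str, Any]:
        return self.parent.state  # forward state reads to the parent

    @state.setter
    def state(self, value: Mapping[str, Any]) -> None:
        self.parent.state = value  # forward state writes to the parent

    def stream_slices(self) -> Iterator[Dict[str, Any]]:
        # Only conversations returned by the parent's incremental read become slices
        for conversation in self.parent.read_records():
            yield {"conversation_id": conversation["id"]}

    def path(self, stream_slice: Mapping[str, Any]) -> str:
        # Hypothetical endpoint: the real Chorus path is not given in this thread
        return f"conversations/{stream_slice['conversation_id']}/transcript"


def sync(child: TranscriptsStream) -> List[str]:
    return [child.path(s) for s in child.stream_slices()]


api = [{"id": "a", "updated_at": 1}, {"id": "b", "updated_at": 2}, {"id": "c", "updated_at": 3}]
first = TranscriptsStream(ConversationsStream(api))
first_paths = sync(first)
saved_state = first.state  # the parent's cursor after the first sync

api.extend([{"id": "d", "updated_at": 4}, {"id": "e", "updated_at": 5}])
second = TranscriptsStream(ConversationsStream(api))
second.state = saved_state
second_paths = sync(second)
```

The second sync requests transcripts only for `d` and `e`, the two conversations the parent returned after the saved cursor. Because the child's state is exactly the parent's state, a transcript that changes without its conversation's `updated_at` changing would not be re-fetched.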
I'll keep you posted on this issue!
Grooming:
- In terms of the YAML manifest, we could avoid introducing a new type of cursor by adding a field "forward_to_parent". The implementer can decide whether this makes sense.
@tjhiggins This has been deemed not aligned with our team's current goals. We will re-evaluate before the next cycle, which is around mid-November.
Thanks for the update.
Another request for this feature: https://airbytehq-team.slack.com/archives/C027KKE4BCZ/p1705077322351399
And another one as well from Slack: https://airbytehq.slack.com/archives/C027KKE4BCZ/p1715089626142729
I have one API that would benefit from this as well: the child object only changes when the parent changes, so a feature like this would prevent about 100K unneeded requests per run, which would also help with the provider's strict API limits.
Same here, Jiminny API.
We have a parent stream, activities, which would take 17h+ for a full refresh, with a low chance of completing without timeouts. A child stream, 'summary', requests the summary of one of these activities. While the parent stream is incremental, the child stream tries to read the parent again for the whole year, which is
a) data we don't want to pull again
b) doomed to fail, as it's too much for the API
Any update on this?
@mariana-s-fernandes @TorstenFraust @NAjustin
This is now supported in the Connector Builder if the substream is also configured to be incremental.
Can you confirm if this solves your use cases?
@lmossman Unfortunately not, as the substream has to be incremental. I don't understand this limitation, as I was expecting this feature to just pass the parent ids from the current run to the child stream. My substream accepts just one input, the id of the parent stream, so I cannot make it incremental. https://arc.net/l/quote/dvqztooa
What is the behaviour if the substream is not incremental? Right now it looks to me like the substream runs before the parent stream.
@TorstenFraust have you found a solution or workaround, specifically related to Jiminny?
Is there any update on the above? @lmossman it has not solved the use case that @TorstenFraust mentioned. I'm having the exact same problem.
Regardless of whether that option is selected, the child still runs a full refresh every time.
@htkapiche were you able to solve this at all?
I have raised the request to support non-incremental child streams with the team. I can't guarantee when we will get to it, but they are hoping to tackle it soon.
@lmossman are you able to share any update regarding non-incremental child streams?