airbyte Add an "_AIRBYTE_SYNC_ID" column to track sync job and rows

Tell us about the problem you're trying to solve

I'd like to keep track of which rows correspond to which sync job by attaching a sync ID beside each row. For example, if I sync a connection at 6 AM, then receive new data in a sync at 10 AM, how can I tell exactly which new rows were from the 10 AM sync? The closest feature that Airbyte offers is the _AIRBYTE_EMITTED_AT column, but I don't think it can be used to effectively distinguish sync jobs. I confirmed on Slack here that this feature does not exist and was recommended to open this issue.

Describe the solution you’d like

This is what I'd ideally like: the first row comes from Sync A and the second row comes from Sync B. I can tell which sync job each row came from by the "batch_id" (or sync_id) column. I believe this functionality would be compatible with the "Incremental Sync" options, since I'm assuming "Full refresh" wouldn't allow for this.

[Image: syncid]
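To make the request concrete, here is a minimal sketch of how a per-row sync identifier could be used once it exists. The rows, the `sync_id` key, and the `group_by_sync` helper are all hypothetical and only illustrate the idea; they are not an existing Airbyte API.

```python
from collections import defaultdict

# Hypothetical rows as they might land in a destination table.
# The "sync_id" key is the column being requested, not something Airbyte emits here.
rows = [
    {"id": 1, "_airbyte_emitted_at": "2021-06-10T06:00:03Z", "sync_id": "A"},
    {"id": 2, "_airbyte_emitted_at": "2021-06-10T10:00:05Z", "sync_id": "B"},
    {"id": 3, "_airbyte_emitted_at": "2021-06-10T10:00:07Z", "sync_id": "B"},
]


def group_by_sync(rows):
    """Map each sync ID to the list of row IDs written by that sync."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["sync_id"]].append(row["id"])
    return dict(groups)


print(group_by_sync(rows))
```

With a column like this, the rows from the 10 AM sync can be selected by an exact match on the sync ID instead of by guessing a time window around _AIRBYTE_EMITTED_AT.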

Jun 10 '21 13:06 justin-dropbase

Hey Justin, that's a very good issue. We want to track a lot more meta-data about replications.

@cgardens FYI

Jun 10 '21 17:06 michel-tricot

Thanks! I agree as well. Will update when we prioritize it!

Jun 15 '21 22:06 cgardens

note: from planning meeting. we should be thoughtful about other metadata to include.

Jun 16 '21 18:06 cgardens

Should we store this metadata as a JSON object?

Jun 17 '21 08:06 michel-tricot

@cgardens is there an ETA when it's going to be released?

Aug 01 '21 08:08 haliva-firmbase

@haliva-firmbase my hope is that we would start work on this in the next 2 weeks. we will be doing some work on our scheduler and will likely bundle this in as part of that project.

Aug 02 '21 21:08 cgardens

Any updates on the status of this? Is it in progress, and is there an ETA? This will help a lot!

Sep 17 '21 06:09 3jerde

@andreasholmesberge thanks for your interest. Obviously my previous estimate was off. 😞 We ended up having to prioritize some other features. Right now we are targeting starting work on this in the second week of October.

Sep 17 '21 15:09 cgardens

This issue is from 2021... Is anything new happening with this? Are you going to add a new column? I think this is a very important change.

Oct 18 '22 14:10 animer3009

Even just one new column which shows a unique ID for each sync iteration would help a lot! @cgardens this feature is critical for us.

Oct 18 '22 14:10 animer3009

👍 I'm facing the same issue as @animer3009

Dec 12 '22 17:12 philippeboyd

cc @malikdiarra would it be possible to get the platform's job_id as an environment variable sent to the connector?

Jun 22 '23 00:06 evantahler

I'm also facing the same issue as @animer3009

Oct 18 '23 03:10 nauxliu

We will be adding this column. I expect this to be released in the next few months. @evantahler is in charge of this work.

May 23 '24 20:05 davinchia

Note: sync_id will be added to the _airbyte_meta JSON column within V2 destinations
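For readers who want to pull the value back out, here is a minimal sketch of reading sync_id from an `_airbyte_meta` value. It assumes the column arrives either as a JSON string or as an already-decoded object with a top-level `sync_id` key; the exact shape may differ per destination, and `extract_sync_id` is a hypothetical helper name.

```python
import json


def extract_sync_id(airbyte_meta):
    """Return the sync_id stored in an _airbyte_meta value, or None if absent.

    Assumes a top-level "sync_id" key; that layout is an assumption, not a spec.
    """
    # Some destinations return JSON columns as text, others as decoded objects.
    meta = json.loads(airbyte_meta) if isinstance(airbyte_meta, str) else airbyte_meta
    if not isinstance(meta, dict):
        return None
    return meta.get("sync_id")


# Hypothetical column values for illustration.
print(extract_sync_id('{"sync_id": 42, "changes": []}'))
print(extract_sync_id({"changes": []}))
```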

May 26 '24 21:05 evantahler

Hi, I might be mistaken, but it seems like V2 Destinations only include relational DBs. Is there a reason why we can't have a unique identifier for a Sync Job regardless of the destination?

Today, Airbyte emits _airbyte_ab_id and _airbyte_emitted_at. Can we not have an _airbyte_sync_id as well, and use the Sync Job identifier for this purpose?

Edit: English is hard.

Jun 07 '24 11:06 harlemsparrow

Now that the Airbyte platform provides sync_id to the destination, it's up to every destination to include it in how it writes the records. We'll be taking care of updating the certified destinations, and relying on our community to update the rest.
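As a rough illustration of what "including it in how it writes the records" could look like on the destination side, here is a minimal sketch. How the platform actually hands sync_id to a destination is not covered in this thread, so it is passed as a plain argument; `wrap_record` and the `_airbyte_data` key are hypothetical names used only for the example.

```python
import json


def wrap_record(data, sync_id):
    """Build the row a destination might write, with sync_id stored in _airbyte_meta.

    The row layout is an assumption for illustration, not the actual connector code.
    """
    return {
        "_airbyte_meta": json.dumps({"sync_id": sync_id}),
        "_airbyte_data": json.dumps(data),
    }


row = wrap_record({"id": 1, "name": "example"}, sync_id=7)
print(row)
```

Keeping the identifier inside the metadata object rather than in a new top-level column means the table schema does not change when more metadata is added later.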

Jun 07 '24 16:06 evantahler

Absolutely, evantahler.

It just wasn't clear from your last comment if the work will be limited to V2 or all certified connectors. Community connectors are up to the community to update - no question about it.

Thanks!

Jun 10 '24 18:06 harlemsparrow

Closing this issue as our certified destinations now store sync_id within the _airbyte_meta object column!

Aug 09 '24 15:08 evantahler