Postgres on Resumable full refresh

Open xiaohansong opened this issue 10 months ago • 5 comments

  • Adapt to the RFR (resumable full refresh) CDK interface
  • Create a state manager for RFR (final state handling)

xiaohansong avatar Apr 12 '24 21:04 xiaohansong

An empty table saved a streamState: null, causing the next sync to fail.

Actually, even a table with a small number of records will first emit a null stream state. Not sure if that's also the case with mssql and mysql.

rodireich avatar May 08 '24 20:05 rodireich

With xmin: the final state is not saved, so the next full refresh sync will read the last 10,000-record chunk over and over.

rodireich avatar May 08 '24 20:05 rodireich

> An empty table saved a streamState: null, causing the next sync to fail.
>
> Actually, even a table with a small number of records will first emit a null stream state. Not sure if that's also the case with mssql and mysql.

That's because in Postgres the streamState stays null until we reach the first checkpoint. Not sure why that would cause the sync to fail?

xiaohansong avatar May 08 '24 21:05 xiaohansong
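
To illustrate the two state issues discussed above (a null streamState before the first checkpoint, and a missing final state), here is a minimal Python sketch. It is a toy model, not the connector's Java code: the function `saved_states`, its parameters, and the state fields `rows_read` and `complete` are all hypothetical, and the 10,000 interval is simply the chunk size mentioned in the thread.

```python
from typing import Any, Dict, Iterable, List, Optional

State = Optional[Dict[str, Any]]


def saved_states(rows: Iterable[Any], interval: int, emit_final_state: bool) -> List[State]:
    """Return the sequence of stream states a toy reader would save.

    Hypothetical model: a checkpoint state is saved every `interval` rows,
    and the state is None (null) until the first checkpoint is reached.
    """
    states: List[State] = []
    count = 0
    for count, _row in enumerate(rows, start=1):
        if count % interval == 0:
            # Mid-stream checkpoint: remember how far we have read.
            states.append({"rows_read": count, "complete": False})
    if emit_final_state:
        # Final state handling: mark the stream as fully read.
        states.append({"rows_read": count, "complete": True})
    elif not states:
        # No checkpoint was ever reached, so only a null state exists.
        states.append(None)
    return states


small = saved_states(range(5), interval=10_000, emit_final_state=False)
no_final = saved_states(range(25_000), interval=10_000, emit_final_state=False)
with_final = saved_states(range(25_000), interval=10_000, emit_final_state=True)
print(small[-1])       # None
print(no_final[-1])    # {'rows_read': 20000, 'complete': False}
print(with_final[-1])  # {'rows_read': 25000, 'complete': True}
```

In this model, a small or empty table ends with a null state unless a final state is emitted, and a large table without a final state would resume from the last mid-stream checkpoint, re-reading the last chunk each time.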

/publish-java-cdk

:clock2: https://github.com/airbytehq/airbyte/actions/runs/9023493316 :white_check_mark: Successfully published Java CDK version=0.34.2!

xiaohansong avatar May 09 '24 21:05 xiaohansong

Hi @xiaohansong, I tried Postgres with CDC. Some streams had a ctid state and others had an empty cursor field, but once the sync failed and I started it again, it ran a full refresh from the beginning. Is this normal behavior?

Hashcode-Ankit avatar Jul 05 '24 14:07 Hashcode-Ankit

@Hashcode-Ankit "Resumable full refresh" only happens within the same sync job, across attempts. That means if the sync job has a 2nd attempt, it will pick up from the previous checkpoint of a full refresh stream. But if the user kicks off a new sync job, regardless of the previous sync result, the full refresh will start from the beginning.

If you do not wish to start from the beginning, consider using incremental refresh instead!

xiaohansong avatar Jul 05 '24 17:07 xiaohansong
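
The scoping rule described in the comment above can be sketched roughly as follows. This is a hypothetical Python illustration, not Airbyte's implementation: the class `AttemptCheckpoints`, its methods, and the example state value are invented for the sketch. It assumes checkpoints are keyed by sync job, so a retry within the same job finds one and a new job does not.

```python
from typing import Any, Dict, Optional, Tuple


class AttemptCheckpoints:
    """Toy checkpoint store scoped to a single sync job (hypothetical)."""

    def __init__(self) -> None:
        # Keyed by (job_id, stream name); a new job never sees old entries.
        self._by_job_and_stream: Dict[Tuple[int, str], Dict[str, Any]] = {}

    def save(self, job_id: int, stream: str, state: Dict[str, Any]) -> None:
        """Record the latest checkpoint for a stream within a job."""
        self._by_job_and_stream[(job_id, stream)] = state

    def resume_from(self, job_id: int, stream: str) -> Optional[Dict[str, Any]]:
        """Return the checkpoint to resume from, or None to start over."""
        return self._by_job_and_stream.get((job_id, stream))


store = AttemptCheckpoints()
# Attempt 1 of job 1 checkpoints partway through the "users" stream.
store.save(job_id=1, stream="users", state={"ctid": "(10,3)"})

# Attempt 2 of the same job resumes from that checkpoint.
retry_state = store.resume_from(job_id=1, stream="users")

# A brand-new job finds no checkpoint and starts from the beginning.
new_job_state = store.resume_from(job_id=2, stream="users")
```

Here `retry_state` holds the saved checkpoint while `new_job_state` is `None`, which mirrors the distinction between a second attempt and a newly started sync job.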

Hi @xiaohansong, I think what @Hashcode-Ankit means is that he's trying CDC with Postgres and this is the first sync. During the sync, some streams were fully loaded but their cursor fields are missing, the currently running stream has a CTID state, and I think the sync failed at that point.

When he ran the next sync with the same state, it restarted the full load for every stream, which it shouldn't have.

piyushsingariya avatar Jul 05 '24 20:07 piyushsingariya