
Sync Singer/Airbyte sources to BQ

Open xtreding opened this issue 4 years ago • 0 comments

Problem

BigQuery (BQ) does not support data mutation (deleting or updating existing rows) in a reliable and fast way. This creates problems when BQ is used as the destination for pull sources:

  • Native sources. A native source updates data in chunks, and each update looks like DELETE FROM ... WHERE chunk=chunk; INSERT ... (repeated n times). Because of how mutations work in BQ, data that was just inserted can end up being deleted afterwards.
    • Consider using chunks as table partitions: dropping a partition is a fast operation.
  • Airbyte / Singer. Airbyte and Singer send data to the destination as individual records (rows). Each record has a set of keys. The combination of keys is hashed into eventn_ctx_event_id, and the destination is expected to deduplicate records based on eventn_ctx_event_id. Unlike all other destinations, BQ does not support this deduplication. Strictly speaking, most sources do not rely on deduplication; however, the ones that do can produce duplicated data in BQ.
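A minimal in-memory sketch of the deduplication a destination is expected to perform for Airbyte/Singer records. It assumes the event ID is a SHA-256 hash of the record's key values serialized as JSON; Jitsu's actual hashing scheme for eventn_ctx_event_id may differ. The helper names event_id and dedupe are hypothetical. The sketch only removes duplicates within a single batch and keeps the last record per ID; it does not deduplicate against rows already stored in BQ, which is the part that still needs research.

```python
import hashlib
import json
from typing import Any, Dict, Iterable, List, Sequence


def event_id(record: Dict[str, Any], key_fields: Sequence[str]) -> str:
    """Hash the values of the record's key fields into a stable ID.

    Assumption (hypothetical scheme): SHA-256 over compact JSON of the key
    values, in the order given by key_fields.
    """
    key_values = [record.get(k) for k in key_fields]
    # default=str lets non-JSON types (e.g. datetimes) be serialized
    payload = json.dumps(key_values, default=str, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def dedupe(records: Iterable[Dict[str, Any]],
           key_fields: Sequence[str]) -> List[Dict[str, Any]]:
    """Keep only the last record seen for each eventn_ctx_event_id in a batch."""
    latest: Dict[str, Dict[str, Any]] = {}
    for rec in records:
        rid = event_id(rec, key_fields)
        # A later record with the same keys replaces the earlier one;
        # the dict keeps the position where that ID was first seen.
        latest[rid] = {**rec, "eventn_ctx_event_id": rid}
    return list(latest.values())


if __name__ == "__main__":
    rows = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}, {"id": 1, "v": "c"}]
    for row in dedupe(rows, ["id"]):
        print(row["id"], row["v"])  # prints "1 c" then "2 b"
```

Doing this before loading keeps each batch free of duplicates without any mutation in BQ, but duplicates spread across separate loads would still need a BQ-side mechanism.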

Solution

The exact solution requires researching BQ features and internals.

xtreding, Dec 06 '21 13:12