snowplow-rdb-loader

RDB Loader: add an option to enable transit load by default

Open chuwy opened this issue 7 years ago • 0 comments

There is a possible race condition that breaks the load when two pipelines are involved. With the current default behavior:

  1. Two pipelines, Big and Small, are loading data into the same DB
  2. Big starts at 0:00, Small starts at 0:15, and both have corresponding etl_tstamps
  3. Small finishes first and adds its etl_tstamp to the Load Manifest
  4. Then Big's load starts and Loader checks the last etl_tstamp in events and in the manifest. It finds that they match (although neither of them is Big's own etl_tstamp) and aborts the job
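
The sequence above can be replayed as a small simulation. This is only a sketch of the scenario as described in this issue, not the Loader's actual code: the `manifest_check` function, the in-memory lists standing in for the events table and the Load Manifest, and the calendar date are all assumptions made for illustration.

```python
from datetime import datetime

def manifest_check(events_etl_tstamps, manifest_etl_tstamps):
    """Hypothetical stand-in for the Loader's check.

    Returns True if the load may proceed, False if the job should abort.
    Assumed rule (taken from the issue text): abort when the latest
    etl_tstamp in events matches the latest etl_tstamp in the manifest.
    """
    if not events_etl_tstamps or not manifest_etl_tstamps:
        return True
    return max(events_etl_tstamps) != max(manifest_etl_tstamps)

# Arbitrary date; only the 0:00 and 0:15 start times come from the issue.
big_etl_tstamp = datetime(2018, 1, 1, 0, 0)
small_etl_tstamp = datetime(2018, 1, 1, 0, 15)

events = []    # etl_tstamps present in the events table
manifest = []  # etl_tstamps recorded in the Load Manifest

# Step 3: Small finishes first; its data and manifest record land in the DB.
events.append(small_etl_tstamp)
manifest.append(small_etl_tstamp)

# Step 4: Big's load starts and the check compares the latest timestamps.
big_may_load = manifest_check(events, manifest)

# Both latest values are Small's etl_tstamp, so the check fails even though
# Big's own etl_tstamp appears in neither place.
print("Big may load:", big_may_load)
```

In this model the check never looks at Big's own etl_tstamp, which is why a run that has not been loaded yet is still rejected.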

A quick workaround is to skip manifest_check. The correct workaround would be to always enable transit load when two pipelines are involved (right now it is enabled automatically only when --folder is passed).

/cc @stdfalse

chuwy, Oct 09 '18 12:10