RDB Loader: add back-off to consistency check

Open · chuwy opened this issue 8 years ago • 3 comments

Right now the delay is always ((atomicFiles.length * 0.1 * shreddedTypes.length) + 5) seconds, and the Loader keeps invoking Thread.sleep until the state is consistent or the check has happened 5 times.

For 500 atomic-events files and 90 shredded types (which I believe can be common for big pipelines), this delay can reach 75 minutes even if the state is consistent immediately.
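For reference, the current fixed delay can be written as a small function. This is a Python sketch of the formula quoted above, not the Loader's actual Scala code, and the name current_delay_seconds is hypothetical. It reproduces the 500-file, 90-type example: 4505 seconds, roughly 75 minutes, for a single sleep.

```python
def current_delay_seconds(atomic_files: int, shredded_types: int) -> float:
    """Fixed delay described in the issue:
    (atomicFiles * 0.1 * shreddedTypes) + 5 seconds.

    Hypothetical helper; it mirrors the formula, not the Loader's real code.
    """
    return atomic_files * 0.1 * shredded_types + 5


# 500 atomic-events files, 90 shredded types.
# The delay grows with the product of both counts and does not depend on
# whether S3 is already consistent.
delay = current_delay_seconds(500, 90)
print(f"{delay:.0f} s = {delay / 60:.1f} min")  # prints "4505 s = 75.1 min"
```

Because the two counts are multiplied, doubling either the number of files or the number of shredded types doubles every sleep.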

We can improve the formula by adding a back-off policy, so that the first check happens much faster and more delay is added only if S3 remains inconsistent.
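One way to express the proposed back-off is sketched below in Python. It assumes an exponential schedule (each delay multiplied by a fixed factor) capped at a maximum, with the same five-check limit the Loader uses today. The names backoff_delays and wait_until_consistent, and the parameter values in the usage lines, are hypothetical and not taken from the Loader. The consistency check and the sleep function are passed in so the logic can be exercised without touching S3.

```python
import time
from typing import Callable, List


def backoff_delays(initial: float, factor: float, max_delay: float, attempts: int) -> List[float]:
    """Exponential back-off schedule: initial, initial*factor, ... each capped at max_delay."""
    delays = []
    delay = initial
    for _ in range(attempts):
        delays.append(min(delay, max_delay))
        delay *= factor
    return delays


def wait_until_consistent(
    is_consistent: Callable[[], bool],
    delays: List[float],
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Sleep for each delay in turn, then check; stop as soon as a check passes.

    Returns False if every check failed (the caller decides what to do then).
    """
    for delay in delays:
        sleep(delay)
        if is_consistent():
            return True
    return False


# Usage with a fake check that becomes consistent on the second attempt.
# Hypothetical parameters: start at 5 s, triple each time, cap at the old
# fixed delay for 500 files x 90 types (4505 s), at most 5 checks.
slept = []
checks = iter([False, True])
ok = wait_until_consistent(lambda: next(checks), backoff_delays(5, 3, 4505, 5), sleep=slept.append)
print(ok, slept)  # prints "True [5, 15]"
```

Capping each delay at the old fixed value keeps every individual sleep no longer than today's, while a consistent S3 state is picked up after a short first wait (20 seconds of sleeping in the usage lines instead of 4505). Whether the initial delay should still scale with atomicFiles and shreddedTypes is left open here.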

chuwy · Dec 04 '17 13:12

For a certain large customer (300 shredded types), this process always took 3 hours.

alexanderdean · Dec 04 '17 13:12

That would be expected if that pipeline had ~300 atomic-events files. If it had fewer, chances are that S3 was inconsistent for one or two invocations; the amount of time S3 was inconsistent should be clear from the log output.

chuwy · Dec 04 '17 13:12

Pushing back in favor of https://github.com/snowplow/snowplow-rdb-loader/issues/81

chuwy · Jan 08 '18 16:01