snowplow-rdb-loader
RDB Loader: add back-off to consistency check
Right now the delay is always ((atomicFiles.length * 0.1 * shreddedTypes.length) + 5) seconds, and the Loader keeps invoking Thread.sleep until the state is consistent or the check has run 5 times.
For 500 atomic-events files and 90 shredded types (which I believe can be common for big pipelines) this delay reaches about 75 minutes ((500 * 0.1 * 90) + 5 = 4505 seconds), even if the state is consistent immediately.
We can improve the formula by adding a back-off policy, so that the first check happens much sooner and more delay is added only if S3 remains inconsistent.
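A minimal sketch of what such a policy could look like, not the Loader's actual code. The object and function names, the 5-second initial delay, and the doubling factor are all assumptions made for illustration; only the current formula and the limit of 5 checks come from the description above.

```scala
import scala.concurrent.duration._

object ConsistencyCheckBackoff {

  /** Delay currently used before each check, per the formula quoted above. */
  def currentDelay(atomicFiles: Int, shreddedTypes: Int): FiniteDuration =
    math.round((atomicFiles * 0.1 * shreddedTypes) + 5).seconds

  /** Hypothetical back-off schedule: start at `initial`, double on every
    * attempt, and never wait longer than the current fixed delay. */
  def backoffDelays(
    atomicFiles: Int,
    shreddedTypes: Int,
    attempts: Int = 5,
    initial: FiniteDuration = 5.seconds
  ): List[FiniteDuration] = {
    val cap = currentDelay(atomicFiles, shreddedTypes)
    List.tabulate(attempts)(i => (initial * (1L << i)).min(cap))
  }

  /** Sleeps for each delay in turn and re-runs the check, stopping at the
    * first consistent result. Returns false if every attempt failed.
    * `sleep` is a parameter so the loop can be exercised without waiting. */
  def awaitConsistent(delays: List[FiniteDuration])(
    isConsistent: () => Boolean,
    sleep: FiniteDuration => Unit = d => Thread.sleep(d.toMillis)
  ): Boolean =
    delays.exists { d =>
      sleep(d)
      isConsistent()
    }
}
```

With these assumed parameters, 500 atomic-events files and 90 shredded types give waits of 5, 10, 20, 40 and 80 seconds instead of 4505 seconds before every check, so a state that is consistent from the start is confirmed after 5 seconds.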
For a certain large customer (300 shredded types), this process was always taking 3 hours.
That would be expected if the pipeline had ~300 atomic-events files. If it had fewer, chances are that S3 was inconsistent for one or two invocations; the amount of time S3 was inconsistent should be clear from the log output.
Pushing back in favor of https://github.com/snowplow/snowplow-rdb-loader/issues/81