lifion-kinesis
Lease Acquisition Recovery Interval causing lots of AWS rate-limit errors
Currently, ACQUIRE_LEASES_RECOVERY_INTERVAL is hard-coded, so a retry fires every 5 seconds whenever an attempt to acquire a lease returns an error. This becomes a big problem when an account has a lot of Kinesis streams and multiple instances of lifion-kinesis managing them.
My prod account is safe, because it only has about 10 Kinesis streams. But my development account covers 10 different testing environments, so it has about 100 streams. Searching my logs for "Unexpected recoverable failure when trying to acquire" returns ~5,000 hits per minute.
I've opened a PR to make the lease acquisition recovery interval configurable. I ran this code in my test env and set it to 30 seconds. This dropped my error rate from 5,000/min to 20/min.
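To illustrate the idea, here is a minimal sketch of a configurable recovery interval that falls back to the current 5-second value. It is not the code from the PR: the option name `leaseAcquisitionRecoveryInterval` and the helpers `resolveRecoveryInterval`, `maxRetriesPerMinute`, and `scheduleRecovery` are made up for this example and are not part of the lifion-kinesis API.

```javascript
// Hypothetical sketch only. None of these names come from lifion-kinesis
// except the idea of a 5-second default recovery interval.
const DEFAULT_RECOVERY_INTERVAL_MS = 5000;

// Returns the configured interval, or the 5-second default when unset.
function resolveRecoveryInterval(options = {}) {
  const value = options.leaseAcquisitionRecoveryInterval; // assumed option name
  if (value === undefined) return DEFAULT_RECOVERY_INTERVAL_MS;
  if (!Number.isInteger(value) || value <= 0) {
    throw new TypeError('The recovery interval must be a positive integer (ms).');
  }
  return value;
}

// Upper bound on retries per minute for one failing acquisition loop.
function maxRetriesPerMinute(intervalMs) {
  return Math.floor(60000 / intervalMs);
}

// Schedules the next acquisition attempt after a recoverable failure.
// `schedule` is injectable so the delay can be inspected without real timers.
function scheduleRecovery(attempt, options = {}, schedule = setTimeout) {
  return schedule(attempt, resolveRecoveryInterval(options));
}
```

With these helpers, a single failing loop retries at most 12 times per minute at 5 seconds and at most 2 times per minute at 30 seconds. The drop I measured is larger than that sixfold bound, which may mean the retries themselves were contributing to the rate limiting, but I haven't verified that.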
Hmm... GitHub isn't letting me link the PR to this issue, so here it is: https://github.com/lifion/lifion-kinesis/pull/406