
Intermittent DynamoDB lease co-ordinator exception

Open sarbari1 opened this issue 3 years ago • 2 comments

Is this exception the same as the lease renewal exception, or is it a different kind of error? It happened for 1 hour at midnight and has not happened since. However, there are still "Failed to create connection to https://kinesis.us-east-1.amazonaws.com/" errors in the log, which are not impacting the application's processing. Are they related or separate issues? Are both issues KCL related?

{"DateTime":"2022-06-10 04:21:58,388"consumer/d854869d5a384f39b2a54bbd37d79823","Logger":"s.a.k.leases.dynamodb.DynamoDBLeaseCoordinator","Message":"- LeasingException encountered in lease taking thread","MessageEvent":"2022-06-10 04:21:58","Thread":"[LeaseCoordinator-0007]","_bkt":"fsor-prd~153~C1404CEE-1C8C-431B-ABA1-089B61D6ADC9","_cd":"153:70447610","_eventtype_color":"none","_indextime":"1654835023","_raw":"MessageEvent=2022-06-10 04:21:58,388 [LeaseCoordinator-0007] ERROR s.a.k.leases.dynamodb.DynamoDBLeaseCoordinator - LeasingException encountered in lease taking thread LogGroup=prod-fin-trx-event-consumer LogStream=ecs/fin-trx-event-consumer/d854869d5a384f39b2a54bbd37d79823","_si":["idx-i-03526d0610ee8e0a8.loyalty.splunkcloud.com","fsor-prd"],"_time":"1654834918.388","appVersion":"s.a.k.leases.dynamodb.DynamoDBLeaseCoordinator","eventtype":["err0r","nix_errors"],"host":"http-inputs-firehose-loyalty.splunkcloud.com","index":"fsor-prd","linecount":"1","messageEvent":"","msg":"388 [LeaseCoordinator-0007] ERROR s.a.k.leases.dynamodb.DynamoDBLeaseCoordinator - LeasingException encountered in lease taking thread LogStream=ecs/fin-trx-event-consumer/d854869d5a384f39b2a54bbd37d79823","source":"http:fsor-firehose-txt-prd","sourcetype":"aws:firehose:text","splunk_server":"idx-i-03526d0610ee8e0a8.loyalty.splunkcloud.com","splunk_server_group":"","tag":["data_class_confidential","error"],"tag::eventtype":"error","tag::index":"data_class_confidential","timeendpos":"37","timestamp":"","timestartpos":"13","transactionId":""}

sarbari1, Jun 10 '22 17:06

It's possible, but it can't be determined from the data in the logs alone. This could be due to:

  1. A network-related error
  2. Churn/high load in your KCL worker node
  3. An issue in the Kinesis service (perhaps check the AWS service dashboard around this time for any posted events)

Without more data it's tough to say. Some relevant information (non-exhaustive) would be:

  1. What % of requests were affected? Which APIs?
  2. How many worker nodes were affected?
  3. Did host utilization increase (memory, cpu, i/o, etc) during this time?
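
One rough way to start answering the first two questions is to bucket the ERROR lines by hour and logger. The sketch below assumes the raw line layout visible in the `_raw` field of the log above (`<date> <time>,<millis> [<thread>] <LEVEL> <logger> - <message>`); the function name `count_errors_by_hour` is made up for illustration and is not part of KCL.

```python
import re
from collections import Counter

# Assumed line layout, taken from the "_raw" field in the log above:
# "<date> <time>,<millis> [<thread>] <LEVEL> <logger> - <message>"
LINE_RE = re.compile(
    r"(?P<date>\d{4}-\d{2}-\d{2}) (?P<hour>\d{2}):\d{2}:\d{2},\d{3} "
    r"\[(?P<thread>[^\]]+)\] (?P<level>[A-Z]+) (?P<logger>\S+) - (?P<message>.*)"
)


def count_errors_by_hour(lines):
    """Count ERROR lines per (date, hour, logger); skip lines that do not match."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match and match.group("level") == "ERROR":
            key = (match.group("date"), match.group("hour"), match.group("logger"))
            counts[key] += 1
    return counts
```

Running this once per worker's log stream would show whether the errors are confined to one hour and one node or spread across the fleet. It does not replace the request-level metrics asked about above.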

joshua-kim, Jun 15 '22 16:06

I struggled with KCL for a long time. It is a very unstable API. It reacts extremely badly to resharding: many shards simply stop being processed. Only restarting the system (cluster) after the stream becomes "ACTIVE" helps, and sometimes multiple restarts are required, which can leave the system idle for a long time. It periodically slows down because of the Kinesis API limit (too many unnecessary requests!), and it also requires very high limits for DynamoDB, which greatly increases the cost of maintenance. From experience, the cost of DynamoDB grows proportionally (or even exponentially?) with the number of workers in the system. And adjusting the parameters of such a system is very difficult...

I was able to solve all of these problems just by switching to the Kinesis Data Streams API.

The main "problem" is just manually distributing shards evenly across workers. Resharding is very simple: a split creates 2 new consumers on the worker, and a merge creates one consumer on the worker that held the left shard. When the reshard is complete, just redistribute the shards between the workers.
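
A minimal sketch of that redistribution step, assuming shard descriptions shaped like the entries returned by the Kinesis `ListShards` API (a shard that is still open has no `EndingSequenceNumber` in its `SequenceNumberRange`). The helper names `open_shard_ids` and `assign_shards` and the worker names are hypothetical, and round-robin is only one possible way to spread shards evenly.

```python
def open_shard_ids(shards):
    """Return the IDs of shards that are still open (no EndingSequenceNumber)."""
    return sorted(
        shard["ShardId"]
        for shard in shards
        if "EndingSequenceNumber" not in shard["SequenceNumberRange"]
    )


def assign_shards(shard_ids, workers):
    """Round-robin shards across workers so per-worker counts differ by at most one."""
    if not workers:
        raise ValueError("at least one worker is required")
    assignment = {worker: [] for worker in workers}
    for index, shard_id in enumerate(sorted(shard_ids)):
        assignment[workers[index % len(workers)]].append(shard_id)
    return assignment
```

After a reshard completes, calling `assign_shards(open_shard_ids(shards), workers)` again gives a fresh, even layout. Note that this sketch ignores closed parent shards, which may still hold unread records that need to be drained before their children if ordering matters.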

No more getting stuck on new shards, no unnecessary Kinesis requests, and no more expensive DynamoDB!

Cheaper - Stable - Simple!

I managed to remove the extra costs that the unnecessary DynamoDB incurred! In the latest versions especially, they changed the default DynamoDB pricing policy from provisioned to on-demand, which increases the cost several times over! I'm sure they *** many users of this library well this way!

VladimirPchelko, Jul 08 '22 06:07