amazon-kinesis-client
Intermittent DynamoDB lease coordinator exception
Is this exception the same as the lease renewal exception, or is it a different kind of error? It happened for about an hour at midnight and has not recurred since. However, there are still "Failed to create connection to https://kinesis.us-east-1.amazonaws.com/" errors in the log, which are not impacting the application's processing. Are they related, or are they separate issues? Are both issues KCL-related?
MessageEvent=2022-06-10 04:21:58,388 [LeaseCoordinator-0007] ERROR s.a.k.leases.dynamodb.DynamoDBLeaseCoordinator - LeasingException encountered in lease taking thread LogGroup=prod-fin-trx-event-consumer LogStream=ecs/fin-trx-event-consumer/d854869d5a384f39b2a54bbd37d79823
It's possible that they are related, but that can't be determined from the log data alone. This could be due to:
- A network-related error
- Churn/high load in your KCL worker node
- An issue in the Kinesis service (perhaps check the AWS service dashboard around this time for any posted events)
Without more data it's hard to say. Some relevant information (non-exhaustive) would be:
- What % of requests were affected? Which APIs?
- How many worker nodes were affected?
- Did host utilization (memory, CPU, I/O, etc.) increase during this time?
I struggled with KCL for a long time. In my experience it is very unstable: it reacts extremely badly to resharding, and many shards simply stop being processed. Only restarting the system (the whole cluster) after the stream becomes "ACTIVE" helps, and sometimes multiple restarts are required, which means the system can sit idle for a long time. It also periodically slows down because it hits Kinesis API limits (it makes far too many unnecessary requests), and it requires very large DynamoDB capacity limits, which greatly increases the cost of operation. From experience, the DynamoDB cost grows at least proportionally to the number of workers in the system. And tuning the parameters of such a system is very difficult.
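The restart-after-"ACTIVE" workaround above amounts to polling the stream status before bringing the cluster back up. A minimal sketch, with `fetchStatus` as a hypothetical stand-in for a `DescribeStreamSummary` call (not a KCL API):

```java
import java.util.function.Supplier;

public class StreamStatusWaiter {
    // Poll until the stream status reads "ACTIVE" or attempts run out.
    // fetchStatus is a stand-in for an AWS DescribeStreamSummary call.
    public static boolean waitUntilActive(Supplier<String> fetchStatus,
                                          int maxAttempts,
                                          long sleepMillis) throws InterruptedException {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            if ("ACTIVE".equals(fetchStatus.get())) {
                return true; // safe to restart the consumer cluster
            }
            Thread.sleep(sleepMillis);
        }
        return false; // stream never became ACTIVE within the budget
    }

    public static void main(String[] args) throws InterruptedException {
        // Simulated stream that becomes ACTIVE after two UPDATING polls.
        java.util.Iterator<String> states =
                java.util.List.of("UPDATING", "UPDATING", "ACTIVE").iterator();
        System.out.println(waitUntilActive(states::next, 10, 1)); // prints true
    }
}
```

In a real deployment the supplier would wrap the AWS SDK call and the sleep would use a longer interval with backoff.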
I was able to solve all of these problems simply by switching to the raw Kinesis Data Streams API.
The main "problem" is then distributing shards evenly across workers by hand. Resharding becomes very simple: a split creates two new consumers on the worker that owned the parent shard, and a merge creates one new consumer. Once the reshard is complete, you just redistribute the shards between the workers.
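The manual even distribution described above can be sketched as a simple round-robin assignment. This is an illustrative sketch, not KCL code; class and method names are hypothetical:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ShardDistributor {
    // Assign shard IDs to workers round-robin, so per-worker counts
    // differ by at most one. Re-run after a reshard completes.
    public static Map<String, List<String>> distribute(List<String> shardIds,
                                                       List<String> workers) {
        Map<String, List<String>> assignment = new LinkedHashMap<>();
        for (String w : workers) {
            assignment.put(w, new ArrayList<>());
        }
        int i = 0;
        for (String shard : shardIds) {
            assignment.get(workers.get(i % workers.size())).add(shard);
            i++;
        }
        return assignment;
    }

    public static void main(String[] args) {
        List<String> shards = List.of("shardId-000", "shardId-001",
                "shardId-002", "shardId-003", "shardId-004");
        List<String> workers = List.of("worker-a", "worker-b");
        System.out.println(distribute(shards, workers));
    }
}
```

A production version would also need to handle shard lineage (only process a child after its parent is fully consumed), which KCL otherwise does for you.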
No more processing stuck on new shards, no unnecessary Kinesis requests, and no more expensive DynamoDB!
Cheaper - Stable - Simple!
I managed to eliminate the extra cost of the DynamoDB tables I no longer need. Note especially that in recent versions the default DynamoDB pricing mode was changed from provisioned to on-demand, which can increase the cost considerably. I'm sure this *** many users of this library.