alternator-load-balancing

Ensure that decommission and dead-node scenarios are handled properly

Open dkropachev opened this issue 1 year ago • 0 comments

We need to make sure that DynamoDB requests do not fail while a node is being decommissioned from the cluster. Instead, a failed HTTP request has to be retried on another host.
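
A minimal sketch of the retry-on-another-host behaviour, not taken from the existing client code. The helper name `send_with_failover`, the `send` callback, and the choice of `ConnectionError`/`TimeoutError` as the retryable errors are all assumptions made for illustration; a real client would plug in its own HTTP call and error classification.

```python
from typing import Callable, Sequence, Tuple, Type, TypeVar

T = TypeVar("T")


def send_with_failover(
    nodes: Sequence[str],
    send: Callable[[str], T],
    retryable: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
) -> T:
    """Try each node in order and return the first successful result.

    `send` is a hypothetical callback that performs the HTTP request
    against the given node and raises on failure.
    """
    last_error = None
    for node in nodes:
        try:
            return send(node)
        except retryable as exc:
            # The node may be down or being decommissioned: try the next host.
            last_error = exc
    if last_error is None:
        raise ValueError("no nodes to try")
    # Every node failed with a retryable error: surface the last one.
    raise last_error
```

Only errors listed in `retryable` trigger a retry here; anything else propagates immediately, so an application-level error is not silently replayed on another node.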

Idea of advanced handling logic:

  1. Keep success and error metrics for every node
  2. When a request fails, the client increments the error metric that corresponds to the type of error
  3. When requests keep failing on a certain node, put it in the list of broken nodes and exclude it from the list of active nodes
  4. Once in a while, allow a request to be routed to a broken node to see whether it is back to normal
  5. If such a request succeeds, remove the node from the list of broken nodes
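
The steps above could be sketched roughly as follows. This is an illustration under stated assumptions, not a proposed final design: the class name `NodeHealthTracker`, the consecutive-error threshold, and the random probe probability are all hypothetical choices, and the issue does not specify how "failing" or "once in a while" should be defined.

```python
import random
from collections import defaultdict


class NodeHealthTracker:
    """Per-node success/error metrics plus a list of broken nodes."""

    def __init__(self, nodes, error_threshold=3, probe_probability=0.1, rng=None):
        self.nodes = list(nodes)
        # Assumption: a node is "broken" after this many consecutive errors.
        self.error_threshold = error_threshold
        # Assumption: chance that a request is used to probe a broken node.
        self.probe_probability = probe_probability
        self.successes = defaultdict(int)                    # step 1
        self.errors = defaultdict(lambda: defaultdict(int))  # node -> kind -> count
        self.consecutive_errors = defaultdict(int)
        self.broken = set()
        self.rng = rng or random.Random()

    def active_nodes(self):
        return [n for n in self.nodes if n not in self.broken]

    def pick_node(self):
        active = self.active_nodes()
        # Step 4: occasionally route a request to a broken node as a probe.
        # If no active node is left, probing is the only option.
        if self.broken and (not active or self.rng.random() < self.probe_probability):
            return self.rng.choice(sorted(self.broken))
        if not active:
            raise RuntimeError("no nodes configured")
        return self.rng.choice(active)

    def record_success(self, node):
        self.successes[node] += 1
        self.consecutive_errors[node] = 0
        # Step 5: a successful request brings the node back.
        self.broken.discard(node)

    def record_error(self, node, kind):
        # Step 2: count the error under its kind (e.g. "timeout").
        self.errors[node][kind] += 1
        self.consecutive_errors[node] += 1
        # Step 3: exclude the node once it keeps failing.
        if self.consecutive_errors[node] >= self.error_threshold:
            self.broken.add(node)
```

A caller would take a node from `pick_node()`, send the request, and report the outcome through `record_success` or `record_error`. Counting consecutive errors rather than total errors keeps a node with rare, isolated failures in the active list.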

dkropachev, Mar 11 '25 16:03