alternator-load-balancing
Ensure that decommission and dead-node scenarios are handled properly
We need to make sure that DynamoDB requests do not fail when a node is being decommissioned from the cluster. Instead, a failed HTTP request has to be retried on another host.
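A minimal sketch of the retry-on-another-host behavior, to make the requirement concrete. The names `send_with_failover`, `send`, and `max_attempts` are hypothetical and not part of any existing client; the sketch assumes that only transport-level failures (connection refused, timeout) are safe to retry on a different node.

```python
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


def send_with_failover(
    hosts: Sequence[str],
    send: Callable[[str], T],
    max_attempts: int = 3,
) -> T:
    """Call send(host) on successive hosts until one succeeds.

    `send` is a hypothetical callable that performs the HTTP request
    against the given host and raises ConnectionError or TimeoutError
    when the host cannot be reached.
    """
    last_error: Optional[Exception] = None
    # Each attempt goes to a different host, up to max_attempts hosts.
    for host in list(hosts)[:max_attempts]:
        try:
            return send(host)
        except (ConnectionError, TimeoutError) as exc:
            # Transport failure: remember it and move on to the next host.
            last_error = exc
    if last_error is None:
        raise ValueError("no hosts to try")
    raise last_error
```

Errors returned by the server itself (for example validation errors) are deliberately not caught here, since retrying them on another node would not help.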
Idea for more advanced handling logic:
- Keep success and error metrics for every node
- When a request fails, the client adds to that node's error metric based on the type of error
- When requests keep failing on a certain node, put it in the list of broken nodes and exclude it from the list of active nodes
- Once in a while, allow a request to be routed to a broken node to see whether it is back to normal
- If such a request succeeds, remove the node from the list of broken nodes
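
The steps above could be sketched roughly as follows. This is only an illustration under assumptions: the class `NodeTracker`, the `failure_threshold` and `probe_interval` parameters, and the per-error `weight` are all hypothetical, and the actual thresholds and error classification would need to be decided separately.

```python
import random
import time
from typing import Callable, Dict, List


class NodeTracker:
    """Tracks per-node metrics and a list of broken nodes (hypothetical API)."""

    def __init__(
        self,
        nodes: List[str],
        failure_threshold: int = 3,
        probe_interval: float = 10.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.nodes = list(nodes)
        self.failure_threshold = failure_threshold
        self.probe_interval = probe_interval
        self.clock = clock
        self.successes: Dict[str, int] = {n: 0 for n in nodes}
        self.error_score: Dict[str, int] = {n: 0 for n in nodes}
        # Broken node -> time it was marked broken or last probed.
        self.broken: Dict[str, float] = {}

    def record_error(self, node: str, weight: int = 1) -> None:
        # The caller chooses the weight from the kind of error it saw.
        self.error_score[node] += weight
        if self.error_score[node] >= self.failure_threshold:
            self.broken[node] = self.clock()

    def record_success(self, node: str) -> None:
        self.successes[node] += 1
        self.error_score[node] = 0
        # A successful request (including a probe) restores the node.
        self.broken.pop(node, None)

    def pick(self) -> str:
        now = self.clock()
        # Once per probe_interval, route one request to a broken node.
        for node, last in self.broken.items():
            if now - last >= self.probe_interval:
                self.broken[node] = now
                return node
        active = [n for n in self.nodes if n not in self.broken]
        # If every node is marked broken, fall back to the full list.
        return random.choice(active or self.nodes)
```

The caller would use `pick()` to choose a host, then report the outcome with `record_success()` or `record_error()`. Injecting `clock` keeps the probe timing testable without sleeping.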