query retry policy will try all alive nodes
When a Heroic shard executes a query, it uses RetryPolicy.timed(30_000, RetryPolicy.exponential(100, 5000)).
It tries every alive node until one responds successfully or the 30-second
limit is reached.
As a result, a query that triggers an error on one node can be retried on, and trigger the same error on, every other node in the shard.
An improvement would be to try at most X nodes, or a percentage of the alive nodes, rather than all of them.
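
A minimal sketch of such a cap, to make the proposal concrete. This is not Heroic's code: BoundedNodeRetry, attemptLimit, tryNodes, maxNodes and fraction are hypothetical names, and the query is modeled as a plain synchronous function rather than Heroic's async calls. It assumes a failed attempt surfaces as a RuntimeException and that a successful query returns a non-null result.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

/** Hypothetical helper: try a bounded number of alive nodes instead of all of them. */
public final class BoundedNodeRetry {
    private BoundedNodeRetry() {}

    /**
     * How many nodes to try: the smaller of maxNodes and ceil(fraction * aliveNodes),
     * but never more than aliveNodes and never fewer than 1 when any node is alive.
     */
    static int attemptLimit(int aliveNodes, int maxNodes, double fraction) {
        if (aliveNodes <= 0) {
            return 0;
        }
        int byFraction = (int) Math.ceil(aliveNodes * fraction);
        int limit = Math.min(maxNodes, byFraction);
        return Math.max(1, Math.min(limit, aliveNodes));
    }

    /**
     * Runs the query against nodes in list order until one succeeds or the
     * attempt limit is reached. Failures are collected in errors so the
     * caller can report them; an empty result means every attempt failed.
     */
    static <N, R> Optional<R> tryNodes(
            List<N> aliveNodes,
            int maxNodes,
            double fraction,
            Function<N, R> query,
            List<Throwable> errors) {
        int limit = attemptLimit(aliveNodes.size(), maxNodes, fraction);
        for (int i = 0; i < limit; i++) {
            R result;
            try {
                result = query.apply(aliveNodes.get(i));
            } catch (RuntimeException e) {
                errors.add(e); // remember the failure, move on to the next node
                continue;
            }
            return Optional.of(result);
        }
        return Optional.empty();
    }
}
```

With 10 alive nodes, maxNodes = 3 and fraction = 0.5, at most 3 nodes would see a failing query instead of all 10. Whether the cap should be an absolute count, a fraction, or both is an open design question; the sketch simply takes the stricter of the two.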
We should also verify that RetryPolicy.timed works as intended and stops retrying after 30 seconds.
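
One way to check the time bound without waiting 30 real seconds is to drive the retry loop with a fake clock. The sketch below is a stand-in, not the real RetryPolicy: TimedRetryCheck and retryUntil are hypothetical names, and the backoff is assumed to start at the base delay, double after each failure, and cap at the maximum. The actual policy may compute delays differently (for example with jitter), so the same idea would need to be adapted to the real class.

```java
import java.util.function.BooleanSupplier;
import java.util.function.LongConsumer;
import java.util.function.LongSupplier;

/** Hypothetical stand-in for a timed, exponentially backed-off retry loop. */
public final class TimedRetryCheck {
    private TimedRetryCheck() {}

    /**
     * Calls attempt until it returns true or timeoutMs has elapsed on the
     * supplied clock. Sleeps are truncated so they never run past the
     * deadline. Returns the number of attempts made.
     */
    static int retryUntil(
            BooleanSupplier attempt,
            long timeoutMs,
            long baseMs,
            long maxMs,
            LongSupplier clockMs,
            LongConsumer sleeper) {
        long start = clockMs.getAsLong();
        long delay = baseMs;
        int attempts = 0;
        while (true) {
            attempts++;
            if (attempt.getAsBoolean()) {
                return attempts; // success, stop retrying
            }
            long remaining = timeoutMs - (clockMs.getAsLong() - start);
            if (remaining <= 0) {
                return attempts; // deadline reached, give up
            }
            sleeper.accept(Math.min(delay, remaining));
            delay = Math.min(delay * 2, maxMs); // assumed doubling backoff with a cap
        }
    }

    public static void main(String[] args) {
        long[] now = {0}; // fake clock, advanced only by the fake sleeper
        int attempts = retryUntil(
                () -> false, 30_000, 100, 5_000,
                () -> now[0], ms -> now[0] += ms);
        System.out.println("attempts=" + attempts + " elapsedMs=" + now[0]);
    }
}
```

A test against the real policy could follow the same shape: make every attempt fail, then assert that the elapsed time never exceeds the configured 30 seconds.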
https://github.com/spotify/heroic/blob/ac849fac371ee142394f0f48165ce0d43c5dfb74/heroic-component/src/main/java/com/spotify/heroic/cluster/ClusterShard.java#L80