elasticsearch
More discriminating `RESTART` shutdown logic
In a rolling restart we recommend users wait for the cluster health to reach `green` in between node restarts, and some users will also wait for rebalancing to complete each time. This is unnecessarily conservative: it's safe to restart a node while the cluster health is still `yellow` after the previous restart as long as the initializing shards are unrelated to the shards on the node that is to be restarted next.

It's not reasonable to ask users to compute when it's safe to restart a node themselves, but nor is it especially reasonable to wait for `green` health after each node since this may extend the restart time by hours or even days in a large cluster. I believe the shutdown API should be able to solve this by reporting `shardMigrationStatus == COMPLETE` on a `RESTART` shutdown when all the shards on the target node are fully replicated. That's different from today's behaviour in which a `RESTART` shutdown has `shardMigrationStatus == COMPLETE` immediately, forcing users to use other APIs (e.g. cluster health) to wait as necessary.
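
To make the proposed condition concrete, here is a minimal sketch of one possible reading of "fully replicated": a `RESTART` shutdown reports `COMPLETE` only when every shard held on the target node has all of its other copies started. The `ShardCopy` record, the `restartStatus` method and the two-value `Status` enum are hypothetical simplifications for illustration; they are not Elasticsearch classes, and the real check would need to work against the cluster's routing table.

```java
import java.util.List;

public class RestartReadinessSketch {

    // Hypothetical two-value status, standing in for the shutdown API's shard migration status.
    enum Status { IN_PROGRESS, COMPLETE }

    // Hypothetical, simplified view of one shard copy.
    // nodeId is null for an unassigned copy; started is false while unassigned or initializing.
    record ShardCopy(String index, int shardId, String nodeId, boolean started) {}

    // Assumed rule: the node is safe to restart once every shard it holds
    // has no unassigned or initializing copies elsewhere in the cluster.
    static Status restartStatus(String targetNode, List<ShardCopy> allCopies) {
        for (ShardCopy onTarget : allCopies) {
            if (!targetNode.equals(onTarget.nodeId())) {
                continue; // only shards hosted on the target node matter
            }
            for (ShardCopy other : allCopies) {
                boolean sameShard = other.index().equals(onTarget.index())
                    && other.shardId() == onTarget.shardId();
                boolean elsewhere = !targetNode.equals(other.nodeId());
                if (sameShard && elsewhere && !other.started()) {
                    return Status.IN_PROGRESS; // a related copy is still recovering
                }
            }
        }
        return Status.COMPLETE; // any remaining yellow health is unrelated to this node
    }

    public static void main(String[] args) {
        List<ShardCopy> copies = List.of(
            new ShardCopy("logs", 0, "node-1", true),
            new ShardCopy("logs", 0, "node-2", false),   // replica still initializing
            new ShardCopy("metrics", 0, "node-2", true),
            new ShardCopy("metrics", 0, "node-3", true)
        );
        System.out.println("node-3: " + restartStatus("node-3", copies));
        System.out.println("node-1: " + restartStatus("node-1", copies));
    }
}
```

In this example the cluster health would be `yellow` because of the initializing `logs` replica, yet `node-3` holds only a `metrics` shard whose other copy is started, so the sketch treats it as safe to restart, while `node-1` has to wait for the `logs` replica to finish recovering.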
Pinging @elastic/es-distributed (Team:Distributed)
Hey @DaveCTurner, as I understand it, we want `shardMigrationStatus` to change according to different conditions (`STALLED`, `IN_PROGRESS`, `COMPLETE`, `NOT_STARTED`), and we do not want it to be marked complete immediately when a `RESTART` shutdown (`shutdownType`) is triggered. Looking at the code:
if (SingleNodeShutdownMetadata.Type.RESTART.equals(shutdownType)) {
    return new ShutdownShardMigrationStatus(
        SingleNodeShutdownMetadata.Status.COMPLETE,
        0,
        "no shard relocation is necessary for a node restart",
        null
    );
}
Here the status is marked `COMPLETE` whenever `shutdownType` is `RESTART`. If this condition were removed, I think the code would behave as we want, updating the status based on the other conditions (correct me if I am wrong). Am I missing something? Any pointers? TIA.
Hi @prathm3, thanks for your interest here. I'm not sure I understand your question, but are you asking because you're interested in contributing a solution? This is quite a subtle issue and needs some discussion by the team before we decide on a path forwards. I wouldn't recommend working on this area for now.