longhorn [IMPROVEMENT] Improve the replica deletion workflow

Open • shuo-wu opened this issue 2 years ago • 1 comment

Is your improvement request related to a feature? Please describe

As mentioned in this comment, the current replica removal workflow is:

1. Longhorn manager directly stops the running replica process.
2. The engine process detects that the replica is unavailable, sets its mode to ERR, and reports this to Longhorn manager.
3. After receiving the report, Longhorn manager removes the replica record from the engine spec (and then from the status) and asks the engine process to stop monitoring the replica.

Describe the solution you'd like

The removal workflow for a running replica could be the reverse of the volume attachment flow. It would also be better for Longhorn manager to control every step rather than rely on the engine process's report:

1. Longhorn manager asks the engine process to stop tracking/monitoring the running replica process.
2. Longhorn manager waits until tracking has stopped, then deletes the replica process.
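
To make the difference in ordering concrete, here is a minimal toy model of the current and proposed flows described above. It is only a sketch and not Longhorn code: `Engine`, `Replica`, `Mode`, `monitor`, `remove_current`, and `remove_proposed` are all hypothetical names invented for illustration, and the real manager, engine, and replica communicate asynchronously rather than through direct method calls.

```python
from dataclasses import dataclass, field
from enum import Enum


class Mode(Enum):
    RW = "RW"
    ERR = "ERR"


@dataclass
class Replica:
    """Toy stand-in for a replica process (hypothetical)."""
    name: str
    running: bool = True


@dataclass
class Engine:
    """Toy stand-in for an engine process that tracks replicas by name."""
    replicas: dict = field(default_factory=dict)   # name -> Replica
    modes: dict = field(default_factory=dict)      # name -> Mode
    error_log: list = field(default_factory=list)

    def add_replica(self, replica):
        self.replicas[replica.name] = replica
        self.modes[replica.name] = Mode.RW

    def monitor(self):
        """Mark any tracked replica whose process is gone as ERR."""
        for name, replica in self.replicas.items():
            if not replica.running and self.modes[name] is not Mode.ERR:
                self.modes[name] = Mode.ERR
                self.error_log.append(f"replica {name} unreachable")

    def remove_replica(self, name):
        """Stop tracking a replica."""
        self.replicas.pop(name, None)
        self.modes.pop(name, None)


def remove_current(engine, replica):
    """Current flow: stop the process, let the engine flag ERR, then clean up."""
    replica.running = False
    engine.monitor()  # the engine notices the failure and logs an error
    if engine.modes.get(replica.name) is Mode.ERR:
        engine.remove_replica(replica.name)


def remove_proposed(engine, replica):
    """Proposed flow: untrack first, confirm, then stop the process."""
    engine.remove_replica(replica.name)
    if replica.name in engine.replicas:
        raise RuntimeError("engine is still tracking the replica")
    replica.running = False
    engine.monitor()  # nothing is tracked any more, so nothing is flagged
```

In this model both flows end with the replica stopped and untracked, but only the current flow passes through an ERR state and leaves an entry in `error_log`. The proposed ordering avoids that intermediate error state.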

Not sure if this is applicable or makes the whole workflow easier. What do you think? @joshimoo @PhanLe1010 @innobead

Aug 05 '22 10:08 shuo-wu

With your proposal, we can also avoid some misleading error messages that appear when the engine panics because it cannot connect to its replicas.

Aug 05 '22 20:08 PhanLe1010

Hey team! Please add your planning poker estimate with Zenhub @derekbit @ejweber @PhanLe1010

Mar 15 '24 01:03 shuo-wu
