Managed reconciler should update before deletion
What problem are you facing?
There are cases where deletion fails because of a config field but since managed reconciler doesn't update before deletion, the deletion always fails, hence result in a deadlock. One example is deletionProtectionEnabled config of AWS RDS. If it's true, all deletion requests fail until it's made false, however, even if you update the resource deletion is called before update and fails. So, there is no way for a Crossplane customer to trigger a valid deletion if the option was true and they tried to delete the instance.
How could Crossplane help solve your problem?
We can call update before deletion. In fact, I think deletion should be called no matter what in the case it's requested. An Update call could fail but we'd still want the deletion to be called if the user wanted it.
I'll start with moving Update call above Delete. After 0.4, we can investigate how we could make Delete be called even if the prior steps resulted in error.
@negz @hasheddan what do you think?
That seems like a good move. However, my one concern is if a resource gets in a state where it continuously reports that it needs to be updated. In that case, it would be virtually impossible for Crossplane to delete it. I guess a similar situation is occurring currently if we are trying to delete a resource that is not up to date. I wonder if we could build in a mechanism that keeps Delete first, but is knowledgeable if it has tried and failed to delete the resource. If yes, then it tries to Update it once, before then going back to attempting Delete.
This is mostly an edge case, and could potentially be solved in the short-term for RDS adding an AWS API update call in the Delete for RDS. Maybe that would actually be a good pattern across the board? Just make sure the resource is up to date within Delete?
my one concern is if a resource gets in a state where it continuously reports that it needs to be updated. In that case, it would be virtually impossible for Crossplane to delete it.
As long as Update doesn't fail, it should be fine. It'd move on to the next calls in the Reconcile function, not return from the Update function in the case of success.
I wonder if we could build in a mechanism that keeps Delete first, but is knowledgeable if it has tried and failed to delete the resource. If yes, then it tries to Update it once, before then going back to attempting Delete.
That sounds complicated to me. I think we need to aim for a solution where all ExternalClient functions are called before Delete. If requested, the Delete call should be made even though prior ExternalClient functions resulted in an error. Because for any reason, they can fail (like misconfiguration) and the user may want to delete the resource. We shouldn't get in the way of that.
This is mostly an edge case, and could potentially be solved in the short-term for RDS adding an AWS API update call in the Delete for RDS.
It could be solved this way but I'm not sure if it's really an edge case. I can see situations where there is a resource attached to X and it rejects to be deleted because it has an attached resource, but the only way to detach is basically updating the X resource, basically there could be some dependency config that has to be fixed to allow the deletion to go through but Update needs to be made to accommodate the fix.
But now that you mentioned it, I can be convinced to call update in delete call of RDS controller and let this issue be solved in 0.5. Making a change in managed reconciler this close to the release may not be the best course of action.
But now that you mentioned it, I can be convinced to call update in delete call of RDS controller and let this issue be solved in 0.5.
Yeah I kind of had to type everything out to be able to get to my final conclusion haha. Definitely would prefer to not add this into runtime at this point.
I'll start with moving Update call above Delete
Another option to consider here would be to change the current if meta.WasDeleted(managed) condition to if meta.WasDeleted(managed) && observation.UpToDate. At first glance I think this would cause us to:
- Hold off on processing the delete if the external resource needed updating.
- Update the external resource.
- Process the delete on the next reconcile.
It's possible (likely) step 2 would requeue step 3 after a long wait, so it may shake out simpler to just move Update before Delete, though I suspect that direction may also come with its own set of complications.
I'm hesitant about using observation.UpToDate for deletion condition since its default is false and the implementations of the logic providing that result are a bit shaky IMO, meaning you can get stuck just because some comparison method did something unexpected or there is no comparison at all, it always returns false.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Crossplane does not currently have enough maintainers to address every issue and pull request. This issue has been automatically marked as stale because it has had no activity in the last 90 days. It will be closed in 14 days if no further activity occurs. Leaving a comment starting with /fresh will mark this issue as not stale.