[RFC] Error during deployment: should we rollback?
Problem
When running commands across all components, for example deploy, some of them might error out.
Scenarios
- Assuming the `deploy` command, some of the components have been successfully deployed, but components further along in the deployment order crash.
Questions
For scenario 1:
- What should be the behavior?
- Should we try to cancel deployments that are in progress but haven't finished yet (e.g. they might be on the packaging step, assuming a `serverless-framework` component here)?
- Should we roll back the components that have been deployed so far?
- Should we have general support for `rollback` functionality? If so, how should we record the previous state to know how to roll back?
I was bit by this:
I added a small basic behavior in 0783c3a3485fa0d84eca7db60cb0d04ebb44171f: stop deploying the next components in case of error.
This is just a first step of course.
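For illustration, the "stop deploying the next components" behavior could look roughly like the sketch below. It is written in Python purely as an example and is not the code from the commit; `deploy_in_order`, `fake_deploy`, and the component names are all hypothetical.

```python
from typing import Callable, Dict, List


def deploy_in_order(order: List[str], deploy: Callable[[str], None]) -> Dict[str, str]:
    """Deploy components one by one; after the first failure, skip the rest."""
    status: Dict[str, str] = {}
    failed = False
    for name in order:
        if failed:
            # An earlier component failed, so do not start this one.
            status[name] = "skipped"
            continue
        try:
            deploy(name)
            status[name] = "deployed"
        except Exception as error:
            status[name] = f"failed: {error}"
            failed = True
    return status


def fake_deploy(name: str) -> None:
    # Hypothetical stand-in for a real component deployment.
    if name == "api":
        raise RuntimeError("stack update failed")


result = deploy_in_order(["database", "api", "frontend"], fake_deploy)
```

Here `database` ends up deployed, `api` is recorded as failed, and `frontend` is never started.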
> Should we try to cancel deployments that are in progress but didn't finish yet
Not sure we can safely "cancel" all deployments of all kinds reliably? E.g. if sls deploy is interrupted today, the CF deployment finishes, right?
> rollback
Rollback sounds good in theory, but might be ambitious 🤔
Let's gather some feedback on this throughout the beta.
Great call with adding the small improvement 👍
It's great that you wait for other deployments that are already in progress to finish 👍 Otherwise CF would continue the deployment, and a re-deploy would fail because the stack would still be in the UPDATE_IN_PROGRESS state. If you cancel a CF update, you should wait for the cancellation to complete as well.
Skipping consecutive deployments makes sense, as they may depend on the one that failed.
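A rough sketch of those two points together, assuming components are deployed in parallel "waves": every deployment already started in a wave is awaited even if a sibling fails, and later components are skipped when something they depend on did not deploy. This narrows the skipping to actual dependents, which is an assumption and not necessarily how Compose behaves. `deploy_waves`, `depends_on`, and `fake_deploy` are hypothetical names.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List


async def deploy_waves(
    waves: List[List[str]],
    depends_on: Dict[str, List[str]],
    deploy: Callable[[str], Awaitable[None]],
) -> Dict[str, str]:
    status: Dict[str, str] = {}
    for wave in waves:
        runnable: List[str] = []
        for name in wave:
            # Skip anything whose dependency did not deploy successfully.
            if any(status.get(dep) != "deployed" for dep in depends_on.get(name, [])):
                status[name] = "skipped"
            else:
                runnable.append(name)
        # return_exceptions=True makes gather wait for every deployment in
        # the wave to finish, even when one of them raises.
        outcomes = await asyncio.gather(
            *(deploy(name) for name in runnable), return_exceptions=True
        )
        for name, outcome in zip(runnable, outcomes):
            status[name] = "failed" if isinstance(outcome, Exception) else "deployed"
    return status


finished: List[str] = []


async def fake_deploy(name: str) -> None:
    # Hypothetical deployment: "api" fails right away, the others take a moment.
    if name == "api":
        raise RuntimeError("stack update failed")
    await asyncio.sleep(0.01)
    finished.append(name)


status = asyncio.run(
    deploy_waves(
        [["api", "worker"], ["frontend", "reports"]],
        {"frontend": ["api"], "reports": ["worker"]},
        fake_deploy,
    )
)
```

In this run `worker` is still in progress when `api` fails and is allowed to finish; `frontend` is skipped because it depends on `api`, while `reports` deploys normally.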
One issue right now: if the deployment of a single service failed, the command output code is 0 (as in success). This would be very bad in CI.
Full rollback of all stacks would be very nice, although may be complicated. If you do so, there should be a flag to skip rollback - in dev env I don't want to wait 5 minutes for the rollback to complete because I misspelled some parameter name.
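To make the idea concrete, here is a minimal sketch of a rollback with an opt-out flag. It assumes a record of each component's previous version is available and that a component can be restored to it; neither is something Compose is confirmed to support. `deploy_all`, `restore`, `previous`, and `skip_rollback` are hypothetical names.

```python
from typing import Callable, Dict, List, Optional, Tuple


def deploy_all(
    order: List[str],
    deploy: Callable[[str], None],
    restore: Callable[[str, Optional[str]], None],
    previous: Dict[str, Optional[str]],
    skip_rollback: bool = False,
) -> List[str]:
    """Deploy in order; on failure, roll back what was deployed so far.

    Returns the names that were rolled back (empty if everything
    succeeded or if rollback was skipped).
    """
    deployed: List[str] = []
    for name in order:
        try:
            deploy(name)
            deployed.append(name)
        except Exception:
            if skip_rollback:
                # e.g. a dev environment where waiting for rollback is not worth it
                return []
            rolled_back = list(reversed(deployed))
            for done in rolled_back:
                # Restore in reverse deployment order to the recorded state.
                restore(done, previous.get(done))
            return rolled_back
    return []


restored: List[Tuple[str, Optional[str]]] = []


def fake_deploy(name: str) -> None:
    # Hypothetical deployment in which the last component fails.
    if name == "frontend":
        raise RuntimeError("misspelled parameter name")


def fake_restore(name: str, version: Optional[str]) -> None:
    restored.append((name, version))


previous_versions = {"database": "v1", "api": "v3"}
order = ["database", "api", "frontend"]

rolled_back = deploy_all(order, fake_deploy, fake_restore, previous_versions)
skipped = deploy_all(order, fake_deploy, fake_restore, previous_versions, skip_rollback=True)
```

With the default settings `api` and then `database` are restored to their recorded versions; with `skip_rollback=True` nothing is rolled back and the command can return immediately.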
👍
> One issue right now: if the deployment of a single service failed, the command output code is 0 (as in success). This would be very bad in CI.
Good point, @pgrzesik this is something we should probably change. Should I create a separate issue for this?
Thanks for the feedback @m-radzikowski 👍
@mnapoli Yes, definitely. I forgot to bring it up, but I've noticed it as well: we handle such situations gracefully, but we don't treat them as errors from the perspective of the whole command. We should definitely change that.
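As a sketch of the intended behavior (not the actual fix), the overall exit code could be derived from the per-component results, so that a single failed component makes the whole command fail in CI. `exit_code` and the status strings are hypothetical.

```python
from typing import Dict


def exit_code(status: Dict[str, str]) -> int:
    """Return 0 only if every component deployed; otherwise return 1."""
    return 0 if all(state == "deployed" for state in status.values()) else 1


all_good = exit_code({"database": "deployed", "api": "deployed"})
one_failed = exit_code({"database": "deployed", "api": "failed", "frontend": "skipped"})
```

A CLI entry point would then pass this value to something like `sys.exit(...)` so that the failure is reported to the calling process.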
👍 I created https://github.com/serverless/compose/issues/37