[RFC] Error during deployment: should we rollback?

Open pgrzesik opened this issue 3 years ago • 6 comments

Problem

When running commands across all components, for example deploy, some of them might error out.

Scenarios

  1. Assuming the deploy command: some components have been deployed successfully, but components later in the deployment order crash.

Questions

For scenario 1:

  1. What should be the behavior?
  2. Should we try to cancel deployments that are in progress but have not finished yet (e.g. they might be at the packaging step, assuming a serverless-framework component here)?
  3. Should we roll back the components that have been deployed so far?
  4. Should we have general support for rollback functionality? If so, how should we record the previous state so that we know how to roll back?

pgrzesik avatar Feb 11 '22 15:02 pgrzesik

I was bit by this:

image

I added a small, basic behavior in 0783c3a3485fa0d84eca7db60cb0d04ebb44171f: stop deploying the next components in case of an error:

image

This is just a first step of course.
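
A minimal sketch of the stop-on-error behavior described above, in Python for illustration only. It is not the code from that commit: `deploy_in_order`, `fake_deploy`, the component names, and the status strings are all hypothetical, and the components are assumed to be sorted in deployment order already.

```python
from typing import Callable, Dict, List


def deploy_in_order(
    components: List[str],
    deploy: Callable[[str], None],
) -> Dict[str, str]:
    """Deploy components one by one; stop attempting new ones after a failure.

    Returns a status per component: "deployed", "failed", or "skipped".
    """
    statuses: Dict[str, str] = {}
    failed = False
    for name in components:
        if failed:
            # An earlier component failed, so later ones are not attempted.
            statuses[name] = "skipped"
            continue
        try:
            deploy(name)
            statuses[name] = "deployed"
        except Exception:
            statuses[name] = "failed"
            failed = True
    return statuses


def fake_deploy(name: str) -> None:
    # Hypothetical stand-in for a real deployment; "api" is made to fail.
    if name == "api":
        raise RuntimeError(f"deployment of {name} failed")


result = deploy_in_order(["database", "api", "frontend"], fake_deploy)
print(result)
```

Here `frontend` is never attempted because `api` failed before it. Components that were already deployed are left as they are, which is exactly the gap the rollback question is about.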

> Should we try to cancel deployments that are in progress but didn't finish yet

Not sure we can safely "cancel" all deployments of all kinds reliably? E.g. if sls deploy is interrupted today, the CF deployment finishes, right?

> rollback

Rollback sounds good in theory, but might be ambitious 🤔

Let's gather some feedback on this throughout the beta.

mnapoli avatar Feb 24 '22 14:02 mnapoli

Great call on adding that small improvement 👍

pgrzesik avatar Feb 24 '22 14:02 pgrzesik

It's great that you wait for other deployments that are already in progress to finish 👍 Otherwise CF would continue the deployment, and a re-deploy would fail because the stack would still be in the UPDATE_IN_PROGRESS state. If you cancel a CF update, you should wait for the cancellation to complete as well.

Skipping consecutive deployments makes sense, as they may depend on the one that failed.

One issue right now: if the deployment of a single service fails, the command's exit code is still 0 (as in success). This would be very bad in CI.
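
To make the expectation concrete, here is a small Python sketch of how the overall exit code could be derived from per-component results. The function name `exit_code_for` and the status strings are hypothetical, chosen only for illustration, and do not reflect how the tool is implemented.

```python
from typing import Dict


def exit_code_for(statuses: Dict[str, str]) -> int:
    """Return 0 only if every component deployed; otherwise return 1."""
    # Any "failed" or "skipped" component makes the whole command fail.
    return 0 if all(s == "deployed" for s in statuses.values()) else 1


all_ok = exit_code_for({"database": "deployed", "api": "deployed"})
one_failed = exit_code_for({"database": "deployed", "api": "failed"})
print(all_ok, one_failed)
```

A CLI entry point would then pass this value to `sys.exit(...)` so that CI marks the job as failed.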

Full rollback of all stacks would be very nice, although it may be complicated. If you do so, there should be a flag to skip the rollback: in a dev environment I don't want to wait 5 minutes for the rollback to complete because I misspelled some parameter name.
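
A rough Python sketch of what an opt-out rollback could look like. Everything here is an assumption for illustration: the thread does not decide how rollback would work, so `rollback_deployed`, the `skip_rollback` parameter, and the choice to roll back in reverse deployment order are all hypothetical. It also leaves open how the previous state would be recorded.

```python
from typing import Callable, List


def rollback_deployed(
    deployed: List[str],
    rollback: Callable[[str], None],
    skip_rollback: bool = False,
) -> List[str]:
    """Roll back already-deployed components in reverse deployment order.

    Returns the names that were rolled back (empty when rollback is skipped).
    """
    if skip_rollback:
        # Hypothetical opt-out, e.g. for fast iteration in a dev environment.
        return []
    rolled_back: List[str] = []
    for name in reversed(deployed):
        rollback(name)
        rolled_back.append(name)
    return rolled_back


calls: List[str] = []  # records which components the fake rollback touched
order = rollback_deployed(["database", "queue"], calls.append)
skipped = rollback_deployed(["database", "queue"], calls.append, skip_rollback=True)
print(order, skipped)
```

Reverse order is used on the assumption that later components may depend on earlier ones, so they should be reverted first.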

m-radzikowski avatar Mar 17 '22 11:03 m-radzikowski

👍

> One issue right now: if the deployment of a single service fails, the command's exit code is still 0 (as in success). This would be very bad in CI.

Good point, @pgrzesik this is something we should probably change. Should I create a separate issue for this?

mnapoli avatar Mar 17 '22 11:03 mnapoli

Thanks for the feedback @m-radzikowski 👍

@mnapoli Yes, definitely. I forgot to bring it up, but I've noticed it as well: we handle such situations gracefully, but we don't treat them as errors from the perspective of the whole command. We should definitely change that.

pgrzesik avatar Mar 17 '22 11:03 pgrzesik

👍 I created https://github.com/serverless/compose/issues/37

mnapoli avatar Mar 17 '22 13:03 mnapoli