Feature: Retrigger Manifest on error
Description
Errors can occur while the individual services are processing the manifest. The aim of this feature is to give the user a way to manually trigger a reschedule of the manifest. Why is this needed?
Currently, the only way to trigger a reschedule is to change the InputManifest. This works in most cases when the problem lies in the infrastructure (e.g. the user can replace misconfigured nodepools with correct ones). However, there is no mechanism to retrigger the manifest if an error occurs after the infrastructure has been spawned successfully (e.g. an error during the ansibler, kube-eleven, or kuber stage).
It would be beneficial to design a mechanism that allows the user to retrigger the InputManifest in such cases.
Exit criteria
- [ ] The user can manually trigger the workflow for a manifest.
Relates to #150. There is a risk of multiple edge cases (e.g. autoscaling, awareness of the current state, ...) that might make the design of this feature challenging.