
Feature: Retrigger Manifest on error

Open Despire opened this issue 1 year ago • 1 comment

Description

Errors can occur while the individual services are processing a manifest. The aim of this feature is to give the user a way to trigger rescheduling of the manifest. Why would this be needed?

Currently, the only way to trigger a reschedule is to change the InputManifest. This works most of the time for infrastructure issues (e.g. the user can replace wrong nodepools with correct ones), but there is no mechanism to retrigger the manifest if an error occurs after the infrastructure has been spawned correctly (i.e. during the ansibler, kube-eleven, or kuber stage).

It would be beneficial to come up with a mechanism that allows the user to retrigger the InputManifest in such cases.
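To make the gap concrete, here is a minimal sketch of one possible shape for such a mechanism. It is not how Claudie works today and not a proposed implementation: `ManifestState`, `should_reschedule`, `record_run`, and the `retrigger_token` field are all hypothetical names invented for illustration. The idea is that the scheduler compares a checksum of the manifest spec (the existing "changes to the InputManifest" trigger) and, in addition, a user-settable token that can be bumped without touching the spec.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class ManifestState:
    """Hypothetical record of what the scheduler last processed."""
    spec_checksum: str = ""
    retrigger_token: str = ""


def checksum(spec: str) -> str:
    """Return a SHA-256 hex digest of the manifest spec text."""
    return hashlib.sha256(spec.encode("utf-8")).hexdigest()


def should_reschedule(state: ManifestState, spec: str, retrigger_token: str) -> bool:
    """Reschedule if the spec changed OR the user bumped the retrigger token.

    The token check is the assumed addition: it lets a user request a new
    run even though the manifest content is identical.
    """
    spec_changed = checksum(spec) != state.spec_checksum
    token_changed = retrigger_token != state.retrigger_token
    return spec_changed or token_changed


def record_run(state: ManifestState, spec: str, retrigger_token: str) -> None:
    """Remember the inputs of the run that was just scheduled."""
    state.spec_checksum = checksum(spec)
    state.retrigger_token = retrigger_token


if __name__ == "__main__":
    state = ManifestState()
    spec = "nodepools: [a, b]"

    print(should_reschedule(state, spec, ""))         # first run is scheduled
    record_run(state, spec, "")
    print(should_reschedule(state, spec, ""))         # unchanged manifest, no run
    print(should_reschedule(state, spec, "retry-1"))  # user bumps the token
```

A token (rather than a boolean flag) is used so that repeated retriggers are distinguishable and the scheduler does not need to reset anything on the user's object. The sketch deliberately ignores the harder questions, such as what state the retriggered run should start from.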

Exit criteria

  • [ ] The user can manually trigger the workflow for a manifest.

Despire avatar Dec 18 '24 08:12 Despire

Relates to #150. There's a risk of multiple edge cases (e.g. autoscaling, current-state awareness, ...) that might make the design of this feature challenging.

bernardhalas avatar Jul 11 '25 12:07 bernardhalas