cgru
FATAL_ERROR task state
We're tracking resource usage (e.g. memory) through the service and parser, and will fail the task if it goes above the set limit (we're currently using capacity for that). However, the task just errors out and is retried according to the number of retries allowed.
So my suggestion would be to add a new task state (e.g. FATAL_ERROR) in which the task is not retried unless the user unblocks it. There might also be other use cases where you don't want a task to be retried after it fails.
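A minimal Python sketch of the proposed behavior, assuming a simplified task model: ordinary failures go back to READY until the retry budget is used up, while a fatal failure (such as exceeding the memory limit) parks the task in FATAL_ERROR until a user unblocks it. The class, field and method names (`TaskState`, `Task`, `max_retries`, `fail`, `unblock`) are hypothetical and are not CGRU/Afanasy's actual API.

```python
from dataclasses import dataclass
from enum import Enum, auto


class TaskState(Enum):
    READY = auto()        # waiting to be picked up by a render node
    RUNNING = auto()
    DONE = auto()
    ERROR = auto()        # retries exhausted
    FATAL_ERROR = auto()  # proposed state: never retried automatically


@dataclass
class Task:
    max_retries: int                  # hypothetical per-task retry limit
    state: TaskState = TaskState.READY
    errors: int = 0

    def start(self) -> None:
        if self.state is not TaskState.READY:
            raise RuntimeError(f"cannot start task in state {self.state.name}")
        self.state = TaskState.RUNNING

    def fail(self, fatal: bool = False) -> None:
        """Record a failure; a fatal failure skips the retry logic entirely."""
        if fatal:
            # e.g. the parser reported memory usage above the configured limit
            self.state = TaskState.FATAL_ERROR
            return
        self.errors += 1
        # Ordinary errors return to READY until the retry budget is used up.
        if self.errors <= self.max_retries:
            self.state = TaskState.READY
        else:
            self.state = TaskState.ERROR

    def unblock(self) -> None:
        """User action: the only way out of FATAL_ERROR."""
        if self.state is TaskState.FATAL_ERROR:
            self.errors = 0
            self.state = TaskState.READY


# Usage: one ordinary error (retried), then a fatal one (held until unblocked).
task = Task(max_retries=3)
task.start()
task.fail()            # back to READY, errors == 1
task.start()
task.fail(fatal=True)  # FATAL_ERROR, not picked up again automatically
task.unblock()         # READY again, error count reset
```

Keeping the fatal path separate from the retry counter means the existing retry semantics stay unchanged for ordinary errors.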
Any thoughts about this?
Yes, it can be useful. And it is not hard to implement.
Yes, I can see this being useful as well. We have a system similar to yours, @lithorus, but we don't set retries per task, so the task is blocked/stopped immediately.
I've started looking at it and might create an MR with the changes.