camel-k
Propagate errors to KameletBinding status
When there is an error in the operator during the reconciliation of a KameletBinding (e.g. a trait raises some error) the KameletBinding should reflect this error state in its status.
Currently, the KameletBinding gets stuck in the "Creating" phase and never recovers from that state.
To reproduce this behavior add a KameletBinding with an unknown Kamelet as a source or sink. The KameletBinding status should reflect the error state and provide some proper reason indicator in its condition status.
I had a look at this and I don't think it's a bug; it's how the system has been designed so far (though we can discuss whether it makes sense to change it). Right now, if an Integration is in an error state, the related KameletBinding is moved to an error state as well (tested with the nightly build).
The scenario described in the issue is more general: we never set an Integration to an error state when traits fail. Instead, it is reconciled indefinitely, watching for any change applied to the Integration CR that might fix it. That's why we don't cascade the failure to the KameletBinding.
We could consider applying some retry logic (as we do for the Build), or at least reporting the failure in the Integration status, as it is really difficult to understand what's going on when this situation happens (unless we check the operator log).
Adding @astefanutti @lburgazzoli @oscerd to the thread for more opinions.
We should review how the re-queue of reconciliation events is handled by controller-runtime.
For errors, in general, we should configure the work-queue to use a rate-limited queue with an exponential failure rate limiter, cap the maximum number of retries, and eventually report the error in the custom resource status and forget the element from the queue. For functional errors, that we know won't be resolved by retrying, we may want to shortcut that mechanism and directly report the error in the custom resource status.