camel-k
camel-k copied to clipboard
Operator is stuck in a "deploying" phase loop when internal deployment fails indefinitely
What happened?
When deploying an Integration using the Camel K Operator that fails a deployment due a either a missing dependency or perhaps an underlying API incompatibility, the Integration will be stuck in the Deploying
phase.
The issue here being that the Integration "phase" never goes into an Error
state even if the cause of the deployment failure is not self-healable (i.e. API incompatibility as an example).
I have two examples of when/where this occurs:
- Deploying an Integration to Camel K Operator with startupProbe enable on the Health Trait (Ref) when using Knative serving v1 API which DOES NOT support startupProbes (Ref)
│ Type Reason Age From Message │
│ ---- ------ ---- ---- ------- │
│ Warning IntegrationError 1s (x11 over 6s) camel-k-integration-controller Cannot reconcile Integration template-connector-1: err │
│ or executing post actions: error during apply resource: gc-generic/template-connector-1: failed to create typed patch object (gc-generic/templat │
│ e-connector-1; serving.knative.dev/v1, Kind=Service): .spec.template.spec.containers[0].startupProbe: field not declared in schema
- Deploying an Integration to Camel K Operator with a missing dependency (e.g. ConfigMap)
│ Type Reason Age From Message │
│ ---- ------ ---- ---- ------- │
│ Warning IntegrationError 2m46s (x288 over 3d3h) camel-k-integration-controller Cannot reconcile Integration template-connector-1: error du │
│ ring trait customization: master trait configuration failed: ConfigMap "template-connector-1-openapi-000" not found
In both cases The Camel K Operator reports the above issues as a warning
type of event when describing the deployment. Not sure if this should rather be classified as an error
instead, or perhaps at least move to error after X number of retries/backoff attempts?
I believe its crucial that an Integration should "eventually" end up an error
state if the issue cannot be self-healed as per the above example use-cases.
Steps to reproduce
- Setup a K8s cluster with the following
- Camel K Operator v2.1.0
- Knative service v1.12.0
- Deploy a Camel K Integration with the Health Trait enabled (
health.enabled=true
), and the startupProbe enabled (health.startup-probe-enabled=true
)
Relevant log output
│ Last Init Timestamp: 2024-01-18T12:48:11Z │
│ Observed Generation: 1 │
│ Phase: Deploying │
│ Platform: camel-k │
│ Profile: Knative │
│ Runtime Provider: quarkus │
│ Runtime Version: 3.2.0 │
│ Version: 2.1.0 │
│ Events: │
│ Type Reason Age From Message │
│ ---- ------ ---- ---- ------- │
│ Normal IntegrationConditionChanged 9s camel-k-integration-controller Condition "IntegrationPlatformAvailable" is "True" for │
│ Integration template-connector-1: camel-k/camel-k │
│ Normal IntegrationPhaseUpdated 9s camel-k-integration-controller Integration "template-connector-1" in phase "Initializ │
│ ation" │
│ Normal IntegrationPhaseUpdated 6s (x2 over 9s) camel-k-integration-controller Integration "template-connector-1" in phase "Building │
│ Kit" │
│ Normal IntegrationConditionChanged 6s camel-k-integration-controller Condition "IntegrationKitAvailable" is "True" for Inte │
│ gration template-connector-1: kit-cmkhrk8ve68c73e60i30 │
│ Normal IntegrationPhaseUpdated 6s camel-k-integration-controller Integration "template-connector-1" in phase "Deploying │
│ " │
│ Warning IntegrationError 1s (x11 over 6s) camel-k-integration-controller Cannot reconcile Integration template-connector-1: err │
│ or executing post actions: error during apply resource: gc-generic/template-connector-1: failed to create typed patch object (gc-generic/templat │
│ e-connector-1; serving.knative.dev/v1, Kind=Service): .spec.template.spec.containers[0].startupProbe: field not declared in schema
│ conditions: │
│ - firstTruthyTime: "2024-01-18T12:48:11Z" │
│ lastTransitionTime: "2024-01-18T12:48:11Z" │
│ lastUpdateTime: "2024-01-18T12:48:11Z" │
│ message: camel-k/camel-k │
│ reason: IntegrationPlatformAvailable │
│ status: "True" │
│ type: IntegrationPlatformAvailable │
│ - firstTruthyTime: "2024-01-18T12:48:14Z" │
│ lastTransitionTime: "2024-01-18T12:48:14Z" │
│ lastUpdateTime: "2024-01-18T12:48:14Z" │
│ message: kit-cmkhrk8ve68c73e60i30 │
│ reason: IntegrationKitAvailable │
│ status: "True" │
│ type: IntegrationKitAvailable
Camel K version
2.1.0
Thanks for reporting, I can confirm this behavior. We will have a look at this.
As a workaround, can you use the integration with health disabled ?
As a workaround, can you use the integration with health disabled ?
Hey @claudio4j,
Thanks so much for validating the issue.
We can just disable the startupProbe (traits.health.startupProbeEnabled: false
)...
However, that is not really the main issue in my opinion.
It's more so that the Integration is stuck in a "deploying" loop indefinitely, and never ends up in an error state.
Just to confirm, is the above described "deploying" behaviour on such deployment failure scenarios intended?
Take note that If you were deploying Integrations using some kind of automated CI process, and it was stuck in this kind of loop....it would be quite hard to determine/report on this deployment failure since technically the Integration never fails. Currently, it would require a someone to be aware of the deployment, and eyeball the issue.
is the above described "deploying" behaviour on such deployment failure scenarios intended?
Not on purpose. As you noted above, the camel-k-operator tries to set a field startupProbe
on a knative service, which is not supported and the error is not handled, leaving the Integration
object with the wrong status.
In the knative-service trait we have to handle this scenario (health enabled) and not set that field. We have to document this case and have testing in place.
I can take a look at this.
@realMartinez are you actively working on this?
@realMartinez are you actively working on this?
I have been working on another issue, I have put thison hold for now
Okey. Removed the assignment as I may have a look at this.