serving
How to detect a permanent service failure?
Ask your question here:
Hi there.
I'm facing a situation where I need to detect that a Knative Service will never be Ready because its Deployment progress deadline expired. This would happen, for example, when my cluster has no more resources to create new pods.
When I create the Knative Service and check its status, I see the conditions:
conditions:
- lastTransitionTime: "2024-08-21T12:33:36Z"
  message: 'Revision "rest-1-00001" failed with message: 0/2 nodes are available:
    2 Too many pods. preemption: 0/2 nodes are available: 2 No preemption victims
    found for incoming pod..'
  reason: RevisionFailed
  status: "False"
  type: ConfigurationsReady
- lastTransitionTime: "2024-08-21T12:33:36Z"
  message: Configuration "rest-1" does not have any ready Revision.
  reason: RevisionMissing
  status: "False"
  type: Ready
- lastTransitionTime: "2024-08-21T12:33:36Z"
  message: Configuration "rest-1" does not have any ready Revision.
  reason: RevisionMissing
  status: "False"
  type: RoutesReady
The Ready condition is False with reason RevisionMissing.
After the progress deadline expires, I see these conditions:
conditions:
- lastTransitionTime: "2024-08-21T12:29:42Z"
  message: 'Revision "rest-1-00001" failed with message: Initial scale was never
    achieved.'
  reason: RevisionFailed
  status: "False"
  type: ConfigurationsReady
- lastTransitionTime: "2024-08-21T12:27:11Z"
  message: Configuration "rest-1" does not have any ready Revision.
  reason: RevisionMissing
  status: "False"
  type: Ready
- lastTransitionTime: "2024-08-21T12:27:11Z"
  message: Configuration "rest-1" does not have any ready Revision.
  reason: RevisionMissing
  status: "False"
  type: RoutesReady
The messages have changed, but the reasons are still the same.
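To make the problem concrete, here is a minimal sketch in plain Python. The conditions are transcribed by hand from the two status dumps above, and the helper name `signature` is only illustrative, not part of any Knative API. It reduces each snapshot to the fields a client could match on without reading the message, and shows that the two snapshots are indistinguishable that way.

```python
# Conditions copied from the two status dumps above, messages left out on purpose:
# only type, status and reason are compared.
before_deadline = [
    {"type": "ConfigurationsReady", "status": "False", "reason": "RevisionFailed"},
    {"type": "Ready", "status": "False", "reason": "RevisionMissing"},
    {"type": "RoutesReady", "status": "False", "reason": "RevisionMissing"},
]
after_deadline = [
    {"type": "ConfigurationsReady", "status": "False", "reason": "RevisionFailed"},
    {"type": "Ready", "status": "False", "reason": "RevisionMissing"},
    {"type": "RoutesReady", "status": "False", "reason": "RevisionMissing"},
]


def signature(conditions):
    """Map each condition type to its (status, reason) pair, ignoring the message."""
    return {c["type"]: (c["status"], c["reason"]) for c in conditions}


# True when the two snapshots cannot be told apart by status and reason alone.
indistinguishable = signature(before_deadline) == signature(after_deadline)
print(indistinguishable)
```

So with only status and reason to go on, a client watching the Service cannot tell "still trying to schedule" apart from "progress deadline expired".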
What is the recommended way to detect that the Revision has failed permanently, without relying on parsing error messages?
Thanks for any help!