Knative Service fails to become ready if a label is confused with an annotation
When setting `networking.knative.dev/visibility: cluster-local` as an annotation on a Knative Service (it should be a label), the service never becomes ready:
spec:
  template:
    metadata:
      annotations:
        networking.knative.dev/visibility: cluster-local
salaboy> k get ksvc
NAME     URL                                         LATESTCREATED   LATESTREADY    READY     REASON
agenda   http://agenda.default.34.79.9.73.sslip.io   agenda-00002    agenda-00001   Unknown
The revision for the new version of the Knative Service reports a problem observing the new generation:
salaboy> k describe revision agenda-00002
Name:         agenda-00002
Namespace:    default
Labels:       serving.knative.dev/configuration=agenda
              serving.knative.dev/configurationGeneration=2
              serving.knative.dev/configurationUID=9e5fb7d2-dcf5-4cf4-8a7c-b25bef148de1
              serving.knative.dev/routingState=active
              serving.knative.dev/service=agenda
              serving.knative.dev/serviceUID=0ce1df8f-e380-4c0b-a119-43d1398e9bcd
Annotations:  client.knative.dev/updateTimestamp: 2022-06-21T20:54:58Z
              client.knative.dev/user-image: ghcr.io/salaboy/fmtok8s-email-service:v0.0.1-native
              networking.knative.dev/visibility: cluster-local
              serving.knative.dev/creator: [email protected]
              serving.knative.dev/routes: agenda
              serving.knative.dev/routingStateModified: 2022-06-22T11:52:27Z
API Version:  serving.knative.dev/v1
Kind:         Revision
Metadata:
  Creation Timestamp:  2022-06-22T11:52:27Z
  Generation:          1
  Managed Fields:
    API Version:  serving.knative.dev/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .:
          f:client.knative.dev/updateTimestamp:
          f:client.knative.dev/user-image:
          f:networking.knative.dev/visibility:
          f:serving.knative.dev/creator:
          f:serving.knative.dev/routes:
          f:serving.knative.dev/routingStateModified:
        f:labels:
          .:
          f:serving.knative.dev/configuration:
          f:serving.knative.dev/configurationGeneration:
          f:serving.knative.dev/configurationUID:
          f:serving.knative.dev/routingState:
          f:serving.knative.dev/service:
          f:serving.knative.dev/serviceUID:
        f:ownerReferences:
          .:
          k:{"uid":"9e5fb7d2-dcf5-4cf4-8a7c-b25bef148de1"}:
      f:spec:
        .:
        f:containerConcurrency:
        f:containers:
        f:enableServiceLinks:
        f:timeoutSeconds:
    Manager:      Go-http-client
    Operation:    Update
    Time:         2022-06-22T11:52:27Z
    API Version:  serving.knative.dev/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:status:
        .:
        f:conditions:
        f:containerStatuses:
        f:observedGeneration:
    Manager:      Go-http-client
    Operation:    Update
    Subresource:  status
    Time:         2022-06-22T11:52:27Z
  Owner References:
    API Version:           serving.knative.dev/v1
    Block Owner Deletion:  true
    Controller:            true
    Kind:                  Configuration
    Name:                  agenda
    UID:                   9e5fb7d2-dcf5-4cf4-8a7c-b25bef148de1
  Resource Version:        30637176
  UID:                     1dd8a3e1-5332-41f3-aa62-e723bf6ea91b
Spec:
  Container Concurrency:  0
  Containers:
    Image:  ghcr.io/salaboy/fmtok8s-email-service:v0.0.1-native
    Name:   user-container
    Readiness Probe:
      Success Threshold:  1
      Tcp Socket:
        Port:  0
    Resources:
  Enable Service Links:  false
  Timeout Seconds:       300
Status:
  Conditions:
    Last Transition Time:  2022-06-22T11:52:28Z
    Message:               unsuccessfully observed a new generation
    Reason:                NewObservedGenFailure
    Severity:              Info
    Status:                Unknown
    Type:                  Active
    Last Transition Time:  2022-06-22T11:52:27Z
    Reason:                Deploying
    Status:                Unknown
    Type:                  ContainerHealthy
    Last Transition Time:  2022-06-22T11:52:27Z
    Reason:                Deploying
    Status:                Unknown
    Type:                  Ready
    Last Transition Time:  2022-06-22T11:52:27Z
    Reason:                Deploying
    Status:                Unknown
    Type:                  ResourcesAvailable
  Container Statuses:
    Image Digest:  ghcr.io/salaboy/fmtok8s-email-service@sha256:86c5d010599a2d633f5dd7a75bbbff1a874008bcb3501ca3243146e4a7819adc
    Name:          user-container
  Observed Generation:  1
Events:                 <none>
/area networking
/kind bug
What version of Knative?
1.6.x, with Kourier as ingress.
Expected Behavior
Setting an unexpected annotation shouldn't break a Knative Service.
An error should help us troubleshoot the issue.
Actual Behavior
It breaks
Steps to Reproduce the Problem
Add the cluster-local visibility setting as an annotation instead of a label:
spec:
  template:
    metadata:
      annotations:
        networking.knative.dev/visibility: cluster-local
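For reference, a minimal complete manifest that reproduces this; the service name is a placeholder and any container image should do (the Knative hello-world sample is used here):

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: hello
spec:
  template:
    metadata:
      annotations:
        # Wrong placement: visibility belongs as a label on the top-level
        # Service (or Route) metadata, not as a Revision template annotation.
        networking.knative.dev/visibility: cluster-local
    spec:
      containers:
        - image: gcr.io/knative-samples/helloworld-go
```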
The service won't get marked as failed until the progress deadline expires (the default is 10m). If you set a lower progress deadline, for example:
spec:
  template:
    metadata:
      annotations:
        serving.knative.dev/progress-deadline: "10s"
        networking.knative.dev/visibility: cluster-local
then the service's readiness status will be marked "False" in about 40s (10s progress deadline + 30s grace period):
$ kn service list
NAME    URL                                            LATEST   AGE     CONDITIONS   READY   REASON
hello   http://hello.default.10.100.121.103.sslip.io            2m10s   0 OK / 3     False   RevisionMissing : Configuration "hello" does not have any ready Revision.
$ k get ksvc hello -o yaml
<snip>
status:
  conditions:
  - lastTransitionTime: "2022-06-22T18:14:58Z"
    message: 'Revision "hello-00001" failed with message: .'
    reason: RevisionFailed
    status: "False"
    type: ConfigurationsReady
  - lastTransitionTime: "2022-06-22T18:14:58Z"
    message: Configuration "hello" does not have any ready Revision.
    reason: RevisionMissing
    status: "False"
    type: Ready
  - lastTransitionTime: "2022-06-22T18:14:58Z"
    message: Configuration "hello" does not have any ready Revision.
    reason: RevisionMissing
    status: "False"
    type: RoutesReady
  latestCreatedRevisionName: hello-00001
  observedGeneration: 1
  url: http://hello.default.10.100.121.103.sslip.io
The ConfigurationsReady error message could be a little nicer, though... I think I did something similar for failing revisions, so it should be possible to piggyback off that.
@psschwei that is interesting... Do we know why the annotation makes the service never become ready, when it shouldn't affect the behavior?
Also, 10 minutes of waiting to become ready sounds like a lot, but I'm guessing that covers cases where pulling the container image might take a long time. Are there any other cases where we need to wait that long?
> 10 minutes of waiting to become ready sounds like a lot

At one point in time we did have it much shorter, but it was changed to be in sync with the default Kubernetes value.
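For reference, that default can also be tuned cluster-wide. A sketch, assuming the `progress-deadline` key in the `config-deployment` ConfigMap (check your release's ConfigMap for the exact key name):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: config-deployment
  namespace: knative-serving
data:
  # Cluster-wide default progress deadline for revisions; 600s matches
  # the Kubernetes Deployment default mentioned above.
  progress-deadline: "600s"
```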
> Do we know why the annotation makes the service never become ready, when it shouldn't affect the behavior?

It looks like it's an issue with the deployment reconciliation... I see the following in the revision's events:
$ k describe revision
<snip>
Events:
  Type     Reason         Age                From                 Message
  ----     ------         ---                ----                 -------
  Warning  InternalError  10m (x2 over 10m)  revision-controller  failed to update deployment "hello-00001-deployment": Operation cannot be fulfilled on deployments.apps "hello-00001-deployment": the object has been modified; please apply your changes to the latest version and try again
It also looks like the incorrect annotation is failing the SKS (ServerlessService) validation. From the webhook logs:
{"severity":"ERROR","timestamp":"2022-06-23T20:43:41.248281768Z","logger":"webhook","caller":"validation/validation_admit.go:181","message":"Failed the resource specific validation","commit":"3573163","knative.dev/pod":"webhook-6fd4c9cbc4-rnv88","knative.dev/kind":"networking.internal.knative.dev/v1alpha1, Kind=ServerlessService","knative.dev/namespace":"default","knative.dev/name":"hello-00001","knative.dev/operation":"CREATE","knative.dev/resource":"networking.internal.knative.dev/v1alpha1, Resource=serverlessservices","knative.dev/subresource":"","knative.dev/userinfo":"{system:serviceaccount:knative-serving:controller 8a1a5e12-4059-46e3-b0d2-9f9ebb74aab1 [system:serviceaccounts system:serviceaccounts:knative-serving system:authenticated] map[authentication.kubernetes.io/pod-name:[autoscaler-56975b5bbb-4x625] authentication.kubernetes.io/pod-uid:[b98cf5b1-ee75-4217-b396-ef194ea82051]]}","stacktrace":"knative.dev/pkg/webhook/resourcesemantics/validation.validate\n\tknative.dev/[email protected]/webhook/resourcesemantics/validation/validation_admit.go:181\nknative.dev/pkg/webhook/resourcesemantics/validation.(*reconciler).Admit\n\tknative.dev/[email protected]/webhook/resourcesemantics/validation/validation_admit.go:80\nknative.dev/pkg/webhook.admissionHandler.func1\n\tknative.dev/[email protected]/webhook/admission.go:117\nnet/http.HandlerFunc.ServeHTTP\n\tnet/http/server.go:2084\nnet/http.(*ServeMux).ServeHTTP\n\tnet/http/server.go:2462\nknative.dev/pkg/webhook.(*Webhook).ServeHTTP\n\tknative.dev/[email protected]/webhook/webhook.go:262\nknative.dev/pkg/network/handlers.(*Drainer).ServeHTTP\n\tknative.dev/[email protected]/network/handlers/drain.go:110\nnet/http.serverHandler.ServeHTTP\n\tnet/http/server.go:2916\nnet/http.(*conn).serve\n\tnet/http/server.go:1966"}
Note: the visibility label doesn't go in spec.template.metadata.labels, but in the metadata of the top-level Knative Service or Route.
See: https://knative.dev/docs/serving/services/private-services
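For the agenda service above, that would look something like this (a sketch based on the linked doc):

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: agenda
  labels:
    # Correct placement: a label on the top-level Service metadata.
    networking.knative.dev/visibility: cluster-local
spec:
  template:
    spec:
      containers:
        - image: ghcr.io/salaboy/fmtok8s-email-service:v0.0.1-native
```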
A few things to take from this issue:
- https://github.com/knative/serving/issues/13131
- we should reject creates/updates on a Knative Service's Revision template annotations/labels that we know to be problematic
/triage accepted