Sidecar container's status isn't propagated to KService
In what area(s)?
/area API
What version of Knative?
0.11.x
Expected Behavior
When I use multiple containers and the container with no ports is not ready, the KService should also become Ready=False.
Actual Behavior
I use multiple containers, and the container with no ports is not ready:
$ kgp
NAME                                                       READY   STATUS             RESTARTS          AGE
failed-multi-container-00001-deployment-75675ff7fd-lswwl   2/3     CrashLoopBackOff   299 (4m54s ago)   25h
However, the KService's Ready is True:
$ k get ksvc
NAME                     URL                         LATESTCREATED                  LATESTREADY                    READY   REASON
failed-multi-container   https://<internal-domain>   failed-multi-container-00001   failed-multi-container-00001   True
Steps to Reproduce the Problem
Use this template.
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: failed-multi-container
spec:
  template:
    spec:
      containerConcurrency: 0
      containers:
      - image: <always-failed-container>
        name: failed-container
      - image: <success-container>
        name: success-container
        ports:
        - containerPort: 17123
          protocol: TCP
        readinessProbe:
          successThreshold: 1
          tcpSocket:
            port: 0
          timeoutSeconds: 300
  traffic:
  - latestRevision: true
    percent: 100
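To reproduce, apply the template and watch the pod and the KService side by side (assuming the manifest is saved as service.yaml):

$ kubectl apply -f service.yaml
$ kubectl get pods -w
$ kubectl get ksvc failed-multi-container -w

The pod should settle at 2/3 with the failing container in CrashLoopBackOff, while the KService eventually reports Ready=True, as shown above.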
Question
Is there a reason to consider only containers with ports when calculating the ContainerHealthy condition in the Revision?
https://github.com/knative/serving/blob/40ebfb69d55aff2e5fc60da77063599d4a3ea61c/pkg/reconciler/revision/reconcile_resources.go#L102-L117
I think checking the status of all containers other than queue-proxy would solve this problem.
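For illustration, here is a minimal sketch of the change I have in mind, written against plain client-go types. The function allUserContainersHealthy and the surrounding shape are hypothetical, not the actual serving code:

package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

// queueProxyName is the sidecar Knative injects into every pod.
const queueProxyName = "queue-proxy"

// allUserContainersHealthy reports whether every container except
// queue-proxy is Ready, instead of only the container exposing ports.
func allUserContainersHealthy(statuses []corev1.ContainerStatus) (bool, string) {
	for _, cs := range statuses {
		if cs.Name == queueProxyName {
			continue
		}
		if !cs.Ready {
			return false, fmt.Sprintf("container %q is not ready", cs.Name)
		}
	}
	return true, ""
}

func main() {
	statuses := []corev1.ContainerStatus{
		{Name: "queue-proxy", Ready: true},
		{Name: "success-container", Ready: true},
		{Name: "failed-container", Ready: false}, // the port-less sidecar
	}
	ok, reason := allUserContainersHealthy(statuses)
	fmt.Println(ok, reason) // false container "failed-container" is not ready
}

With a check like this, the CrashLoopBackOff sidecar above would flip ContainerHealthy (and so the KService) to False.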
/triage accepted
@dprotaso Can I work on it if you don't mind?
@dprotaso I've found something strange.
When I reproduce the situation above, the KService is Ready=Unknown for a while after it is created (I'm not sure exactly how long, but maybe a few minutes). After that, the KService changes to Ready=True.
When we check the status of the child resources, we see the following differences.
When KService Ready=Unknown
- Revision's ResourceAvailable and ContainerHealthy conditions are Unknown (Reason: Deploying)
- PA's ScaleTargetInitialized is Unknown

When KService Ready=True
- Revision's ResourceAvailable and ContainerHealthy conditions are True
- PA's ScaleTargetInitialized is True
Question
In this code:
https://github.com/knative/serving/blob/7c92928e39eac33564b0bbc65ce53fd0e9cdcbb8/pkg/apis/serving/v1/revision_lifecycle.go#L190-L198
it seems that the PA's ScaleTargetInitialized needs to be True before the Revision's ResourceAvailable and ContainerHealthy conditions can become True, which in turn is what makes the KService Ready=True. (Please let me know if I'm wrong.)
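To make my reading concrete, here is a toy model of that dependency chain. This is a simplification for discussion, not the actual condition machinery in knative.dev/pkg:

package main

import "fmt"

type status string

const (
	unknown status = "Unknown"
	truthy  status = "True"
)

// revisionConditions mirrors my reading of revision_lifecycle.go: until the
// PA reports ScaleTargetInitialized=True, ResourceAvailable and
// ContainerHealthy stay Unknown, so neither the Revision nor the KService
// above it can reach Ready=True.
func revisionConditions(scaleTargetInitialized status) (resourceAvailable, containerHealthy, ready status) {
	if scaleTargetInitialized != truthy {
		return unknown, unknown, unknown
	}
	return truthy, truthy, truthy
}

func main() {
	fmt.Println(revisionConditions(unknown)) // Unknown Unknown Unknown
	fmt.Println(revisionConditions(truthy))  // True True True
}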
The KService I'm having issues with (which I forgot to mention above) uses the HPA autoscaler class. Checking the HPA controller code, it appears that the PA's ScaleTargetInitialized is only set to True once the SKS is Ready=True:
https://github.com/knative/serving/blob/7c92928e39eac33564b0bbc65ce53fd0e9cdcbb8/pkg/reconciler/autoscaling/hpa/hpa.go#L93-L105
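Paraphrased as a sketch (simplified; sksReady and markScaleTargetInitialized are stand-ins for the real SKS status check and PA status setter, not the actual code):

package main

import "fmt"

// reconcileSKS models the gate in hpa.go: the reconciler only marks the
// PA's scale target as initialized once the SKS reports Ready=True.
func reconcileSKS(sksReady bool, markScaleTargetInitialized func()) {
	if sksReady {
		markScaleTargetInitialized()
	}
}

func main() {
	mark := func() { fmt.Println("PA: ScaleTargetInitialized=True") }
	reconcileSKS(false, mark) // SKS not ready: condition never set
	reconcileSKS(true, mark)  // prints once the SKS becomes Ready
}

So with an SKS that never becomes Ready, the PA's ScaleTargetInitialized can never flip to True.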
However, the SKS never becomes Ready=True, regardless of the KService's Ready state:
conditions:
- lastTransitionTime: "2023-10-27T05:54:52Z"
  message: Revision is backed by Activator
  reason: ActivatorEndpointsPopulated
  severity: Info
  status: "True"
  type: ActivatorEndpointsPopulated
- lastTransitionTime: "2023-10-27T05:54:52Z"
  message: K8s Service is not ready
  reason: NoHealthyBackends
  status: Unknown
  type: EndpointsPopulated
- lastTransitionTime: "2023-10-27T05:54:52Z"
  message: K8s Service is not ready
  reason: NoHealthyBackends
  status: Unknown
  type: Ready
observedGeneration: 1
privateServiceName: failed-multi-container-00001-private
serviceName: failed-multi-container-00001
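My reading of the status above: EndpointsPopulated (and therefore Ready) stays Unknown because the private K8s Service has no ready backends. Since kubelet only marks a pod Ready when all of its containers pass their readiness checks, a 2/3 pod's address shows up only under notReadyAddresses. A toy illustration of that endpoint check (hypothetical helper, not the SKS reconciler itself):

package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

// readyAddressCount counts ready addresses across endpoint subsets; the
// SKS needs at least one before it can report EndpointsPopulated=True.
func readyAddressCount(ep corev1.Endpoints) int {
	n := 0
	for _, subset := range ep.Subsets {
		n += len(subset.Addresses)
	}
	return n
}

func main() {
	// A 2/3 pod is never Ready, so its IP lands in NotReadyAddresses only.
	ep := corev1.Endpoints{
		Subsets: []corev1.EndpointSubset{{
			NotReadyAddresses: []corev1.EndpointAddress{{IP: "10.0.0.12"}},
		}},
	}
	fmt.Println(readyAddressCount(ep)) // 0 -> NoHealthyBackends
}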
Is there an expected reason for this behavior?
@dprotaso Sorry to bother you. Could you take a look at the situation above?
Hey @seongpyoHong - do you mind retesting your scenario with the latest Serving release that came out two weeks ago (https://github.com/knative/serving/releases/tag/knative-v1.13.1)?
The PR in that release, https://github.com/knative/serving/pull/14795, includes a fix to how we propagate status conditions.
In my testing with your example, I see the Knative Service is in a 'failed' state and it remains that way.
I confirmed this is still a problem.