Knative service status stuck at "Unknown" and "Uninitialized"
What version of Knative?
1.17.0
net-istio: 1.17.0
istio: 1.24.2
Expected Behavior
The knative service status should be “Ready” instead of hanging in “Unknown” state.
Actual Behavior
When I deploy a Knative service in an EKS cluster, it remains in "Unknown" status until the Istio ingress controllers are restarted, even though the application can be reached. It then switches to "Ready", the next application deployed is again stuck in "Unknown", and so on.
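For reference, "restarted" here means something like the rollout below (the deployment names are assumed to match the Helm release names from the setup further down):
kubectl -n istio-system rollout restart deployment \
  istio-ingressgateway istio-internal-ingressgateway istio-external-ingressgateway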
kubectl get ksvc gsvc-serving-db07943b -n eb7d5189
NAME URL LATESTCREATED LATESTREADY READY REASON
gsvc-serving-db07943b http://test-eb7d5189.serverless-dev.xyz.crashcourse.com gsvc-serving-db07943b-00001 gsvc-serving-db07943b-00001 Unknown Uninitialized
The application is exposed through a load balancer and is reachable:
curl https://test-eb7d5189.serverless-dev.xyz.crashcourse.com
Hello World!
Here are the details of the knative service status:
kubectl get ksvc -n eb7d5189 gsvc-serving-db07943b -ojsonpath='{.status}' | jq
{
"address": {
"url": "http://gsvc-serving-db07943b.eb7d5189.svc.cluster.local"
},
"conditions": [
{
"lastTransitionTime": "2025-02-06T13:43:16Z",
"status": "True",
"type": "ConfigurationsReady"
},
{
"lastTransitionTime": "2025-02-06T13:43:16Z",
"message": "Waiting for load balancer to be ready",
"reason": "Uninitialized",
"status": "Unknown",
"type": "Ready"
},
{
"lastTransitionTime": "2025-02-06T13:43:16Z",
"message": "Waiting for load balancer to be ready",
"reason": "Uninitialized",
"status": "Unknown",
"type": "RoutesReady"
}
],
"latestCreatedRevisionName": "gsvc-serving-db07943b-00001",
"latestReadyRevisionName": "gsvc-serving-db07943b-00001",
"observedGeneration": 1,
"traffic": [
{
"latestRevision": true,
"percent": 100,
"revisionName": "gsvc-serving-db07943b-00001"
}
],
"url": "http://test-eb7d5189.serverless-dev.xyz.crashcourse.com"
}
For load balancing I use an AWS NLB, and everything seems fine on that side: all the targets (15021, 443, 80) are healthy.
I also noticed a couple of log entries that are probably related to the issue.
{
"severity": "ERROR",
"timestamp": "2025-02-06T13:25:22.20239663Z",
"logger": "net-istio-controller.istio-ingress-controller",
"caller": "status/status.go:421",
"message": "Probing of https://test-eb7d5189.serverless-dev.xyz.crashcourse.com:443 failed, IP: 100.64.174.122:443, ready: false, error: error roundtripping https://test-eb7d5189.serverless-dev.xyz.crashcourse.com:443/healthz: read tcp 100.64.162.193:36768->100.64.174.122:443: read: connection reset by peer (depth: 0)",
"commit": "4dff29e-dirty",
"knative.dev/controller": "knative.dev.net-istio.pkg.reconciler.ingress.Reconciler",
"knative.dev/kind": "networking.internal.knative.dev.Ingress",
"knative.dev/traceid": "de3b3adb-b689-4a4c-b4d4-39c22a0911ba",
"knative.dev/key": "eb7d5189/gsvc-serving-db07943b",
"stacktrace": "knative.dev/networking/pkg/status.(*Prober).processWorkItem\n\tknative.dev/[email protected]/pkg/status/status.go:421\nknative.dev/networking/pkg/status.(*Prober).Start.func1\n\tknative.dev/[email protected]/pkg/status/status.go:306"
}
and
{
"severity": "WARNING",
"timestamp": "2025-02-06T13:25:22.997317236Z",
"logger": "controller",
"caller": "route/reconcile_resources.go:227",
"message": "Failed to update k8s service",
"commit": "6265a8e",
"knative.dev/pod": "controller-85c449cd99-97hgw",
"knative.dev/controller": "knative.dev.serving.pkg.reconciler.route.Reconciler",
"knative.dev/kind": "serving.knative.dev.Route",
"knative.dev/traceid": "e6e9c300-658b-4feb-8b59-e2bf6fa95bd1",
"knative.dev/key": "eb7d5189/gsvc-serving-db07943b",
"error": "failed to fetch loadbalancer domain/IP from ingress status"
}
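For what it's worth, the failing probe can be approximated by hand against the gateway pod IP from the error above. The headers below are what net-istio sends as far as I know, so treat this as an illustration rather than the exact probe:
curl -kv https://100.64.174.122:443/healthz \
  -H 'Host: test-eb7d5189.serverless-dev.xyz.crashcourse.com' \
  -H 'K-Network-Probe: probe' \
  -H 'User-Agent: Knative-Ingress-Probe'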
I'd also like to point out that I looked at the route and the ingress; their statuses are as follows.
route
{
"address": {
"url": "http://gsvc-serving-db07943b.eb7d5189.svc.cluster.local"
},
"conditions": [
{
"lastTransitionTime": "2025-02-06T13:43:16Z",
"status": "True",
"type": "AllTrafficAssigned"
},
{
"lastTransitionTime": "2025-02-06T15:03:04Z",
"message": "Certificate route-55cecb6b-26fc-4fda-8e42-1d7d9c8fdd2b is not ready downgrade HTTP.",
"reason": "HTTPDowngrade",
"status": "True",
"type": "CertificateProvisioned"
},
{
"lastTransitionTime": "2025-02-06T13:43:16Z",
"message": "Waiting for load balancer to be ready",
"reason": "Uninitialized",
"status": "Unknown",
"type": "IngressReady"
},
{
"lastTransitionTime": "2025-02-06T13:43:16Z",
"message": "Waiting for load balancer to be ready",
"reason": "Uninitialized",
"status": "Unknown",
"type": "Ready"
}
],
"observedGeneration": 1,
"traffic": [
{
"latestRevision": true,
"percent": 100,
"revisionName": "gsvc-serving-db07943b-00001"
}
],
"url": "http://test-eb7d5189.serverless-dev.xyz.crashcourse.com"
}
ingress
{
"conditions": [
{
"lastTransitionTime": "2025-02-06T13:43:16Z",
"message": "Waiting for load balancer to be ready",
"reason": "Uninitialized",
"status": "Unknown",
"type": "LoadBalancerReady"
},
{
"lastTransitionTime": "2025-02-06T13:43:16Z",
"status": "True",
"type": "NetworkConfigured"
},
{
"lastTransitionTime": "2025-02-06T13:43:16Z",
"message": "Waiting for load balancer to be ready",
"reason": "Uninitialized",
"status": "Unknown",
"type": "Ready"
}
],
"observedGeneration": 1
}
Strange findings
Logs from istiod
2025-02-03T10:57:39.622954Z info ads Push debounce stable[112] 1 for config Secret/eb7d5189/gsvc-pull-2f42fda6-serving-f61c3f70: 100.240948ms since last change, 100.240879ms since last push, full=false
2025-02-03T10:57:39.732479Z info model Incremental push, service gsvc-serving-db07943b-00001-private.eb7d5189.svc.cluster.local at shard Kubernetes/Kubernetes has no endpoints
2025-02-03T10:57:39.756497Z info model Full push, new service eb7d5189/gsvc-serving-db07943b-00001.eb7d5189.svc.cluster.local
2025-02-03T10:57:39.924255Z info ads Push debounce stable[113] 5 for config ServiceEntry/eb7d5189/gsvc-serving-db07943b-00001-private.eb7d5189.svc.cluster.local and 1 more configs: 100.548746ms since last change, 200.632268ms since last push, full=true
"outbound|443||gsvc-serving-db07943b-00001-private.eb7d5189.svc.cluster.local": {},
"outbound|8012||gsvc-serving-db07943b-00001-private.eb7d5189.svc.cluster.local": {},
"outbound|8022||gsvc-serving-db07943b-00001-private.eb7d5189.svc.cluster.local": {},
"outbound|80||gsvc-serving-db07943b-00001-private.eb7d5189.svc.cluster.local": {},
"outbound|9090||gsvc-serving-db07943b-00001-private.eb7d5189.svc.cluster.local": {},
"outbound|9091||gsvc-serving-db07943b-00001-private.eb7d5189.svc.cluster.local": {}
2025-02-03T10:58:02.542487Z info model Full push, new service eb7d5189/gsvc-serving-db07943b-00001-private.eb7d5189.svc.cluster.local
2025-02-03T10:58:02.722779Z info ads Push debounce stable[114] 3 for config ServiceEntry/eb7d5189/gsvc-serving-db07943b-00001-private.eb7d5189.svc.cluster.local and 1 more configs: 100.661715ms since last change, 180.224615ms since last push, full=true
2025-02-03T10:58:02.961683Z info ads Push debounce stable[115] 3 for config ServiceEntry/eb7d5189/gsvc-serving-db07943b.eb7d5189.svc.cluster.local and 2 more configs: 100.299834ms since last change, 160.890523ms since last push, full=true
Services before ingress-controller restart
kubectl get svc -n eb7d5189
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
gsvc-serving-db07943b ExternalName <none> test-eb7d5189.serverless-dev.xyz.crashcourse.com 80/TCP 3h37m
gsvc-serving-db07943b-00001 ClusterIP 172.20.247.29 <none> 80/TCP,443/TCP 3h37m
gsvc-serving-db07943b-00001-private ClusterIP 172.20.132.196 <none> 80/TCP,443/TCP,9090/TCP,9091/TCP,8022/TCP,8012/TCP 3h37m
Services after ingress-controller restart
kubectl get svc -n eb7d5189
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
gsvc-serving-db07943b ExternalName <none> knative-local-gateway.istio-system.svc.cluster.local 80/TCP 3h37m
gsvc-serving-db07943b-00001 ClusterIP 172.20.247.29 <none> 80/TCP,443/TCP 3h37m
gsvc-serving-db07943b-00001-private ClusterIP 172.20.132.196 <none> 80/TCP,443/TCP,9090/TCP,9091/TCP,8022/TCP,8012/TCP 3h37m
The EXTERNAL-IP of the ExternalName service changed from test-eb7d5189.serverless-dev.xyz.crashcourse.com to knative-local-gateway.istio-system.svc.cluster.local.
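In other words, only after the restart does the placeholder Service look the way I would expect. A sketch with just the relevant fields:
apiVersion: v1
kind: Service
metadata:
  name: gsvc-serving-db07943b
  namespace: eb7d5189
spec:
  type: ExternalName
  externalName: knative-local-gateway.istio-system.svc.cluster.local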
Some tests
From a "Ready" service
curl -o /dev/null -s -w "%{http_code}\n" http://gsvc-serving-5fb72450-00001.eb7d5189.svc.cluster.local
200
curl -o /dev/null -s -w "%{http_code}\n" http://gsvc-serving-5fb72450.eb7d5189.svc.cluster.local
200
From the "Unknown" service
curl -o /dev/null -s -w "%{http_code}\n" http://gsvc-serving-db07943b-00001.eb7d5189.svc.cluster.local
200
curl -o /dev/null -s -w "%{http_code}\n" http://gsvc-serving-db07943b.eb7d5189.svc.cluster.local
404
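While the service is stuck, the failing probes can be watched directly in the net-istio controller (the deployment name is assumed to be the default net-istio-controller):
kubectl -n knative-serving logs deploy/net-istio-controller -f | grep 'Probing of'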
Steps to Reproduce the Problem
- Set up the Istio ingress controllers (this creates AWS NLBs, so you need the AWS Load Balancer Controller too)
- Install Knative
- Deploy a service
Ingress controllers
My setup has some particularities: I use 3 different ingress controllers, configured with the Helm values below.
helmCharts:
- includeCRDs: true
name: gateway
namespace: istio-system
releaseName: istio-ingressgateway
repo: https://istio-release.storage.googleapis.com/charts
version: 1.24.2
valuesInline:
service:
type: ClusterIP
- includeCRDs: true
name: gateway
namespace: istio-system
releaseName: istio-internal-ingressgateway
repo: https://istio-release.storage.googleapis.com/charts
version: 1.24.2
valuesInline:
service:
annotations:
service.beta.kubernetes.io/aws-load-balancer-type: nlb-ip
service.beta.kubernetes.io/aws-load-balancer-scheme: internal
service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip
service.beta.kubernetes.io/aws-load-balancer-proxy-protocol: "*"
podAnnotations:
proxy.istio.io/config: |
{
"gatewayTopology": {
"proxyProtocol": {}
}
}
labels:
app: istio-internal-ingressgateway
istio: ingressgateway
- includeCRDs: true
name: gateway
namespace: istio-system
releaseName: istio-external-ingressgateway
repo: https://istio-release.storage.googleapis.com/charts
version: 1.24.2
valuesInline:
service:
annotations:
service.beta.kubernetes.io/aws-load-balancer-type: nlb-ip
service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing
service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: instance
service.beta.kubernetes.io/aws-load-balancer-proxy-protocol: "*"
podAnnotations:
proxy.istio.io/config: |
{
"gatewayTopology": {
"proxyProtocol": {}
}
}
labels:
app: istio-external-ingressgateway
istio: ingressgateway
In case you're wondering, I use the proxy config to preserve client source IPs, which I then match in AuthorizationPolicy resources (see the sketch below).
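Hypothetical example of that kind of policy; the policy name and CIDR are made up, only the remoteIpBlocks matching matters:
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: allow-from-office   # hypothetical name
  namespace: istio-system
spec:
  selector:
    matchLabels:
      app: istio-external-ingressgateway
  action: ALLOW
  rules:
  - from:
    - source:
        # only meaningful when the client IP is preserved via proxy protocol
        remoteIpBlocks:
        - 203.0.113.0/24    # made-up CIDR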
Knative
I deploy Knative using the knative-operator as follows:
apiVersion: operator.knative.dev/v1beta1
kind: KnativeServing
metadata:
name: knative-serving
namespace: knative-serving
annotations:
gladiator.app/name: knative-operator
spec:
version: "1.17"
high-availability:
replicas: 2
config:
istio:
gateway.knative-serving.knative-external-ingress-gateway: istio-external-ingressgateway.istio-system.svc.cluster.local
gateway.knative-serving.knative-internal-ingress-gateway: istio-internal-ingressgateway.istio-system.svc.cluster.local
defaults:
max-revision-timeout-seconds: "3600"
revision-timeout-seconds: "1800"
revision-response-start-timeout-seconds: '600'
autoscaler:
allow-zero-initial-scale: "true"
enable-scale-to-zero: "true"
initial-scale: "0"
deployment:
progress-deadline: "3600s"
features:
autodetect-http2: enabled
kubernetes.containerspec-addcapabilities: disabled
kubernetes.podspec-affinity: enabled
kubernetes.podspec-dnsconfig: disabled
kubernetes.podspec-dnspolicy: disabled
kubernetes.podspec-dryrun: allowed
kubernetes.podspec-fieldref: disabled
kubernetes.podspec-hostaliases: disabled
kubernetes.podspec-init-containers: enabled
kubernetes.podspec-nodeselector: enabled
kubernetes.podspec-persistent-volume-claim: enabled
kubernetes.podspec-persistent-volume-write: enabled
kubernetes.podspec-priorityclassname: disabled
kubernetes.podspec-runtimeclassname: enabled
kubernetes.podspec-schedulername: disabled
kubernetes.podspec-securitycontext: enabled
kubernetes.podspec-tolerations: enabled
kubernetes.podspec-topologyspreadconstraints: disabled
kubernetes.podspec-volumes-emptydir: enabled
kubernetes.podspec-volumes-hostpath: enabled
multi-container: enabled
queueproxy.mount-podinfo: disabled
tag-header-based-routing: disabled
multi-container-probing: enabled
gc:
min-non-active-revisions: "0"
max-non-active-revisions: "0"
retain-since-create-time: "disabled"
retain-since-last-active-time: "disabled"
leader-election:
lease-duration: 60s
logging:
loglevel.activator: info
loglevel.autoscaler: info
loglevel.controller: info
loglevel.hpaautoscaler: info
loglevel.net-certmanager-controller: info
loglevel.net-contour-controller: info
loglevel.net-istio-controller: info
loglevel.queueproxy: info
loglevel.webhook: info
network:
auto-tls: Disabled
domain-template: '{{index .Annotations "service.serverless.xyz.crashcourse.com/hostname"}}.{{.Domain}}'
ingress-class: "istio.ingress.networking.knative.dev"
observability:
logging.enable-probe-request-log: "true"
logging.enable-request-log: "true"
logging.request-log-template: >-
{"httpRequest": {"requestMethod": "{{.Request.Method}}", "requestUrl": "{{js
.Request.RequestURI}}", "requestSize": "{{.Request.ContentLength}}",
"status": {{.Response.Code}}, "responseSize": "{{.Response.Size}}",
"userAgent": "{{js .Request.UserAgent}}", "remoteIp": "{{js
.Request.RemoteAddr}}", "serverIp": "{{.Revision.PodIP}}", "referer": "{{js
.Request.Referer}}", "latency": {{.Response.Latency}}, "latencyNew":
{{.Response.Latency}}, "protocol": "{{.Request.Proto}}"}, "traceId":
"{{index .Request.Header "X-B3-Traceid"}}"}
metrics.backend-destination: prometheus
metrics.request-metrics-backend-destination: prometheus
tracing:
backend: none
The domain-template is tied to an operator we run internally, so you can ignore it.
Knative Service
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
name: gsvc-serving-db07943b
namespace: eb7d5189
annotations:
gladiator/url-prefix: test-
service.serverless.xyz.crashcourse.com/endpoint: test-eb7d5189
service.serverless.xyz.crashcourse.com/hostname: test-eb7d5189
labels:
app.kubernetes.io/component: serving
app.kubernetes.io/part-of: service
serverless.xyz.crashcourse.com/service-name: test
service.serverless.xyz.crashcourse.com/endpoint: test-eb7d5189
spec:
template:
metadata:
annotations:
autoscaling.knative.dev/class: kpa.autoscaling.knative.dev
autoscaling.knative.dev/max-scale: "1"
autoscaling.knative.dev/metric: concurrency
autoscaling.knative.dev/min-scale: "1"
spec:
containers:
- image: ghcr.io/knative/helloworld-go:latest
name: hello
ports:
- containerPort: 8080
env:
- name: TARGET
value: "World"
The same goes for the annotations/labels: they are linked to that operator.
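For completeness, this is how I apply it and observe the symptom (the manifest filename is just a placeholder):
kubectl apply -f ksvc.yaml
kubectl wait ksvc/gsvc-serving-db07943b -n eb7d5189 --for=condition=Ready --timeout=300s
# times out while the service is stuck in Unknown/Uninitialized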
Hi @hyde404,
The EXTERNAL-IP of the ExternalName service changed from test-eb7d5189.serverless-dev.xyz.crashcourse.com to knative-local-gateway.istio-system.svc.cluster.local.
The ExternalName should point to Istio; it is used for different purposes, e.g. traffic splitting.
I haven't checked all the details yet, but is that ExternalName somehow being exposed on the AWS LB directly (due to your ingresses), or is Istio not picking up changes? Could you try a more standard approach, as in the Knative docs, as a smoke test?
Probing of https://test-eb7d5189.serverless-dev.xyz.crashcourse.com:443 failed, IP: 100.64.174.122:443, ready: false, error: error roundtripping https://test-eb7d5189.serverless-dev.xyz.crashcourse.com:443/healthz: read tcp 100.64.162.193:36768->100.64.174.122:443: read: connection reset by peer (depth: 0)
This is the reason you see the load balancer as not ready. I am wondering why HTTPS is used; which Istio mode do you use, mTLS?
Note: unfortunately I don't have an AWS cluster to test on, so I am guessing.
Hi @skonto,
Thanks for your reply!
The Istio gateway TLS mode I use is "simple", and the one set in the ingress controller is apparently mTLS (controlPlaneAuthPolicy: MUTUAL_TLS).
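To be concrete, the Gateway I point Knative at looks roughly like the sketch below; the TLS secret name is a placeholder:
apiVersion: networking.istio.io/v1
kind: Gateway
metadata:
  name: knative-external-ingress-gateway
  namespace: knative-serving
spec:
  selector:
    app: istio-external-ingressgateway
  servers:
  - port:
      number: 443
      name: https
      protocol: HTTPS
    tls:
      mode: SIMPLE                  # "simple" TLS termination at the gateway
      credentialName: wildcard-cert # placeholder secret name
    hosts:
    - '*.serverless-dev.xyz.crashcourse.com'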
I tried a more standard approach by installing Knative/Istio/net-istio following this piece of documentation and got the exact same result.
By removing
proxy.istio.io/config: |
{
"gatewayTopology": {
"proxyProtocol": {}
}
}
from podAnnotations, it works, but we lose the ability to keep the client source IP, which is not desirable.
A couple of combinations based on this proxy config have been tested, but with no luck.
It seems like a probe, maybe from net-istio, has issues. Moreover, we came across this feature request, which looks a lot like what we're facing right now.
Having the exact same issue when testing service.beta.kubernetes.io/aws-load-balancer-proxy-protocol: "*" and "proxyProtocol": {}.
I finally managed to make it work by using these annotations
apiVersion: v1
kind: Service
metadata:
annotations:
...
service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: instance
service.beta.kubernetes.io/aws-load-balancer-proxy-protocol: '*'
service.beta.kubernetes.io/aws-load-balancer-scheme: internal
service.beta.kubernetes.io/aws-load-balancer-type: nlb-ip
...
(you can also add service.beta.kubernetes.io/aws-load-balancer-target-group-attributes: preserve_client_ip.enabled=true, even though it works without it)
and this EnvoyFilter as well
apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
metadata:
annotations:
gladiator.app/name: istio-gateway
name: internal-proxy-protocol
namespace: istio-system
spec:
configPatches:
- applyTo: LISTENER
patch:
operation: MERGE
value:
listener_filters:
- name: envoy.filters.listener.proxy_protocol
typed_config:
'@type': type.googleapis.com/envoy.extensions.filters.listener.proxy_protocol.v3.ProxyProtocol
allow_requests_without_proxy_protocol: true
- name: envoy.filters.listener.tls_inspector
typed_config:
'@type': type.googleapis.com/envoy.extensions.filters.listener.tls_inspector.v3.TlsInspector
workloadSelector:
labels:
app: istio-internal-ingressgateway
So I totally got rid of "proxyProtocol": {}
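With that in place the service flips to Ready on its own, which can be watched with:
kubectl get ksvc gsvc-serving-db07943b -n eb7d5189 -w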
Observed the same behavior with a Knative service in a Kubeflow 1.9.1 deployment (on-prem). Restarting the istio-system/istio-ingressgateway deployment makes the Knative service accessible, and the Knative service's ExternalName also changes to "knative-local-gateway.istio-system.svc.cluster.local".
Also observed in the istio-ingressgateway log: '"GET /healthz HTTP/1.1" 404 NR route_not_found - "-" 0 0 0 - "192.168.0.238" "Knative-Ingress-Probe"' until the pod is restarted.
Easily reproducible on my cluster by running the KServe sklearn-iris InferenceService example.
Update:
I think I figured out the problem. Since I set up a domain entry in the config-domain ConfigMap, I need to add 'auto-tls: Disabled' in config-network if the Knative service does not have HTTPS access set up. This can also be done per service by adding the annotation 'networking.knative.dev/disableAutoTLS: "true"' to the Knative Service manifest, as in the sketch below.
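A minimal sketch of the per-service variant (the service name and image are placeholders):
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: my-service                                  # placeholder name
  annotations:
    networking.knative.dev/disableAutoTLS: "true"
spec:
  template:
    spec:
      containers:
      - image: ghcr.io/knative/helloworld-go:latest # any image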
Hi @hyde404, thanks for your update. I tested your approach and it is working fine.
This issue is stale because it has been open for 90 days with no activity. It will automatically close after 30 more days of inactivity. Reopen the issue with /reopen. Mark the issue as fresh by adding the comment /remove-lifecycle stale.