Getting "Cannot load the graph: cluster [kubernetes] is not found or is not accessible for Kiali" with certain Prometheus configurations.
Describe the bug
I am trying to get Kiali to work on my AKS cluster together with the kube-prometheus-stack Helm chart.
Features revolving around Istio currently work, but Kiali seems to have a hard time picking up Prometheus info.
When attempting to open the Kiali traffic graph, I get "Cannot load the graph: cluster [kubernetes] is not found or is not accessible for Kiali".
The Kiali logs show this unhelpful error message:
2024-04-24T23:07:09Z DBG Found controlplane [istiod-asm-1-20/aks-istio-system] on cluster [aks-dev-eastus-152].
2024-04-24T23:07:11Z ERR K8s Cache [kubernetes] is not found or is not accessible for Kiali: goroutine 801465 [running]:
runtime/debug.Stack()
/opt/hostedtoolcache/go/1.20.10/x64/src/runtime/debug/stack.go:24 +0x65
github.com/kiali/kiali/handlers.handlePanic({0x253b1d0, 0xc0003bfea8})
/home/runner/work/kiali/kiali/handlers/graph.go:86 +0x219
panic({0x1dc38c0, 0xc000694bf0})
/opt/hostedtoolcache/go/1.20.10/x64/src/runtime/panic.go:884 +0x213
github.com/kiali/kiali/graph.CheckError(...)
/home/runner/work/kiali/kiali/graph/util.go:38
github.com/kiali/kiali/graph/telemetry/istio/appender.ServiceEntryAppender.loadServiceEntryHosts({0xc002168c30?, {0xc001cfb3e0?, 0x220f631?}}, {0xc000ddafa0, 0xa}, {0xc0019d4990, 0x11}, 0xc0015b6630)
/home/runner/work/kiali/kiali/graph/telemetry/istio/appender/service_entry.go:115 +0x918
github.com/kiali/kiali/graph/telemetry/istio/appender.ServiceEntryAppender.AppendGraph({0xc0015b64e0?, {0xc0010a807b?, 0xc001e1d6c0?}}, 0xc0015b6690, 0x7f559f849c00?, 0xc001b7a9f0?)
/home/runner/work/kiali/kiali/graph/telemetry/istio/appender/service_entry.go:97 +0x405
github.com/kiali/kiali/graph/telemetry/istio.BuildNodeTrafficMap({0xc0015b64e0, {0x0, {0xc00065b830, 0x9, 0x9}}, 0x0, 0x1, 0xc0015b63f0, {{0x220c92e, 0x8}, ...}, ...}, ...)
/home/runner/work/kiali/kiali/graph/telemetry/istio/istio.go:486 +0x3c2
github.com/kiali/kiali/graph/api.graphNodeIstio({_, _}, _, _, {{0x220d9e6, 0x9}, {0x2207c58, 0x5}, {{0xc0010a80bd, 0x3}, ...}, ...})
/home/runner/work/kiali/kiali/graph/api/api.go:90 +0x125
github.com/kiali/kiali/graph/api.GraphNode({_, _}, _, {{0x220d9e6, 0x9}, {0x2207c58, 0x5}, {{0xc0010a80bd, 0x3}, {0x13a52453c000, ...}}, ...})
/home/runner/work/kiali/kiali/graph/api/api.go:72 +0x1d5
github.com/kiali/kiali/handlers.GraphNode({0x253b1d0, 0xc0003bfea8}, 0xc0000c2400)
/home/runner/work/kiali/kiali/handlers/graph.go:64 +0x165
net/http.HandlerFunc.ServeHTTP(0x220ba66?, {0x253b1d0?, 0xc0003bfea8?}, 0x7f559f6a6120?)
/opt/hostedtoolcache/go/1.20.10/x64/src/net/http/server.go:2122 +0x2f
github.com/kiali/kiali/routing.metricHandler.func1({0x254ada0?, 0xc000d9bea0}, 0xc000846c01?)
/home/runner/work/kiali/kiali/routing/router.go:206 +0x13b
net/http.HandlerFunc.ServeHTTP(0x254c140?, {0x254ada0?, 0xc000d9bea0?}, 0xc000846ca0?)
/opt/hostedtoolcache/go/1.20.10/x64/src/net/http/server.go:2122 +0x2f
github.com/kiali/kiali/handlers.AuthenticationHandler.Handle.func1({0x254ada0, 0xc000d9bea0}, 0xc0000c2200)
/home/runner/work/kiali/kiali/handlers/authentication.go:79 +0x3fd
net/http.HandlerFunc.ServeHTTP(0x254c140?, {0x254ada0?, 0xc000d9bea0?}, 0x251c908?)
/opt/hostedtoolcache/go/1.20.10/x64/src/net/http/server.go:2122 +0x2f
github.com/kiali/kiali/server.plainHttpMiddleware.func1({0x254ada0?, 0xc000d9bea0?}, 0xc0015b6300?)
/home/runner/work/kiali/kiali/server/server.go:178 +0x65
net/http.HandlerFunc.ServeHTTP(0xc0000c2100?, {0x254ada0?, 0xc000d9bea0?}, 0x2?)
/opt/hostedtoolcache/go/1.20.10/x64/src/net/http/server.go:2122 +0x2f
github.com/gorilla/mux.(*Router).ServeHTTP(0xc000aae000, {0x254ada0, 0xc000d9bea0}, 0xc0000c2000)
/home/runner/go/pkg/mod/github.com/gorilla/[email protected]/mux.go:212 +0x1cf
github.com/NYTimes/gziphandler.GzipHandlerWithOpts.func1.1({0x254abf0, 0xc001034ee0}, 0x0?)
/home/runner/go/pkg/mod/github.com/!n!y!times/[email protected]/gzip.go:336 +0x24e
net/http.HandlerFunc.ServeHTTP(0xc0002fc280?, {0x254abf0?, 0xc001034ee0?}, 0x40dc4a?)
/opt/hostedtoolcache/go/1.20.10/x64/src/net/http/server.go:2122 +0x2f
net/http.(*ServeMux).ServeHTTP(0xc0010a8061?, {0x254abf0, 0xc001034ee0}, 0xc0000c2000)
/opt/hostedtoolcache/go/1.20.10/x64/src/net/http/server.go:2500 +0x149
net/http.serverHandler.ServeHTTP({0xc002376930?}, {0x254abf0, 0xc001034ee0}, 0xc0000c2000)
/opt/hostedtoolcache/go/1.20.10/x64/src/net/http/server.go:2936 +0x316
net/http.(*conn).serve(0xc000e66090, {0x254c140, 0xc0015af5f0})
/opt/hostedtoolcache/go/1.20.10/x64/src/net/http/server.go:1995 +0x612
created by net/http.(*Server).Serve
/opt/hostedtoolcache/go/1.20.10/x64/src/net/http/server.go:3089 +0x5ed
Expected Behavior
A traffic graph showing applications, etc.
What are the steps to reproduce this bug?
- Install the 'kube-prometheus-stack' Helm chart release, version 48.3.1.
- Use the managed Istio service mesh add-on for AKS, with Istio installed via Terraform.
- Install the Kiali CR with the values below.
Environment
- Kiali version: 1.82
- Istio version: 1.20
- Kubernetes impl: AKS
- Kubernetes version: 1.28.5
- Other notable environmental factors:
Kiali CR values:
nameOverride: ""
fullnameOverride: ""
image: # see: https://quay.io/repository/kiali/kiali-operator?tab=tags
  repo: quay.io/kiali/kiali-operator # quay.io/kiali/kiali-operator
  tag: v1.82 # version string like v1.39.0 or a digest hash
  digest: "" # use "sha256" if tag is a sha256 hash (do NOT prefix this value with a "@")
  pullPolicy: Always
  pullSecrets: []
# Deployment options for the operator pod.
nodeSelector: {}
podAnnotations: {}
podLabels: {}
env: []
tolerations: []
resources:
  requests:
    cpu: "10m"
    memory: "64Mi"
affinity: {}
replicaCount: 1
priorityClassName: ""
securityContext: {}
# metrics.enabled: set to true if you want Prometheus to collect metrics from the operator
metrics:
  enabled: true
# debug.enabled: when true the full ansible logs are dumped after each reconciliation run
# debug.verbosity: defines the amount of details the operator will log (higher numbers are more noisy)
# debug.enableProfiler: when true (regardless of debug.enabled), timings for the most expensive tasks will be logged after each reconciliation loop
debug:
  enabled: true
  verbosity: "1"
  enableProfiler: false
# Defines where the operator will look for Kiali CR resources. "" means "all namespaces".
watchNamespace: ""
# Set to true if you want the operator to be able to create cluster roles. This is necessary
# if you want to support Kiali CRs with spec.deployment.accessible_namespaces of '**'.
# Setting this to "true" requires allowAllAccessibleNamespaces to be "true" also.
# Note that this will be overridden to "true" if cr.create is true and cr.spec.deployment.accessible_namespaces is ['**'].
clusterRoleCreator: true
# Set to a list of secrets in the cluster that the operator will be allowed to read. This is necessary if you want to
# support Kiali CRs with spec.kiali_feature_flags.certificates_information_indicators.enabled=true.
# The secrets in this list will be the only ones allowed to be specified in any Kiali CR (in the setting
# spec.kiali_feature_flags.certificates_information_indicators.secrets).
# If you set this to an empty list, the operator will not be given permission to read any additional secrets
# found in the cluster, and thus will only support a value of "false" in the Kiali CR setting
# spec.kiali_feature_flags.certificates_information_indicators.enabled.
secretReader: ['cacerts', 'istio-ca-secret']
# Set to true if you want to allow the operator to only be able to install Kiali in view-only-mode.
# The purpose for this setting is to allow you to restrict the permissions given to the operator itself.
onlyViewOnlyMode: false
# allowAdHocKialiNamespace tells the operator to allow a user to be able to install a Kiali CR in one namespace but
# be able to install Kiali in another namespace. In other words, it will allow the Kiali CR spec.deployment.namespace
# to be something other than the namespace where the CR is installed. You may want to disable this if you are
# running in a multi-tenant scenario in which you only want a user to be able to install Kiali in the same namespace
# where the user has permissions to install a Kiali CR.
allowAdHocKialiNamespace: true
# allowAdHocKialiImage tells the operator to allow a user to be able to install a custom Kiali image as opposed
# to the image the operator will install by default. In other words, it will allow the
# Kiali CR spec.deployment.image_name and spec.deployment.image_version to be configured by the user.
# You may want to disable this if you do not want users to install their own Kiali images.
allowAdHocKialiImage: false
# allowAdHocOSSMConsoleImage tells the operator to allow a user to be able to install a custom OSSMC image as opposed
# to the image the operator will install by default. In other words, it will allow the
# OSSMConsole CR spec.deployment.imageName and spec.deployment.imageVersion to be configured by the user.
# You may want to disable this if you do not want users to install their own OSSMC images.
# This is only applicable when running on OpenShift.
allowAdHocOSSMConsoleImage: false
# allowSecurityContextOverride tells the operator to allow a user to be able to fully override the Kiali
# container securityContext. If this is false, certain securityContext settings must exist on the Kiali
# container and any attempt to override them will be ignored.
allowSecurityContextOverride: false
# allowAllAccessibleNamespaces tells the operator to allow a user to be able to configure Kiali
# to access all namespaces in the cluster via spec.deployment.accessible_namespaces=['**'].
# If this is false, the user must specify an explicit list of namespaces in the Kiali CR.
# Setting this to "true" requires clusterRoleCreator to be "true" also.
# Note that this will be overridden to "true" if cr.create is true and cr.spec.deployment.accessible_namespaces is ['**'].
allowAllAccessibleNamespaces: true
# accessibleNamespacesLabel restricts the namespaces that a user can add to the Kiali CR spec.deployment.accessible_namespaces.
# This value is either an empty string (which disables this feature) or a label name with an optional label value
# (e.g. "mylabel" or "mylabel=myvalue"). Only namespaces that have that label will be permitted in
# spec.deployment.accessible_namespaces. Any namespace not labeled properly but specified in accessible_namespaces will cause
# the operator to abort the Kiali installation.
# If just a label name (but no label value) is specified, the label value the operator will look for is the value of
# the Kiali CR's spec.istio_namespace. In other words, the operator will look for the named label whose value must be the name
# of the Istio control plane namespace (which is typically, but not necessarily, "istio-system").
accessibleNamespacesLabel: ""
# For what a Kiali CR spec can look like, see:
# https://github.com/kiali/kiali-operator/blob/master/deploy/kiali/kiali_cr.yaml
cr:
  create: true
  name: kiali
  # If you elect to create a Kiali CR (--set cr.create=true)
  # and the operator is watching all namespaces (--set watchNamespace="")
  # then this is the namespace where the CR will be created (the default will be the operator namespace).
  namespace: ""
  # Annotations to place in the Kiali CR metadata.
  annotations: {}
  spec:
    kubernetes_config:
      cluster_name: "aks-dev-eastus-152"
    auth:
      strategy: "anonymous"
    external_services:
      istio:
        istio_injection_annotation: "istio.io/rev"
        istiod_deployment_name: "istiod-asm-1-20"
        config_map_name: "istio-asm-1-20"
        istio_sidecar_injector_config_map_name: "istio-sidecar-injector-asm-1-20"
        root_namespace: aks-istio-system
        component_status:
          enabled: true
          components:
          - app_label: istiod
            is_core: true
          - app_label: aks-istio-ingressgateway-internal
            is_core: true
            is_proxy: true
            namespace: aks-istio-ingress
      prometheus:
        url: <private apim proxy URI we use>
        # url: http://prometheus-kube-prometheus-prometheus.prometheus.svc.cluster.local:9090
      tracing:
        # url: http://jaeger-deployment-query.istio-system.svc.cluster.local:16686
        in_cluster_url: http://jaeger-deploy-query.aks-istio-system.svc.cluster.local:16685/jaeger
        use_grpc: true
        sampling: 100
      grafana:
        enabled: false
    deployment:
      logger:
        log_level: "debug"
      accessible_namespaces:
      - '**'
Hi @deuxailes, is there a second cluster "kubernetes" configured in your environment? From the logs it seems there is traffic for a cluster "kubernetes" that Kiali is trying to show on the graph, but the "kubernetes" cluster is not accessible for Kiali.
There may still be a bug here. It looks like the service entry appender is trying to use the cluster taken directly from the node: https://github.com/kiali/kiali/blob/a55c5c53bdf19d8fbe1e5a8f249d04d1d371c458/graph/telemetry/istio/appender/service_entry.go#L109-L115 and https://github.com/kiali/kiali/blob/a55c5c53bdf19d8fbe1e5a8f249d04d1d371c458/graph/telemetry/istio/appender/service_entry.go#L94-L100
I think most appenders check whether the node's cluster is accessible before trying to use it, right @jshaughn? It would be good not to error out when a node's cluster from telemetry doesn't match what Kiali is configured with, but rather to log a warning or something.
Right, there is a bug in the appenders, and also in workload entries: https://github.com/kiali/kiali/blob/master/graph/telemetry/istio/appender/workload_entry.go#L44-L60
@hhovsepy @nrfox Thanks for the speedy replies.
To add more context, we use Azure Monitor Workspace (AMW) as our hosted Prometheus server, with Azure APIM forwarding requests to the AMW query endpoint with the proper /api/v1 path appended.
All of our environments send Prometheus metrics to the AMW instance.