[loki-distributed] not ready: number of queriers connected to query-frontend is 0
Hello,
concerning issue https://github.com/grafana/helm-charts/issues/2028, I still get the error below when queryFrontend.replicas is 2:
level=info ts=2023-03-27T09:12:12.813260784Z caller=module_service.go:82 msg=initialising module=cache-generation-loader
level=info ts=2023-03-27T09:12:12.813268147Z caller=module_service.go:82 msg=initialising module=server
level=info ts=2023-03-27T09:12:12.813276593Z caller=module_service.go:82 msg=initialising module=usage-report
level=info ts=2023-03-27T09:12:12.813280821Z caller=module_service.go:82 msg=initialising module=runtime-config
level=info ts=2023-03-27T09:12:12.813430544Z caller=module_service.go:82 msg=initialising module=query-frontend-tripperware
level=info ts=2023-03-27T09:12:12.813443759Z caller=module_service.go:82 msg=initialising module=query-frontend
level=info ts=2023-03-27T09:12:12.813500386Z caller=loki.go:461 msg="Loki started"
level=info ts=2023-03-27T09:12:42.279652042Z caller=frontend.go:342 msg="not ready: number of queriers connected to query-frontend is 0"
level=error ts=2023-03-27T09:12:46.708369563Z caller=reporter.go:203 msg="failed to delete corrupted cluster seed file, deleting it" err="BadRequest: Invalid token.\n\tstatus code: 400, request id: txbc43b8e6a0384bc79bcdb-0064215e0e, host id: txbc43b8e6a0384bc79bcdb-0064215e0e"
level=info ts=2023-03-27T09:12:52.279587151Z caller=frontend.go:342 msg="not ready: number of queriers connected to query-frontend is 0"
level=info ts=2023-03-27T09:13:02.280118627Z caller=frontend.go:342 msg="not ready: number of queriers connected to query-frontend is 0"
level=info ts=2023-03-27T09:13:12.280012774Z caller=frontend.go:342 msg="not ready: number of queriers connected to query-frontend is 0"
level=info ts=2023-03-27T09:13:22.280004704Z caller=frontend.go:342 msg="not ready: number of queriers connected to query-frontend is 0"
I checked and the headless service seems present:
$ kubectl -n logging-new get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
loki-new-loki-distributed-compactor ClusterIP 10.43.20.194 <none> 3100/TCP 19m
loki-new-loki-distributed-distributor ClusterIP 10.43.167.185 <none> 3100/TCP,9095/TCP 18m
loki-new-loki-distributed-gateway ClusterIP 10.43.123.201 <none> 80/TCP 18m
loki-new-loki-distributed-ingester ClusterIP 10.43.53.24 <none> 3100/TCP,9095/TCP 19m
loki-new-loki-distributed-ingester-headless ClusterIP None <none> 3100/TCP,9095/TCP 19m
loki-new-loki-distributed-memberlist ClusterIP None <none> 7946/TCP 19m
loki-new-loki-distributed-querier ClusterIP 10.43.61.180 <none> 3100/TCP,9095/TCP 18m
loki-new-loki-distributed-querier-headless ClusterIP None <none> 3100/TCP,9095/TCP 19m
loki-new-loki-distributed-query-frontend ClusterIP 10.43.175.74 <none> 3100/TCP,9095/TCP,9096/TCP 18m
loki-new-loki-distributed-query-frontend-headless ClusterIP None <none> 3100/TCP,9095/TCP,9096/TCP 19m
Helm chart version is 0.69.9
Why am I still getting this?
Could it be caused by the setting below in values.yaml, which should perhaps point to the headless endpoint instead? https://github.com/grafana/helm-charts/blob/main/charts/loki-distributed/values.yaml#L181
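For clarity, here is a rough sketch of the Loki config change I mean. The hostname and gRPC port 9095 are taken from my Service listing above, so treat this as an assumption to adapt, not the chart's actual template:

```yaml
frontend_worker:
  # point querier workers at the headless Service so they resolve every
  # query-frontend pod individually instead of a single load-balanced VIP
  frontend_address: loki-new-loki-distributed-query-frontend-headless.logging-new.svc.cluster.local:9095
```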
Thank you
In my case, when the querier starts, a connection is made to the query-frontend.
However, since the querier connects through the query-frontend's Service, it seems a 1:1 mapping may not be possible when there are multiple query-frontends.
So if you increase the number of queriers or decrease the number of query-frontends, the queriers and the query-frontend do get connected,
e.g. query-frontend: "1", querier: "3".
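In values.yaml terms that ratio would look roughly like the following (key names taken from the chart's queryFrontend/querier sections; the exact numbers are just an example, not a recommendation):

```yaml
queryFrontend:
  replicas: 1   # fewer frontends than queriers
querier:
  replicas: 3
```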
I don't think it's related to the replica ratio; I suspect it's related to this model: https://grafana.com/docs/loki/latest/configuration/query-frontend/#grpc-mode-pull-model
I'm also getting trouble with this.
It seems that publishNotReadyAddresses is missing from the querier headless Service. Could this matter?
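For reference, this is the standard Kubernetes field I mean, sketched against the Service name and ports from the listing above; the selector label is an assumption and would need to match the chart's actual labels:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: loki-new-loki-distributed-querier-headless
spec:
  clusterIP: None
  # expose pod IPs in DNS even before the pods report Ready, so components
  # can discover each other while they are still starting up
  publishNotReadyAddresses: true
  ports:
    - name: http
      port: 3100
    - name: grpc
      port: 9095
  selector:
    app.kubernetes.io/component: querier   # assumption: adjust to the chart's labels
```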
Same here, with 2 querier and 2 query-frontend pods. As @kladiv mentioned, I changed https://github.com/grafana/helm-charts/blob/main/charts/loki-distributed/values.yaml#L181 to the headless-service and then it worked. Not sure if that's the solution though, or if side-effects can be expected.
In my case, switching to the headless service does not seem to work normally: the querier has four replicas and the query-frontend has two, each with autoscaling enabled, and the result is a CrashLoopBackOff (distributor / ingester / querier / queryFrontend).
I'm also facing this error.
If I disable the queryScheduler, it works fine.
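In values.yaml terms that would be something like the following; I'm assuming the chart exposes the toggle under queryScheduler.enabled, so double-check against your chart version:

```yaml
queryScheduler:
  enabled: false   # assumption: key name as exposed by the loki-distributed chart
```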
We're seeing this as well.
Please check the latest release. The frontend address was adjusted in loki-distributed-0.69.13: https://github.com/grafana/helm-charts/commit/3829417e0d113d24ea82ff9f0c6c631d20f95822
I no longer see this issue in helm-loki-5.2.0 chart.
We also deployed the 5.2.0 Helm chart to some of our environments today and the issue appears to be resolved. :+1:
I encountered the same issue in the mimir-distributed Helm chart and resolved it by configuring the frontend_worker.scheduler_address parameter. More info here: https://grafana.com/docs/mimir/latest/references/configuration-parameters/#frontend_worker
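A minimal sketch of what that configuration can look like; the hostname is an assumption based on a typical mimir-distributed release and namespace, and would need to match your actual query-scheduler headless Service:

```yaml
frontend_worker:
  # connect querier workers straight to the query-scheduler instead of the frontend
  scheduler_address: mimir-query-scheduler-headless.mimir.svc.cluster.local:9095
```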
Using S3 all the way solved this for me. Using filesystem storage with loki-distributed caused some weird problems.
I have the same error with the grafana/loki (simple scalable) Helm chart, version 5.2.0. It deploys 3 loki-read pods and only one gives that error; the other two are happy.
Edit: After restarting the failing pod, it becomes healthy.
I am having this same issue. My environment is running on Istio with mutual TLS enabled. If I disable mutual TLS, everything works fine.
❯ helm ls -n loki
NAME NAMESPACE REVISION UPDATED STATUS CHART APP VERSION
loki loki 2 2024-03-21 15:57:58.90046894 -0400 EDT deployed loki-distributed-0.78.3 2.9.4
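In case it helps others on Istio, this is roughly how mutual TLS can be relaxed for the Loki namespace while testing; the resource name, namespace, and mode are assumptions, and PERMISSIVE is the less drastic option compared to disabling mTLS entirely:

```yaml
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: loki-permissive-mtls
  namespace: loki
spec:
  # PERMISSIVE accepts both plaintext and mTLS traffic, which lets the
  # query-frontend/querier gRPC connections through while keeping mTLS available
  mtls:
    mode: PERMISSIVE
```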
I am having the same issue with Helm chart "[email protected]".
I sometimes have this issue with loki-read Pods. They endlessly report something like:
not ready: number of schedulers this worker is connected to is 0
When I delete the pods and new ones get created, they are up and running in no time.
I think that either the Pods should be made more self-healing, or they should be restarted after some timeout.
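To illustrate the "restart after some timeout" idea, a plain Kubernetes livenessProbe on the container could do it. This is only a sketch, not a chart value: it assumes the component serves its readiness state on /ready at port 3100 (as in the Service listing earlier in this thread), and the thresholds are arbitrary:

```yaml
# pod-spec fragment (sketch): restart the container if /ready keeps failing
livenessProbe:
  httpGet:
    path: /ready
    port: 3100
  initialDelaySeconds: 60
  periodSeconds: 30
  # roughly 5 minutes of consecutive failures before the kubelet restarts the container
  failureThreshold: 10
```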
Same problem here.
I use loki-distributed version 0.80.5. loki-query-frontend works fine when it starts, but after a few days it begins restarting continuously with the error not ready: number of queriers connected to query-frontend is 0, even though no other config has changed.
If I delete the pod and a new one is created, the error goes away until a few days pass and the scenario repeats.
Has anyone found anything new? I also use Istio; could it be related?
Thanks!
Also started encountering this in 0.80.5.
Scaled down to one query-frontend for now. Do frontends not have some form of memberlist configuration? How do the frontends load balance between each other?