loki
loki-simple-scalable loki-gateway Nginx startup failed
After installing Loki in simple scalable mode with Helm, the gateway log reports the following error:
kubectl logs -n loki loki-gateway-7f78b889f9-9tb75
/docker-entrypoint.sh: No files found in /docker-entrypoint.d/, skipping configuration
2022/09/29 09:55:35 [emerg] 1#1: host not found in resolver "kube-dns.kube-system.svc.cluster.local" in /etc/nginx/nginx.conf:27
nginx: [emerg] host not found in resolver "kube-dns.kube-system.svc.cluster.local" in /etc/nginx/nginx.conf:27
But there is a kube-dns service in my cluster:
root@master-01:# kubectl get -n kube-system svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP,9153/TCP 27d
root@master-01:# kubectl get -n kube-system pods
NAME READY STATUS RESTARTS AGE
coredns-c676cc86f-dxv8m 1/1 Running 0 150m
coredns-c676cc86f-l6njk 1/1 Running 0 151m
etcd-master-01 1/1 Running 65 27d
kube-apiserver-master-01 1/1 Running 0 27d
kube-controller-manager-master-01 1/1 Running 2 (24h ago) 27d
kube-proxy-c9746 1/1 Running 0 27d
kube-proxy-d8qbr 1/1 Running 0 24d
kube-proxy-jhl2s 1/1 Running 0 24d
kube-proxy-rw5tr 1/1 Running 0 24d
kube-scheduler-master-01 1/1 Running 71 (24d ago) 27d
I have the same issue and can see the following in the logs:
[INFO] 10.0.0.72:34141 - 11273 "A IN kube-dns.kube-system.svc.cluster.local.monitoring.svc.cluster.local. udp 85 false 512" NXDOMAIN qr,aa,rd 178 0.000441375s
[INFO] 10.0.0.72:34141 - 11776 "AAAA IN kube-dns.kube-system.svc.cluster.local.monitoring.svc.cluster.local. udp 85 false 512" NXDOMAIN qr,aa,rd 178 0.000412507s
[INFO] 10.0.0.72:51078 - 27467 "AAAA IN kube-dns.kube-system.svc.cluster.local.cluster.local. udp 70 false 512" NXDOMAIN qr,aa,rd 163 0.000208542s
[INFO] 10.0.0.72:51078 - 26859 "A IN kube-dns.kube-system.svc.cluster.local.cluster.local. udp 70 false 512" NXDOMAIN qr,aa,rd 163 0.000176965s
[INFO] 10.0.1.120:60184 - 63770 "A IN loki.monitoring.svc.cluster.local.monitoring.svc.cluster.local. udp 80 false 512" NXDOMAIN qr,aa,rd 173 0.000274916s
[INFO] 10.0.1.120:50301 - 19685 "AAAA IN loki.monitoring.svc.cluster.local.svc.cluster.local. udp 69 false 512" NXDOMAIN qr,aa,rd 162 0.000092976s
[INFO] 10.0.1.120:59617 - 17088 "A IN loki.monitoring.svc.cluster.local.cluster.local. udp 65 false 512" NXDOMAIN qr,aa,rd 158 0.000166104s
[INFO] 10.0.1.120:60339 - 24553 "AAAA IN loki.monitoring.svc.cluster.local.damn.li. udp 59 false 512" NOERROR qr,rd,ra 143 0.000685193s
[INFO] 10.0.1.120:56636 - 10429 "AAAA IN loki.monitoring.svc.cluster.local. udp 51 false 512" NXDOMAIN qr,aa,rd 144 0.000104261s
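For some reason the gateway is requesting a domain name that is far too long: the resolver search suffixes are being appended to a name that is already fully qualified.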
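The over-long names in the log look like ordinary search-list expansion. Here is a rough sketch of how a glibc-style stub resolver picks the names it tries. The function name `candidate_queries` is made up for illustration, the search list is my reconstruction from the suffixes in the log, and `ndots=5` is the usual Kubernetes pod default, so treat all three as assumptions rather than what the gateway image actually does.

```python
def candidate_queries(name, search, ndots=5):
    """Return the names a glibc-style stub resolver would try, in order."""
    if name.endswith("."):
        # A trailing dot marks the name as fully qualified: no search list.
        return [name]
    expanded = [f"{name}.{domain}." for domain in search]
    absolute = f"{name}."
    if name.count(".") >= ndots:
        # Enough dots: try the name as-is first, then the search list.
        return [absolute] + expanded
    # Too few dots: walk the search list first, try the bare name last.
    return expanded + [absolute]


# Assumed search list, reconstructed from the query suffixes in the log.
search = [
    "monitoring.svc.cluster.local",
    "svc.cluster.local",
    "cluster.local",
    "damn.li",
]

for query in candidate_queries("loki.monitoring.svc.cluster.local", search):
    print(query)
```

The name has only four dots, so under these assumptions every search suffix is tried before the plain name, which matches the order of the loki.monitoring queries in the log. A name ending in a dot skips the expansion entirely.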
@darox @jwping You need to check that Loki is configured with the right DNS setting.
Query the name of your cluster's DNS service:
kubectl get svc --namespace=kube-system -l k8s-app=kube-dns -o jsonpath='{.items..metadata.name}'
Then adjust your Helm values with the result. In my case the DNS service is not kube-dns but "rke2-coredns-rke2-coredns", so I use:
global:
  dnsService: "rke2-coredns-rke2-coredns"
With that it works fine: the pod starts and no longer complains.
Could you try this again? I normally develop against a k3d cluster, but in testing against a kind cluster to debug some CI failures (since that's what our CI uses), I noticed some differences in the ndots value present in the /etc/resolv.conf in the containers on the kind cluster. As a result I needed to add an extra dot to the end of the resolver DNS record. That change should be in 3.3.0. Can you please try that version and let me know if this is still an issue?
In my case it's: kube-dns
same error
root@node52:~# kubectl -n loki logs -f loki-gateway-774ff559b9-2w4dq
/docker-entrypoint.sh: No files found in /docker-entrypoint.d/, skipping configuration
2023/01/05 08:41:13 [emerg] 1#1: host not found in resolver "kube-dns.kube-system.svc.cluster.local." in /etc/nginx/nginx.conf:27
nginx: [emerg] host not found in resolver "kube-dns.kube-system.svc.cluster.local." in /etc/nginx/nginx.conf:27
and the DNS service name:
root@node52:~# kubectl get svc --namespace=kube-system -l k8s-app=kube-dns -o jsonpath='{.items..metadata.name}'
coredns
and resolved by:
global:
  dnsService: "coredns"
I suspect this is related to the ndots configuration in the /etc/resolv.conf. May we see the resolver configuration please?
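For anyone posting their resolver configuration, the fields that matter here are `nameserver`, `search`, and `options ndots`. Below is a small sketch that pulls those out of the file contents. The function name `parse_resolv_conf` and the sample text are hypothetical (the nameserver is the kube-dns ClusterIP from the first report, the search list is a guess at a typical pod in the loki namespace), and the parser ignores every other keyword.

```python
def parse_resolv_conf(text):
    """Extract nameservers, search domains and ndots from resolv.conf text."""
    # ndots defaults to 1 when the file does not set it (see resolv.conf(5)).
    config = {"nameservers": [], "search": [], "ndots": 1}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", ";")):
            continue  # blank line or comment
        keyword, *values = line.split()
        if keyword == "nameserver":
            config["nameservers"].extend(values)
        elif keyword == "search":
            config["search"] = values  # a later search line replaces earlier ones
        elif keyword == "options":
            for option in values:
                if option.startswith("ndots:"):
                    config["ndots"] = int(option.split(":", 1)[1])
    return config


# Hypothetical sample, not taken from any cluster in this thread.
sample = """\
search loki.svc.cluster.local svc.cluster.local cluster.local
nameserver 10.96.0.10
options ndots:5
"""

print(parse_resolv_conf(sample))
```

Since the gateway container exits right away, the file probably has to be read from another pod in the same namespace.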
The solution from seb-835 https://github.com/grafana/loki/issues/7287#issuecomment-1282339134 works for me.
In my case the cluster DNS was not resolving the cluster.local domain at all; the solution was to also set clusterDomain. The installation was a k3s cluster provisioned via Rancher 2.7.6 with the Cluster Domain explicitly set.
Kubernetes Version: v1.25.13+k3s1
Helm Chart:
- name: loki
  version: 5.15.0
  repository: https://grafana.github.io/helm-charts
global:
  dnsService: "kube-dns"
  dnsNamespace: "kube-system"
  clusterDomain: "mysubdomain.mydomain.it"
It would be nice if the Helm chart allowed setting the IP of the DNS service instead of the FQDN.
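Judging only from the error messages in this thread, the resolver name seems to be assembled from those three values. A minimal sketch of that composition follows. The function `resolver_fqdn` is hypothetical and the pattern is inferred from the logged name "kube-dns.kube-system.svc.cluster.local.", not taken from the chart templates, so check the rendered nginx.conf before relying on it.

```python
def resolver_fqdn(dns_service="kube-dns",
                  dns_namespace="kube-system",
                  cluster_domain="cluster.local"):
    """Build the resolver host name in the form seen in the nginx errors.

    The defaults mirror the name in the reported error message; the
    trailing dot is the one described for chart version 3.3.0.
    """
    return f"{dns_service}.{dns_namespace}.svc.{cluster_domain}."


# Defaults reproduce the name from the error message.
print(resolver_fqdn())
# Values from the k3s report with a custom cluster domain.
print(resolver_fqdn(cluster_domain="mysubdomain.mydomain.it"))
```

If the Service that the printed name refers to does not exist in the cluster, nginx fails at startup with the "host not found in resolver" error shown earlier.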
kubectl get svc --namespace=kube-system -l k8s-app=kube-dns -o jsonpath='{.items..metadata.name}'
kube-dns
/docker-entrypoint.sh: No files found in /docker-entrypoint.d/, skipping configuration
2023/12/21 22:00:31 [emerg] 1#1: host not found in resolver "kube-dns.kube-system.svc.cluster.local." in /etc/nginx/nginx.conf:33
nginx: [emerg] host not found in resolver "kube-dns.kube-system.svc.cluster.local." in /etc/nginx/nginx.conf:33
Encountered the same error when switching to Talos.
From a random container, the name resolves:
ping kube-dns.kube-system.svc.cluster.local.
PING kube-dns.kube-system.svc.cluster.local. (10.96.0.10): 56 data bytes
@batazor I got the same error when I ran Loki with the gateway on a Talos cluster. Have you found a solution?
IMHO it may be related to https://github.com/grafana/loki/issues/11650
Same issue here. We have two GKE clusters: one uses kube-dns (Loki works without any adjustments) and the other uses Cloud DNS (VPC scope) with a specific domain suffix.
As mentioned above, we changed global.clusterDomain to that domain suffix and it works.
Getting the same error.
The following config solved my problem:
loki:
  global:
    dnsService: coredns
Hi guys, the following method works for me, please try it. Take the CLUSTER-IP of kube-dns from the service list:
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
calico-typha ClusterIP 10.0.253.98 <none> 5473/TCP 7d1h
kube-dns ClusterIP 10.0.0.2 <none> 53/UDP,53/TCP,9153/TCP 7d1h
metrics-server ClusterIP 10.0.187.168 <none> 443/TCP 7d
Then set it as the gateway resolver in values.yaml:
vim values.yaml
gateway:
  resolver: "10.0.0.2"
We are using EKS Auto Mode, which does not publish a kube-dns service endpoint (https://docs.aws.amazon.com/eks/latest/userguide/automode.html#_features), and we are getting this issue too when NGINX starts up. It would be nice to allow configuring no resolver and letting the node handle it, as one of the possible paths via libsonnet.