loki-simple-scalable loki-gateway Nginx startup failed

Open jwping opened this issue 3 years ago • 12 comments
After installing loki-simple-scalable with Helm, the gateway log reports the following error:

kubectl logs -n loki loki-gateway-7f78b889f9-9tb75
/docker-entrypoint.sh: No files found in /docker-entrypoint.d/, skipping configuration
2022/09/29 09:55:35 [emerg] 1#1: host not found in resolver "kube-dns.kube-system.svc.cluster.local" in /etc/nginx/nginx.conf:27
nginx: [emerg] host not found in resolver "kube-dns.kube-system.svc.cluster.local" in /etc/nginx/nginx.conf:27

But there is a kube-dns service in my cluster:

root@master-01:# kubectl get -n kube-system svc
NAME       TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)                  AGE
kube-dns   ClusterIP   10.96.0.10   <none>        53/UDP,53/TCP,9153/TCP   27d
root@master-01:# kubectl get -n kube-system pods
NAME                                READY   STATUS    RESTARTS       AGE
coredns-c676cc86f-dxv8m             1/1     Running   0              150m
coredns-c676cc86f-l6njk             1/1     Running   0              151m
etcd-master-01                      1/1     Running   65             27d
kube-apiserver-master-01            1/1     Running   0              27d
kube-controller-manager-master-01   1/1     Running   2 (24h ago)    27d
kube-proxy-c9746                    1/1     Running   0              27d
kube-proxy-d8qbr                    1/1     Running   0              24d
kube-proxy-jhl2s                    1/1     Running   0              24d
kube-proxy-rw5tr                    1/1     Running   0              24d
kube-scheduler-master-01            1/1     Running   71 (24d ago)   27d

jwping avatar Sep 29 '22 10:09 jwping

I have the same issue and can see the following in the CoreDNS logs:

[INFO] 10.0.0.72:34141 - 11273 "A IN kube-dns.kube-system.svc.cluster.local.monitoring.svc.cluster.local. udp 85 false 512" NXDOMAIN qr,aa,rd 178 0.000441375s
[INFO] 10.0.0.72:34141 - 11776 "AAAA IN kube-dns.kube-system.svc.cluster.local.monitoring.svc.cluster.local. udp 85 false 512" NXDOMAIN qr,aa,rd 178 0.000412507s
[INFO] 10.0.0.72:51078 - 27467 "AAAA IN kube-dns.kube-system.svc.cluster.local.cluster.local. udp 70 false 512" NXDOMAIN qr,aa,rd 163 0.000208542s
[INFO] 10.0.0.72:51078 - 26859 "A IN kube-dns.kube-system.svc.cluster.local.cluster.local. udp 70 false 512" NXDOMAIN qr,aa,rd 163 0.000176965s
[INFO] 10.0.1.120:60184 - 63770 "A IN loki.monitoring.svc.cluster.local.monitoring.svc.cluster.local. udp 80 false 512" NXDOMAIN qr,aa,rd 173 0.000274916s
[INFO] 10.0.1.120:50301 - 19685 "AAAA IN loki.monitoring.svc.cluster.local.svc.cluster.local. udp 69 false 512" NXDOMAIN qr,aa,rd 162 0.000092976s
[INFO] 10.0.1.120:59617 - 17088 "A IN loki.monitoring.svc.cluster.local.cluster.local. udp 65 false 512" NXDOMAIN qr,aa,rd 158 0.000166104s
[INFO] 10.0.1.120:60339 - 24553 "AAAA IN loki.monitoring.svc.cluster.local.damn.li. udp 59 false 512" NOERROR qr,rd,ra 143 0.000685193s
[INFO] 10.0.1.120:56636 - 10429 "AAAA IN loki.monitoring.svc.cluster.local. udp 51 false 512" NXDOMAIN qr,aa,rd 144 0.000104261s

For some reason the gateway is requesting a domain name that is far too long.

darox avatar Oct 02 '22 13:10 darox

@darox @jwping Check that Loki is configured with the right DNS setting.

Query the name of your cluster's DNS service:

kubectl get svc --namespace=kube-system -l k8s-app=kube-dns -o jsonpath='{.items..metadata.name}'

Then adjust your Helm values with the result. In my case the DNS service is not kube-dns but "rke2-coredns-rke2-coredns", so I use:

global:
   dnsService: "rke2-coredns-rke2-coredns"

and it works fine: the pod starts and no longer complains.
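
The two steps above can be sketched as a small script. This is only an illustration: the release name `loki`, the chart reference `grafana/loki`, the `loki` namespace, and the `resolver_fqdn` helper are assumptions, and the way the three global values are joined is inferred from the error messages in this thread (including the trailing dot that later messages show), not taken from the chart source. The helm command is printed rather than run.

```shell
#!/bin/sh
# Compose the name that ends up on the gateway's nginx "resolver" line.
# Assumption: <dnsService>.<dnsNamespace>.svc.<clusterDomain>. as seen in the errors.
resolver_fqdn() {
  printf '%s.%s.svc.%s.\n' "$1" "$2" "$3"
}

# Discover the DNS Service name by label (same query as in the comment above).
# Falls back to kube-dns when kubectl or the cluster is unavailable.
dns_service=$(kubectl get svc --namespace=kube-system -l k8s-app=kube-dns \
  -o jsonpath='{.items..metadata.name}' 2>/dev/null) || dns_service=
dns_service=${dns_service:-kube-dns}

# Show the name nginx would be asked to resolve with this setting.
resolver_fqdn "$dns_service" kube-system cluster.local

# Dry run: print the upgrade command (hypothetical release, chart, and namespace).
echo helm upgrade loki grafana/loki --namespace loki --reuse-values \
  --set "global.dnsService=$dns_service"
```

If the printed name does not match a Service that actually exists in kube-system, the gateway will most likely fail with the same "host not found in resolver" error.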

seb-835 avatar Oct 18 '22 12:10 seb-835

Could you try this again? I normally develop against a k3d cluster, but in testing against a kind cluster to debug some CI failures (since that's what our CI uses), I noticed some differences in the ndots value present in the /etc/resolv.conf in the containers on the kind cluster. As a result I needed to add an extra dot to the end of the resolver DNS record. That change should be in 3.3.0. Can you please try that version and let me know if this is still an issue?
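
To make the ndots point concrete, here is a rough sketch of how a glibc-style stub resolver expands a name against the search list. The ndots value of 5 and the three search domains are assumptions modelled on a pod in a `monitoring` namespace; the `candidates` function is hypothetical, and other resolver implementations (musl, for instance) differ in detail.

```shell
#!/bin/sh
# Print, in order, the names a glibc-style resolver would try.
# Usage: candidates NAME NDOTS SEARCH_DOMAIN...
candidates() (
  name=$1
  ndots=$2
  shift 2
  case $name in
    *.)
      # A trailing dot marks the name as absolute: the search list is skipped.
      printf '%s\n' "$name"
      ;;
    *)
      # Count the dots in the name.
      dots=$(( $(printf '%s' "$name" | tr -cd '.' | wc -c) ))
      # Enough dots: the name itself is tried before the search list.
      if [ "$dots" -ge "$ndots" ]; then printf '%s.\n' "$name"; fi
      for domain in "$@"; do printf '%s.%s.\n' "$name" "$domain"; done
      # Too few dots: the name itself is tried last.
      if [ "$dots" -lt "$ndots" ]; then printf '%s.\n' "$name"; fi
      ;;
  esac
)

# Without the trailing dot (four dots, below ndots:5) the search list comes first.
candidates kube-dns.kube-system.svc.cluster.local 5 \
  monitoring.svc.cluster.local svc.cluster.local cluster.local

# With the trailing dot only the absolute name is tried.
candidates kube-dns.kube-system.svc.cluster.local. 5 \
  monitoring.svc.cluster.local svc.cluster.local cluster.local
```

The first call lists the search-suffixed names before the plain one, which would account for the very long names in the CoreDNS log earlier in the thread; the second call shows why the extra dot avoids them.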

trevorwhitney avatar Oct 26 '22 20:10 trevorwhitney

In my case it's: kube-dns

darox avatar Nov 04 '22 16:11 darox

same error

root@node52:~# kubectl -n loki logs -f loki-gateway-774ff559b9-2w4dq  
/docker-entrypoint.sh: No files found in /docker-entrypoint.d/, skipping configuration
2023/01/05 08:41:13 [emerg] 1#1: host not found in resolver "kube-dns.kube-system.svc.cluster.local." in /etc/nginx/nginx.conf:27
nginx: [emerg] host not found in resolver "kube-dns.kube-system.svc.cluster.local." in /etc/nginx/nginx.conf:27

and the DNS service name:

root@node52:~# kubectl get svc --namespace=kube-system -l k8s-app=kube-dns  -o jsonpath='{.items..metadata.name}'
coredns

and resolved by:

global:
  dnsService: "coredns"

weironz avatar Jan 05 '23 08:01 weironz

I suspect this is related to the ndots configuration in the /etc/resolv.conf. May we see the resolver configuration please?

zalegrala avatar Mar 22 '23 16:03 zalegrala

The solution from seb-835 https://github.com/grafana/loki/issues/7287#issuecomment-1282339134 works for me.

rezaebrahimi1 avatar Jun 18 '23 11:06 rezaebrahimi1

In my case the cluster DNS was not resolving the cluster.local domain at all; the solution was to also set clusterDomain. The installation was a k3s cluster provisioned via Rancher 2.7.6 with the Cluster Domain explicitly set.

Kubernetes Version: v1.25.13+k3s1

Helm Chart:

  • name: loki
    version: 5.15.0
    repository: https://grafana.github.io/helm-charts

global:
   dnsService: "kube-dns"
   dnsNamespace: "kube-system"
   clusterDomain: "mysubdomain.mydomain.it"
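
If you are unsure what to put in clusterDomain, one way to find it is to read the search line of /etc/resolv.conf inside any running pod. The sketch below assumes the usual Kubernetes layout, where the search list contains an entry of the form `svc.<clusterDomain>`; the `cluster_domain` helper, the `loki` namespace, and the nameserver address in the sample input are hypothetical.

```shell
#!/bin/sh
# Read resolv.conf text on stdin and print the cluster domain, taken from
# the first search entry of the form "svc.<domain>".
cluster_domain() {
  awk '$1 == "search" {
    for (i = 2; i <= NF; i++)
      if ($i ~ /^svc\./) { print substr($i, 5); exit }
  }'
}

# Hypothetical resolv.conf contents for a pod in a "loki" namespace.
cluster_domain <<'EOF'
search loki.svc.mysubdomain.mydomain.it svc.mysubdomain.mydomain.it mysubdomain.mydomain.it
nameserver 10.43.0.10
options ndots:5
EOF
```

Against a live cluster, the same function can be fed from `kubectl exec <pod> -- cat /etc/resolv.conf`, using any pod that is running.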

It would be nice to have the option to set the IP of the DNS service in the Helm chart instead of the FQDN.

murand78 avatar Oct 10 '23 16:10 murand78

kubectl get svc --namespace=kube-system -l k8s-app=kube-dns  -o jsonpath='{.items..metadata.name}'
kube-dns
/docker-entrypoint.sh: No files found in /docker-entrypoint.d/, skipping configuration
2023/12/21 22:00:31 [emerg] 1#1: host not found in resolver "kube-dns.kube-system.svc.cluster.local." in /etc/nginx/nginx.conf:33
nginx: [emerg] host not found in resolver "kube-dns.kube-system.svc.cluster.local." in /etc/nginx/nginx.conf:33

Encountered the same error when switching to Talos

From a random container:

ping kube-dns.kube-system.svc.cluster.local.
PING kube-dns.kube-system.svc.cluster.local. (10.96.0.10): 56 data bytes

batazor avatar Dec 21 '23 22:12 batazor

@batazor I got the same error when running Loki with the gateway on a Talos cluster. Have you found a solution?

camaeel avatar Mar 24 '24 10:03 camaeel

IMHO this may be related to https://github.com/grafana/loki/issues/11650

camaeel avatar Mar 24 '24 10:03 camaeel

Same issue here. We have two GKE clusters: one uses kube-dns (Loki works without any adjustments) and the second uses Cloud DNS (VPC scope) with a specific domain suffix.

As mentioned above, we changed global.clusterDomain to that domain suffix and it works.

artem-zherdiev-ingio avatar May 25 '24 08:05 artem-zherdiev-ingio

Getting the same error

acar-ctpe avatar Jul 02 '24 15:07 acar-ctpe

The following config solved my problem:

loki:
  global:
    dnsService: coredns

benjaminhuo avatar Jul 03 '24 01:07 benjaminhuo

Hi guys, the following method works for me; please try it. List the services in kube-system and use the kube-dns ClusterIP as the gateway resolver:

NAME             TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)                  AGE
calico-typha     ClusterIP   10.0.253.98    <none>        5473/TCP                 7d1h
kube-dns         ClusterIP   10.0.0.2       <none>        53/UDP,53/TCP,9153/TCP   7d1h
metrics-server   ClusterIP   10.0.187.168   <none>        443/TCP                  7d

vim values.yaml
gateway:
  resolver: "10.0.0.2"
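
The lookup of that address can be scripted. This is a minimal sketch, assuming the DNS Service is named kube-dns as in the listing above and that the chart accepts the gateway.resolver key shown in the values snippet; the `resolver_values` helper is hypothetical.

```shell
#!/bin/sh
# Look up the ClusterIP of the DNS Service; empty when no cluster is reachable.
dns_ip=$(kubectl get svc kube-dns --namespace=kube-system \
  -o jsonpath='{.spec.clusterIP}' 2>/dev/null) || dns_ip=

# Emit a values fragment only when an address was supplied.
resolver_values() {
  [ -n "$1" ] || return 1
  printf 'gateway:\n  resolver: "%s"\n' "$1"
}

resolver_values "$dns_ip" || echo "no DNS Service IP found" >&2
```

Note that a ClusterIP can change if the Service is recreated, so a pinned address may need updating later.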

rtnba avatar Nov 11 '24 09:11 rtnba

We are using EKS Auto Mode, which does not publish a kube-dns service endpoint, so we are hitting this issue too when NGINX starts up. https://docs.aws.amazon.com/eks/latest/userguide/automode.html#_features . It would be nice to allow no resolver and let the node handle it, as one of the possible paths via libsonnet.

whatnick avatar Jan 19 '25 23:01 whatnick