loki-simple-scalable loki-gateway Nginx startup failed

After installing loki-simple-scalable with Helm, the gateway log reports the following error:

kubectl logs -n loki loki-gateway-7f78b889f9-9tb75
/docker-entrypoint.sh: No files found in /docker-entrypoint.d/, skipping configuration
2022/09/29 09:55:35 [emerg] 1#1: host not found in resolver "kube-dns.kube-system.svc.cluster.local" in /etc/nginx/nginx.conf:27
nginx: [emerg] host not found in resolver "kube-dns.kube-system.svc.cluster.local" in /etc/nginx/nginx.conf:27

But there is a kube-dns service in my cluster:

root@master-01:# kubectl get -n kube-system svc
NAME       TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)                  AGE
kube-dns   ClusterIP   10.96.0.10   <none>        53/UDP,53/TCP,9153/TCP   27d

root@master-01:# kubectl get -n kube-system pods
NAME                                READY   STATUS    RESTARTS       AGE
coredns-c676cc86f-dxv8m             1/1     Running   0              150m
coredns-c676cc86f-l6njk             1/1     Running   0              151m
etcd-master-01                      1/1     Running   65             27d
kube-apiserver-master-01            1/1     Running   0              27d
kube-controller-manager-master-01   1/1     Running   2 (24h ago)    27d
kube-proxy-c9746                    1/1     Running   0              27d
kube-proxy-d8qbr                    1/1     Running   0              24d
kube-proxy-jhl2s                    1/1     Running   0              24d
kube-proxy-rw5tr                    1/1     Running   0              24d
kube-scheduler-master-01            1/1     Running   71 (24d ago)   27d

Sep 29 '22 10:09 jwping

I have the same issue and can see following in the logs:

[INFO] 10.0.0.72:34141 - 11273 "A IN kube-dns.kube-system.svc.cluster.local.monitoring.svc.cluster.local. udp 85 false 512" NXDOMAIN qr,aa,rd 178 0.000441375s
[INFO] 10.0.0.72:34141 - 11776 "AAAA IN kube-dns.kube-system.svc.cluster.local.monitoring.svc.cluster.local. udp 85 false 512" NXDOMAIN qr,aa,rd 178 0.000412507s
[INFO] 10.0.0.72:51078 - 27467 "AAAA IN kube-dns.kube-system.svc.cluster.local.cluster.local. udp 70 false 512" NXDOMAIN qr,aa,rd 163 0.000208542s
[INFO] 10.0.0.72:51078 - 26859 "A IN kube-dns.kube-system.svc.cluster.local.cluster.local. udp 70 false 512" NXDOMAIN qr,aa,rd 163 0.000176965s
[INFO] 10.0.1.120:60184 - 63770 "A IN loki.monitoring.svc.cluster.local.monitoring.svc.cluster.local. udp 80 false 512" NXDOMAIN qr,aa,rd 173 0.000274916s
[INFO] 10.0.1.120:50301 - 19685 "AAAA IN loki.monitoring.svc.cluster.local.svc.cluster.local. udp 69 false 512" NXDOMAIN qr,aa,rd 162 0.000092976s
[INFO] 10.0.1.120:59617 - 17088 "A IN loki.monitoring.svc.cluster.local.cluster.local. udp 65 false 512" NXDOMAIN qr,aa,rd 158 0.000166104s
[INFO] 10.0.1.120:60339 - 24553 "AAAA IN loki.monitoring.svc.cluster.local.damn.li. udp 59 false 512" NOERROR qr,rd,ra 143 0.000685193s
[INFO] 10.0.1.120:56636 - 10429 "AAAA IN loki.monitoring.svc.cluster.local. udp 51 false 512" NXDOMAIN qr,aa,rd 144 0.000104261s

For some reason the gateway is requesting a domain name that is far too long.

Oct 02 '22 13:10 darox

@darox @jwping Check that Loki is configured with the right DNS service name.

First, query the name of your cluster's DNS service:

kubectl get svc --namespace=kube-system -l k8s-app=kube-dns -o jsonpath='{.items..metadata.name}'

Then adjust your Helm values with the result. In my case the DNS service is not kube-dns but "rke2-coredns-rke2-coredns", so I use:

global:
   dnsService: "rke2-coredns-rke2-coredns"

and it works fine: the pod starts and no longer complains.

Oct 18 '22 12:10 seb-835

Could you try this again? I normally develop against a k3d cluster, but in testing against a kind cluster to debug some CI failures (since that's what our CI uses), I noticed some differences in the ndots value present in the /etc/resolv.conf in the containers on the kind cluster. As a result I needed to add an extra dot to the end of the resolver DNS record. That change should be in 3.3.0. Can you please try that version and let me know if this is still an issue?

Oct 26 '22 20:10 trevorwhitney
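
To illustrate the ndots behaviour mentioned above, here is a minimal Python sketch of how a glibc-style resolver expands a name using the search list in /etc/resolv.conf. The function name candidate_names and the search list are assumptions chosen to match a pod in the monitoring namespace, and ndots:5 is the usual Kubernetes default. This is a simplified model, not the resolver's actual code, and other libc implementations may differ.

```python
def candidate_names(name, search, ndots):
    """Return the names a glibc-style resolver would query, in order.

    Simplified model: a trailing dot marks the name as absolute, so the
    search list is skipped entirely. Otherwise the search domains are
    tried before the name itself when it has fewer than `ndots` dots.
    """
    if name.endswith("."):
        return [name]
    absolute = name + "."
    expanded = [f"{name}.{domain}." for domain in search]
    if name.count(".") >= ndots:
        return [absolute] + expanded
    return expanded + [absolute]


if __name__ == "__main__":
    # Assumed search list for a pod in the "monitoring" namespace.
    search = ["monitoring.svc.cluster.local", "svc.cluster.local", "cluster.local"]
    for query in ("kube-dns.kube-system.svc.cluster.local",
                  "kube-dns.kube-system.svc.cluster.local."):
        print(query, "->", candidate_names(query, search, ndots=5))
```

The name without a trailing dot has only four dots, so with ndots:5 the search domains are appended first, which matches the overlong queries in the CoreDNS log earlier in this thread. With the trailing dot, only the absolute name is queried.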

In my case it's: kube-dns

Nov 04 '22 16:11 darox

Same error:

root@node52:~# kubectl -n loki logs -f loki-gateway-774ff559b9-2w4dq  
/docker-entrypoint.sh: No files found in /docker-entrypoint.d/, skipping configuration
2023/01/05 08:41:13 [emerg] 1#1: host not found in resolver "kube-dns.kube-system.svc.cluster.local." in /etc/nginx/nginx.conf:27
nginx: [emerg] host not found in resolver "kube-dns.kube-system.svc.cluster.local." in /etc/nginx/nginx.conf:27

and the DNS service name:

root@node52:~# kubectl get svc --namespace=kube-system -l k8s-app=kube-dns  -o jsonpath='{.items..metadata.name}'
coredns

and resolved by:

global:
  dnsService: "coredns"

Jan 05 '23 08:01 weironz

I suspect this is related to the ndots configuration in the /etc/resolv.conf. May we see the resolver configuration please?

Mar 22 '23 16:03 zalegrala

The solution from seb-835 (https://github.com/grafana/loki/issues/7287#issuecomment-1282339134) works for me.

Jun 18 '23 11:06 rezaebrahimi1

In my case the cluster DNS was not resolving the cluster.local domain at all; the solution was to also set clusterDomain. The installation was a k3s cluster provisioned via Rancher 2.7.6 with the Cluster Domain explicitly set.

Kubernetes Version: v1.25.13+k3s1

Helm Chart:

name: loki
version: 5.15.0
repository: https://grafana.github.io/helm-charts

global:
   dnsService: "kube-dns"
   dnsNamespace: "kube-system"
   clusterDomain: "mysubdomain.mydomain.it"

Would it be possible to have an option in the Helm chart to set the IP of the DNS service instead of the FQDN?

Oct 10 '23 16:10 murand78
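
Judging from the error messages and values quoted in this thread, the resolver name in the gateway's nginx.conf appears to be assembled from the three global values, with a trailing dot in the later error messages. The Python sketch below reproduces that composition so you can check which name the gateway will try to resolve. resolver_fqdn is a hypothetical helper, not part of the chart, and the exact template may differ between chart versions.

```python
def resolver_fqdn(dns_service, dns_namespace, cluster_domain):
    """Build the absolute DNS name the gateway resolver is assumed to use.

    Assumption: <dnsService>.<dnsNamespace>.svc.<clusterDomain>. with a
    trailing dot, as seen in the later error messages in this thread.
    """
    return f"{dns_service}.{dns_namespace}.svc.{cluster_domain.rstrip('.')}."


if __name__ == "__main__":
    # Values that produce the name seen in the error messages.
    print(resolver_fqdn("kube-dns", "kube-system", "cluster.local"))
    # Values from the k3s setup described above.
    print(resolver_fqdn("kube-dns", "kube-system", "mysubdomain.mydomain.it"))
```

If the printed name does not resolve from inside a pod in your cluster, one of the three values likely needs adjusting.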

kubectl get svc --namespace=kube-system -l k8s-app=kube-dns  -o jsonpath='{.items..metadata.name}'
kube-dns

/docker-entrypoint.sh: No files found in /docker-entrypoint.d/, skipping configuration
2023/12/21 22:00:31 [emerg] 1#1: host not found in resolver "kube-dns.kube-system.svc.cluster.local." in /etc/nginx/nginx.conf:33
nginx: [emerg] host not found in resolver "kube-dns.kube-system.svc.cluster.local." in /etc/nginx/nginx.conf:33

I encountered the same error when switching to Talos.

From a random container:

ping kube-dns.kube-system.svc.cluster.local.
PING kube-dns.kube-system.svc.cluster.local. (10.96.0.10): 56 data bytes

Dec 21 '23 22:12 batazor

@batazor I got the same error when running Loki with the gateway on a Talos cluster. Have you found a solution?

Mar 24 '24 10:03 camaeel

IMHO this may be related to https://github.com/grafana/loki/issues/11650

Mar 24 '24 10:03 camaeel

Same issue here. We have two GKE clusters: one uses kube-dns (Loki works without any adjustments) and the other uses Cloud DNS (VPC scope) with a specific domain suffix.

As mentioned above, we changed global.clusterDomain to that domain suffix, and it works.

May 25 '24 08:05 artem-zherdiev-ingio

Getting the same error

Jul 02 '24 15:07 acar-ctpe

The following config solved my problem:

loki:
  global:
    dnsService: coredns

Jul 03 '24 01:07 benjaminhuo

Hi guys, the following method works for me; please try it. Take the CLUSTER-IP of the kube-dns service and set it as the gateway resolver:

NAME             TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)                  AGE
calico-typha     ClusterIP   10.0.253.98    <none>        5473/TCP                 7d1h
kube-dns         ClusterIP   10.0.0.2       <none>        53/UDP,53/TCP,9153/TCP   7d1h
metrics-server   ClusterIP   10.0.187.168   <none>        443/TCP                  7d

vim values.yaml
gateway:
  resolver: "10.0.0.2"

Nov 11 '24 09:11 rtnba

We are using EKS Auto Mode, which does not publish a kube-dns service endpoint (https://docs.aws.amazon.com/eks/latest/userguide/automode.html#_features). We are getting this issue too when starting up NGINX. It would be nice to allow configuring no resolver and let the node handle it, as one of the possible paths via libsonnet.

Jan 19 '25 23:01 whatnick
