Loki unable to start on IPv6 EKS cluster
Describe the bug
Loki is unable to start on an IPv6 EKS cluster.
To Reproduce
Steps to reproduce the behavior:
- Launch an EKS cluster with IPv6 -- my reference cluster was launched using https://github.com/aws-ia/terraform-aws-eks-blueprints/tree/main/examples/ipv6-eks-cluster
- Deploy the Loki Helm chart -- https://github.com/grafana/helm-charts/tree/main/charts/loki-distributed
- View errors from distributor pods
Expected behavior
All Loki components start up using the available IPv6 addresses.
Environment:
- AWS EKS
- Helm
Screenshots, Promtail config, or terminal output
Interfaces available to the pods (from the gateway):
$ ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
3: eth0@if8: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 9001 qdisc noqueue state UP
link/ether 1a:db:76:5c:f2:9c brd ff:ff:ff:ff:ff:ff
inet6 2600:<snip>::3/128 scope global
valid_lft forever preferred_lft forever
inet6 fe80::18db:76ff:fe5c:f29c/64 scope link
valid_lft forever preferred_lft forever
5: v4if0@if10: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 9001 qdisc noqueue state UP
link/ether 62:db:c6:a1:8a:1f brd ff:ff:ff:ff:ff:ff
inet 169.254.172.5/22 brd 169.254.175.255 scope global v4if0
valid_lft forever preferred_lft forever
With the default Helm values, no private IP address is found:
level=info ts=2022-05-25T18:14:29.578598129Z caller=main.go:106 msg="Starting Loki" version="(version=2.5.0, branch=HEAD, revision=2d9d0ee23)"
level=info ts=2022-05-25T18:14:29.578800373Z caller=server.go:260 http=:3100 grpc=:9095 msg="server listening on addresses"
level=info ts=2022-05-25T18:14:29.579202179Z caller=memberlist_client.go:394 msg="Using memberlist cluster node name" name=loki-loki-distributed-distributor-568d4894df-nltpf-ffe7492b
level=warn ts=2022-05-25T18:14:29.579562645Z caller=util.go:181 msg="error getting interface" inf=en0 err="route ip+net: no such network interface"
level=info ts=2022-05-25T18:14:29.581538822Z caller=module_service.go:64 msg=initialising module=usage-report
level=info ts=2022-05-25T18:14:29.581538939Z caller=module_service.go:64 msg=initialising module=server
level=info ts=2022-05-25T18:14:29.581565905Z caller=module_service.go:64 msg=initialising module=memberlist-kv
level=error ts=2022-05-25T18:14:29.581635265Z caller=loki.go:384 msg="module failed" module=memberlist-kv error="invalid service state: Failed, expected: Running, failure: service &{0xc00014b540 { true 10000000000 4 30000000000 200000000 3 30000000000 0 true 7946 [loki-loki-distributed-memberlist] 1000000000 60000000000 10 true 0 300000000000 5000000000 0 {[] 7946 5000000000 5000000000 false 0xc00013ea00 false { false}} 0xc00013ea00 [{ringDesc 0x1145460} {}]} 0xc0006191d0 0xc00013ea00 0xc0002f0a10 {{} [0 1 0]} <nil> <nil> {0 0} map[] map[ringDesc:{ringDesc 0x1145460} usagestats.jsonCodec:{}] {0 0} map[] map[] {0 0} [] 0 [] 0 0 0xc0000568a0 0xc000a60000 0xc000a600c0 0xc000a60180 0xc000a603c0 0xc000a60240 0xc000a60480 0xc000a60300 0xc00095ca00 0xc00095ca40 0xc000a60600 0xc000a606c0 0xc000a60840 0xc000a60780 0xc000290498 0xc0002f2000 0xc000290488 0xc000290490 0xc00095cac0 0xc00095cb00 10} failed: failed to create memberlist: Failed to get final advertise address: no private IP address found, and explicit IP not provided"
level=error ts=2022-05-25T18:14:29.581703559Z caller=loki.go:384 msg="module failed" module=distributor error="failed to start distributor, because it depends on module memberlist-kv, which has failed: invalid service state: Failed, expected: Running, failure: invalid service state: Failed, expected: Running, failure: service &{0xc00014b540 { true 10000000000 4 30000000000 200000000 3 30000000000 0 true 7946 [loki-loki-distributed-memberlist] 1000000000 60000000000 10 true 0 300000000000 5000000000 0 {[] 7946 5000000000 5000000000 false 0xc00013ea00 false { false}} 0xc00013ea00 [{ringDesc 0x1145460} {}]} 0xc0006191d0 0xc00013ea00 0xc0002f0a10 {{} [0 1 0]} <nil> <nil> {0 0} map[] map[ringDesc:{ringDesc 0x1145460} usagestats.jsonCodec:{}] {0 0} map[] map[] {0 0} [] 0 [] 0 0 0xc0000568a0 0xc000a60000 0xc000a600c0 0xc000a60180 0xc000a603c0 0xc000a60240 0xc000a60480 0xc000a60300 0xc00095ca00 0xc00095ca40 0xc000a60600 0xc000a606c0 0xc000a60840 0xc000a60780 0xc000290498 0xc0002f2000 0xc000290488 0xc000290490 0xc00095cac0 0xc00095cb00 10} failed: failed to create memberlist: Failed to get final advertise address: no private IP address found, and explicit IP not provided"
level=error ts=2022-05-25T18:14:29.581743076Z caller=loki.go:384 msg="module failed" module=ring error="failed to start ring, because it depends on module memberlist-kv, which has failed: invalid service state: Failed, expected: Running, failure: invalid service state: Failed, expected: Running, failure: service &{0xc00014b540 { true 10000000000 4 30000000000 200000000 330000000000 0 true 7946 [loki-loki-distributed-memberlist] 1000000000 60000000000 10 true 0 300000000000 5000000000 0 {[] 7946 5000000000 5000000000 false 0xc00013ea00 false { false}} 0xc00013ea00 [{ringDesc 0x1145460} {}]} 0xc0006191d0 0xc00013ea00 0xc0002f0a10 {{} [0 1 0]} <nil> <nil> {0 0} map[] map[ringDesc:{ringDesc 0x1145460} usagestats.jsonCodec:{}] {0 0} map[] map[] {0 0} [] 0 [] 0 0 0xc0000568a0 0xc000a60000 0xc000a600c0 0xc000a60180 0xc000a603c0 0xc000a60240 0xc000a60480 0xc000a60300 0xc00095ca00 0xc00095ca40 0xc000a60600 0xc000a606c0 0xc000a60840 0xc000a60780 0xc000290498 0xc0002f2000 0xc000290488 0xc000290490 0xc00095cac0 0xc00095cb00 10} failed: failed to create memberlist: Failed to get final advertise address: no private IP address found, and explicit IP not provided"
level=warn ts=2022-05-25T18:14:29.581794441Z caller=module_service.go:94 msg="module failed with error" module=usage-report err="context canceled"
level=error ts=2022-05-25T18:14:29.581818763Z caller=loki.go:384 msg="module failed" module=usage-report error="context canceled"
level=info ts=2022-05-25T18:14:29.583113817Z caller=modules.go:877 msg="server stopped"
level=info ts=2022-05-25T18:14:29.583128137Z caller=module_service.go:96 msg="module stopped" module=server
level=info ts=2022-05-25T18:14:29.583139305Z caller=loki.go:373 msg="Loki stopped"
level=error ts=2022-05-25T18:14:29.583177081Z caller=log.go:100 msg="error running loki" err="failed services\ngithub.com/grafana/loki/pkg/loki.(*Loki).Run\n\t/src/loki/pkg/loki/loki.go:419\nmain.main\n\t/src/loki/cmd/loki/main.go:108\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:255\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1581"
Attempting to use v4if0:
level=info ts=2022-05-25T18:21:53.575979324Z caller=main.go:106 msg="Starting Loki" version="(version=2.5.0, branch=HEAD, revision=2d9d0ee23)"
level=info ts=2022-05-25T18:21:53.576227957Z caller=server.go:260 http=:3100 grpc=:9095 msg="server listening on addresses"
level=info ts=2022-05-25T18:21:53.5768685Z caller=memberlist_client.go:394 msg="Using memberlist cluster node name" name=loki-loki-distributed-distributor-764ff878f6-5hgnm-4ff68506
level=warn ts=2022-05-25T18:21:53.576897266Z caller=util.go:205 msg="using automatic private ip" address=169.254.172.7
level=info ts=2022-05-25T18:21:53.579043014Z caller=module_service.go:64 msg=initialising module=server
level=info ts=2022-05-25T18:21:53.579050624Z caller=module_service.go:64 msg=initialising module=memberlist-kv
level=error ts=2022-05-25T18:21:53.579123466Z caller=loki.go:384 msg="module failed" module=memberlist-kv error="invalid service state: Stopping, expected: Running"
level=info ts=2022-05-25T18:21:53.579151709Z caller=module_service.go:64 msg=initialising module=usage-report
level=error ts=2022-05-25T18:21:53.579166709Z caller=loki.go:384 msg="module failed" module=ring error="failed to start ring, because it depends on module memberlist-kv, which has failed: invalid service state: Failed, expected: Running, failure: invalid service state: Stopping, expected: Running"
level=error ts=2022-05-25T18:21:53.579182682Z caller=loki.go:384 msg="module failed" module=distributor error="failed to start distributor, because it depends on module ring, which has failed: context canceled"
level=error ts=2022-05-25T18:21:53.579195744Z caller=loki.go:384 msg="module failed" module=usage-report error="context canceled"
level=info ts=2022-05-25T18:21:53.580294456Z caller=modules.go:877 msg="server stopped"
level=info ts=2022-05-25T18:21:53.580309591Z caller=module_service.go:96 msg="module stopped" module=server
level=info ts=2022-05-25T18:21:53.580317022Z caller=loki.go:373 msg="Loki stopped"
level=error ts=2022-05-25T18:21:53.580348485Z caller=log.go:100 msg="error running loki" err="failed services\ngithub.com/grafana/loki/pkg/loki.(*Loki).Run\n\t/src/loki/pkg/loki/loki.go:419\nmain.main\n\t/src/loki/cmd/loki/main.go:108\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:255\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1581"
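For context on the first failure: Loki's instance-address autodetection scans a fixed list of interfaces for a private IPv4 address, so the IPv6-only eth0 yields nothing, hence "no private IP address found". A sketch of the relevant setting with its assumed defaults (the en0 entry is what produces the `inf=en0` warning above):

```yaml
# Assumed defaults for the interface scan; shown for illustration, not a fix.
# The scan only considers private IPv4 addresses, which an IPv6-only pod lacks.
common:
  instance_interface_names:
    - eth0
    - en0
```

In the second run, pointing the scan at v4if0 (presumably by overriding that interface list) makes autodetection pick the link-local 169.254.172.7, but memberlist still fails to come up.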
Any progress here? I've run into the same problem. :)
Hi @paulroche, it seems I just found a solution to get past this in https://github.com/grafana/helm-charts/issues/157#issuecomment-919369654.
Hope it helps you too.
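The gist of that workaround (the same approach as the full values posted later in this thread) is to bypass the address autodetection by injecting each pod's own IP via environment expansion. A minimal sketch for the loki-distributed chart used here, assuming its components expose `extraArgs` and `extraEnv` in the values:

```yaml
# Sketch only: component value keys depend on the chart version.
loki:
  config: |
    memberlist:
      bind_addr:
        - ${MY_POD_IP}   # expanded at startup by -config.expand-env=true
distributor:
  extraArgs:
    - -config.expand-env=true
  extraEnv:
    - name: MY_POD_IP
      valueFrom:
        fieldRef:
          fieldPath: status.podIP   # the pod's (IPv6) address
```

The same `extraArgs`/`extraEnv` stanza would need to be repeated for every memberlist member (ingester, querier, and so on).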
Hi! This issue has been automatically marked as stale because it has not had any activity in the past 30 days.
We use a stalebot among other tools to help manage the state of issues in this project. A stalebot can be very useful in closing issues in a number of cases; the most common is closing issues or PRs where the original reporter has not responded.
Stalebots are also emotionless and cruel and can close issues which are still very relevant.
If this issue is important to you, please add a comment to keep it open. More importantly, please add a thumbs-up to the original issue entry.
We regularly sort for closed issues which have a `stale` label sorted by thumbs up.
We may also:
- Mark issues as `revivable` if we think it's a valid issue but isn't something we are likely to prioritize in the future (the issue will still remain closed).
- Add a `keepalive` label to silence the stalebot if the issue is very common/popular/important.
We are doing our best to respond, organize, and prioritize all issues but it can be a challenging task, our sincere apologies if you find yourself at the mercy of the stalebot.
The main issue here is currently a limitation in dskit's netutil, which does not handle IPv6 properly. I am linking the PR here:
- https://github.com/grafana/dskit/pull/185
By the way, I can confirm this error is not EKS-specific; it also happens on OpenShift with OVN using IPv6 networking.
Any idea if this issue is resolved?
https://github.com/grafana/dskit/pull/185 could use another test pass, but I doubt I'll have the time in the next month. Specifically, I want to know more about the DNS handling when results come back in both v4 and v6 format: is the correct address chosen?
I can take on testing this on OpenShift, where we have dual-stack support too. I believe we can also cover this in the dskit PR with a unit test.
Related is https://github.com/grafana/tempo/issues/1544, along with the configuration I had to tune to get IPv6 working properly in Tempo: https://github.com/zalegrala/tempo/blob/inet6/operations/jsonnet/microservices/common.libsonnet#L17. I'm zleslie in the community Slack if you need a hand understanding that jsonnet.
Hello,
Is there any news on this subject?
I tried deploying Loki with the loki-distributed Helm chart on my IPv4 EKS cluster and everything works. I tried the same thing on my IPv6 EKS cluster and the stack did not start correctly.
I know there has been work from @zalegrala to make Tempo IPv6-compatible (https://github.com/grafana/dskit/pull/185), and it works according to @gillg's tests.
Can you do the same thing for Loki?
Thank you guys.
I can confirm these values make Loki work completely fine in IPv6-only k8s: https://github.com/grafana/loki/issues/5578#issuecomment-1474249365
Seconding @mossad-zika's comment here, but I've summarized the IPv6-related config below:
```yaml
loki:
  config: |
    common:
      instance_addr: "${MY_POD_IP}"
      ring:
        kvstore:
          store: memberlist
        instance_addr: "${MY_POD_IP}"
    memberlist:
      bind_addr:
        - ${MY_POD_IP}
write:
  extraArgs:
    - -config.expand-env=true
  extraEnv:
    - name: MY_POD_IP
      valueFrom:
        fieldRef:
          fieldPath: status.podIP
read:
  extraArgs:
    - -config.expand-env=true
  extraEnv:
    - name: MY_POD_IP
      valueFrom:
        fieldRef:
          fieldPath: status.podIP
backend:
  extraArgs:
    - -config.expand-env=true
  extraEnv:
    - name: MY_POD_IP
      valueFrom:
        fieldRef:
          fieldPath: status.podIP
singleBinary:
  extraArgs:
    - -config.expand-env=true
  extraEnv:
    - name: MY_POD_IP
      valueFrom:
        fieldRef:
          fieldPath: status.podIP
```
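To spell out how this fits together: `-config.expand-env=true` makes Loki expand `${VAR}` references in its config file from the environment, and the downward-API `fieldRef: status.podIP` hands each pod its own address, which is an IPv6 address on an IPv6-only cluster. The effective config each pod ends up running is therefore something like the sketch below (the address is illustrative):

```yaml
# Rendered result after env expansion; 2001:db8::3 is a hypothetical pod address.
common:
  instance_addr: "2001:db8::3"
  ring:
    kvstore:
      store: memberlist
    instance_addr: "2001:db8::3"
memberlist:
  bind_addr:
    - "2001:db8::3"
```

Since the address is supplied explicitly, the private-IPv4 interface scan that fails on IPv6-only clusters is never consulted.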
Also, if you are using Terraform HCL, you'll need to escape the "${MY_POD_IP}" as "$$${MY_POD_IP}".
Hi @isaccavalcante, how does this work with monolithic Loki on Kubernetes?
Also, can you share the config for the updated version?