
Node is Not connecting with Kubernetes.DNSSRV

Open mohandass-pat opened this issue 3 years ago • 2 comments

Hi,

I'm using Elixir.Cluster.Strategy.Kubernetes.DNSSRV for libcluster in AWS EKS where I enabled Istio also.

Here is my configuration:

strategy: Elixir.Cluster.Strategy.Kubernetes.DNSSRV,
config: [
  service: "settings-v3-service",
  namespace: "kandula-dev",
  application_name: "settings",
  polling_interval: 10_000
],
connect: {:net_kernel, :connect_node, []},
disconnect: {:erlang, :disconnect_node, []},
list_nodes: {:erlang, :nodes, [:connected]}
]

I'm using StatefulSets. Running hostname -f returns: "settings-0.settings-v3-service.kandula-dev.svc.cluster.local"

My node name is structured like this: "settings@settings-0.settings-v3-service.kandula-dev.svc.cluster.local"

If I use Node.connect manually, I am able to connect.
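A manual check along these lines can be sketched as follows. `ManualConnect` is a hypothetical helper, not part of libcluster; it only wraps `Node.connect/1` and `Node.list/0` and labels the three possible return values.

```elixir
defmodule ManualConnect do
  # Hypothetical helper (not part of libcluster): tries a manual connection
  # and reports the outcome in a readable form.
  def check(node) when is_atom(node) do
    case Node.connect(node) do
      # Connected: return the nodes currently visible from this node.
      true -> {:ok, Node.list()}
      # The remote node could not be reached (DNS, cookie, network, ...).
      false -> {:error, :unreachable}
      # The local VM was not started with --name or --sname.
      :ignored -> {:error, :local_node_not_distributed}
    end
  end
end
```

From a remote shell on the pod, this would be called as `ManualConnect.check(:"settings@settings-0.settings-v3-service.kandula-dev.svc.cluster.local")`.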

But libcluster does not connect. The log shows this warning:

` 2020-11-26T10:00:02.191644371Z 10:00:02.191 [warn] [libcluster:kandula_settings] unable to connect to :"[email protected]"

`

If I run dig SRV settings-v3-service.kandula-dev.svc.cluster.local, it does not return the list of my StatefulSet pods.

Output of the dig SRV command:

` root@settings-0:/app# dig SRV settings-v3-service.kandula-dev.svc.cluster.local

; <<>> DiG 9.11.5-P4-5.1+deb10u2-Debian <<>> SRV settings-v3-service.kandula-dev.svc.cluster.local
;; global options: +cmd
;; Got answer:
;; WARNING: .local is reserved for Multicast DNS
;; You are currently testing what happens when an mDNS query is leaked to DNS
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 50050
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 2
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
; COOKIE: 28b65516a3c20ebb (echoed)
;; QUESTION SECTION:
;settings-v3-service.kandula-dev.svc.cluster.local. IN SRV

;; ANSWER SECTION:
settings-v3-service.kandula-dev.svc.cluster.local. 5 IN SRV 0 100 80 settings-v3-service.kandula-dev.svc.cluster.local.

;; ADDITIONAL SECTION:
settings-v3-service.kandula-dev.svc.cluster.local. 5 IN A 10.100.3.101

;; Query time: 2 msec
;; SERVER: 10.100.0.10#53(10.100.0.10)
;; WHEN: Thu Nov 26 09:59:11 UTC 2020
;; MSG SIZE rcvd: 273

root@settings-0:/app# . `
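To illustrate why this answer matters, here is a minimal sketch of how SRV answers could be turned into node names. It assumes the strategy builds each name as application_name@target from the SRV target, which is my reading of libcluster and not verified against its source. `SrvNodes`, `to_nodes/2` and `lookup/3` are hypothetical names; `:inet_res.getbyname/2` is the standard Erlang resolver call, and the `cluster.local` suffix is taken from the hostnames above.

```elixir
defmodule SrvNodes do
  # Hypothetical helper, not libcluster code.
  # Each SRV answer from :inet_res is a {priority, weight, port, target} tuple,
  # where target is a charlist hostname.
  def to_nodes(srv_records, app_name) do
    for {_priority, _weight, _port, target} <- srv_records do
      host = target |> to_string() |> String.trim_trailing(".")
      String.to_atom("#{app_name}@#{host}")
    end
  end

  # Performs the real DNS lookup; only meaningful from inside the cluster.
  def lookup(service, namespace, app_name) do
    name = String.to_charlist("#{service}.#{namespace}.svc.cluster.local")

    case :inet_res.getbyname(name, :srv) do
      {:ok, {:hostent, _name, _aliases, _type, _length, records}} ->
        {:ok, to_nodes(records, app_name)}

      {:error, reason} ->
        {:error, reason}
    end
  end
end
```

Feeding it the single answer from the dig output, `SrvNodes.to_nodes([{0, 100, 80, ~c"settings-v3-service.kandula-dev.svc.cluster.local"}], "settings")`, yields one node named after the service itself rather than after settings-0, which would not match the node name the pod actually runs under.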

Thanks in advance.

mohandass-pat, Nov 26 '20 10:11

@mohandass-pat did you ever get anywhere with this?

I am having the same issue: no matter which strategy I use alongside Istio on my cluster, my pods cannot seem to connect to one another.

amacciola, Aug 24 '22 20:08

I was facing a similar issue until I set the following environment variables for a mix release to modify the Erlang node options:

RELEASE_DISTRIBUTION=name
RELEASE_NODE=my-app

config :libcluster,
  topologies: [
    k8s_example: [
      strategy: Elixir.Cluster.Strategy.Kubernetes.DNSSRV,
      config: [
        ...
        application_name: "my-app",
        polling_interval: 10_000
      ]
    ]
  ]

I believe :application_name has to match the name part of the Erlang node name, i.e. the part before the @ (the -name or RELEASE_NODE value).

I believe the Elixir.Cluster.Strategy.Kubernetes.DNSSRV requires that we use -name or RELEASE_DISTRIBUTION=name so that the Node's host portion is fully qualified and looks something like this:

iex1> Node.self()
:"[email protected]"

If it is set to -sname or RELEASE_DISTRIBUTION=sname then the host portion of the node is not fully qualified (:"my-app@my-app-0") and will not work.
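Both conditions can be checked from a running node with a small sketch like the one below. `NodeNameCheck` is a hypothetical helper, and treating "the host contains a dot" as "fully qualified" is a simplifying assumption, not a rule taken from libcluster.

```elixir
defmodule NodeNameCheck do
  # Hypothetical helper: splits a node name into {name, host}.
  def parts(node) when is_atom(node) do
    case node |> Atom.to_string() |> String.split("@", parts: 2) do
      [name, host] -> {name, host}
      # A node name without "@" has no host portion.
      [name] -> {name, ""}
    end
  end

  # Assumption: a host portion with at least one dot counts as fully qualified.
  def fully_qualified?(node) do
    {_name, host} = parts(node)
    String.contains?(host, ".")
  end

  # True when the part before "@" equals the configured :application_name.
  def matches_app?(node, app_name) do
    {name, _host} = parts(node)
    name == app_name
  end
end
```

For example, `NodeNameCheck.fully_qualified?(:"my-app@my-app-0")` returns false, while `NodeNameCheck.matches_app?(Node.self(), "my-app")` shows whether the running node's name part agrees with the configured application name.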

spencerdcarlson, Apr 29 '23 18:04