emqx-docker icon indicating copy to clipboard operation
emqx-docker copied to clipboard

emqtt pings says unreachable node. later it starts fine

Open rejji opened this issue 7 years ago • 16 comments

BUG REPORT

Expected behavior

EMQ should start after running docker container container

Actual behavior

EMQ does not start some of the times (say 5% of times) and start other times.

Test code

docker run -it -d -p 18083:18083 -p 8883:8883 -v /data1/ucp-emqtt-logs:/opt/emqttd/log -v /ucp/ssl/emqtt/ca.cert.pem:/opt/emqttd/etc/certs/cacert.pem -v /ucp/ssl/emqtt/emqtt.key:/opt/emqttd/etc/certs/key.pem -v /ucp/ssl/emqtt/emqtt.cert.pem:/opt/emqttd/etc/certs/cert.pem --name=ucp-dataplane-emqttserver <artifactory>/emqttd-docker-v2.1.2

EMQ version

OS: VMware Photon Linux 1.0, PHOTON_BUILD_NUMBER=13c08b6 Erlang/OTP: R19/8.1 EMQ: 2.1.2 (using docker image from http://emqtt.io/downloads)

Docker version

docker -v
Docker version 1.12.1, build 23cf638

How docker info?

docker info

Containers: 7
 Running: 7
 Paused: 0
 Stopped: 0
Images: 17
Server Version: 1.12.1
Storage Driver: overlay
 Backing Filesystem: extfs
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: overlay null host bridge
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Security Options: seccomp
Kernel Version: 4.4.26
Operating System: VMware Photon/Linux
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 3.859 GiB
Name: <name_removed>
ID: S4P5:<id_removed>:TTSZ
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Insecure Registries:
 127.0.0.0/8

System

Photon OS from VMware uname -a: Linux sc-rdops-vm05-dhcp-130-190.eng.vmware.com 4.4.26 #1-photon SMP Mon Oct 24 19:43:03 UTC 2016 x86_64 GNU/Linux

Hardware

Virtual machine shared across different users on different orgs. 2 vCPU, 4GB RAM

Context

This issue was earlier reported at https://github.com/emqtt/emqttd/issues/1195

Description

Looks like docker node was found unreachable after erlexec command was launched which resulted into lost pings. This is not very consistent but has been reported off and on. Could you shed some more idea into it. Our initial debugging tells that when we start docker container, it takes a tiny_time to acquire container-ip and emqtt fails for that reason - so if we use a mechanism to delay start.sh by this tiny_time (till the container ip has been acquired), we can be good - is that correct assessment and is there a command line option to make it happen or we need to modify emqtt scripts. We can be wrong, so pls suggest your inputs.

After failure is reported, developer logs in for root cause analysis (after an hour or so, sometime 2 days). It does not crib and 'docker start' peacefully starts the container.

Logs

docker logs tells this:

[case 1]

mqtt.listener.tcp.max_clients=1000000 node.max_ets_tables=2097152 node.process_limit=2097152 node.name=@172.17.0.6 mqtt.listener.http.max_clients=250000 mqtt.listener.ssl.max_clients=500000 mqtt.listener.tcp.acceptors=64 node.max_ports=1048576 mqtt.listener.http.acceptors=16 mqtt.listener.ssl.acceptors=32 log.console=console Node '<removed_id>@172.17.0.8' not responding to pings. ['2017-08-04T08:52:22Z']:waiting emqttd Node '<removed_id>@172.17.0.8' not responding to pings. ['2017-08-04T08:52:23Z']:waiting emqttd Node '<removed_id>@172.17.0.8' not responding to pings. Exec: /opt/emqttd/erts-8.1/bin/erlexec -noshell -noinput +Bd -boot /opt/emqttd/releases/2.1.2/emqttd -mode embedded -boot_var ERTS_LIB_DIR /opt/emqttd/erts-8.1/../lib -mnesia dir "/opt/emqttd/data/mnesia/<removed_id>@172.17.0.8" -config /opt/emqttd/data/configs/app.2017.08.04.08.52.23.config -args_file /opt/emqttd/data/configs/vm.2017.08.04.08.52.23.args -vm_args /opt/emqttd/data/configs/vm.2017.08.04.08.52.23.args -- foreground Root: /opt/emqttd ['2017-08-04T08:52:25Z']:waiting emqttd Node '<removed_id>@172.17.0.8' not responding to pings. ['2017-08-04T08:52:27Z']:waiting emqttd Node '<removed_id>@172.17.0.8' not responding to pings. ['2017-08-04T08:52:30Z']:waiting emqttd Node '<removed_id>@172.17.0.8' not responding to pings. ['2017-08-04T08:52:32Z']:waiting emqttd ['2017-08-04T08:52:32Z']:timeout error


[case 2]

mqtt.listener.tcp.max_clients=1000000 node.max_ets_tables=2097152 node.process_limit=2097152 [email protected] mqtt.listener.http.max_clients=250000 mqtt.listener.ssl.max_clients=500000 mqtt.listener.tcp.acceptors=64 node.max_ports=1048576 mqtt.listener.http.acceptors=16 mqtt.listener.ssl.acceptors=32 log.console=console Node '<removed_id>@172.17.0.6' not responding to pings. ['2017-08-04T10:33:40Z']:waiting emqttd Node '<removed_id>@172.17.0.6' not responding to pings. ['2017-08-04T10:33:41Z']:waiting emqttd Node '<removed_id>@172.17.0.6' not responding to pings. Exec: /opt/emqttd/erts-8.1/bin/erlexec -noshell -noinput +Bd -boot /opt/emqttd/releases/2.1.2/emqttd -mode embedded -boot_var ERTS_LIB_DIR /opt/emqttd/erts-8.1/../lib -mnesia dir "/opt/emqttd/data/mnesia/<removed_id>@172.17.0.6" -config /opt/emqttd/data/configs/app.2017.08.04.10.33.40.config -args_file /opt/emqttd/data/configs/vm.2017.08.04.10.33.40.args -vm_args /opt/emqttd/data/configs/vm.2017.08.04.10.33.40.args -- foreground Root: /opt/emqttd ['2017-08-04T10:33:43Z']:waiting emqttd Node '<removed_id>@172.17.0.6' not responding to pings. ['2017-08-04T10:33:45Z']:waiting emqttd Node '<removed_id>@172.17.0.6' not responding to pings. ['2017-08-04T10:33:46Z']:waiting emqttd Node '<removed_id>@172.17.0.6' not responding to pings. ['2017-08-04T10:33:48Z']:waiting emqttd ['2017-08-04T10:33:48Z']:timeout error


[case 3]

2017-08-09T07:19:55.090416065Z mqtt.listener.tcp.max_clients=1000000 2017-08-09T07:19:55.119077056Z node.max_ets_tables=2097152 2017-08-09T07:19:55.140541438Z node.process_limit=2097152 2017-08-09T07:19:55.161802897Z node.name=<removed_id>@172.17.0.6 2017-08-09T07:19:55.172058122Z mqtt.listener.http.max_clients=250000 2017-08-09T07:19:55.180687513Z mqtt.listener.ssl.max_clients=500000 2017-08-09T07:19:55.187679711Z mqtt.listener.tcp.acceptors=64 2017-08-09T07:19:55.209356292Z node.max_ports=1048576 2017-08-09T07:19:55.223772112Z mqtt.listener.http.acceptors=16 2017-08-09T07:19:55.234806815Z mqtt.listener.ssl.acceptors=32 2017-08-09T07:19:55.246186517Z log.console=console 2017-08-09T07:19:58.398976947Z Node '<removed_id>@172.17.0.6' not responding to pings. 2017-08-09T07:19:59.450973271Z ['2017-08-09T07:19:59Z']:waiting emqttd 2017-08-09T07:20:00.588396270Z Node '[email protected]' not responding to pings. 2017-08-09T07:20:01.851912495Z ['2017-08-09T07:20:01Z']:waiting emqttd 2017-08-09T07:20:03.755740693Z Node '[email protected]' not responding to pings. 2017-08-09T07:20:03.957874564Z Exec: /opt/emqttd/erts-8.1/bin/erlexec -noshell -noinput +Bd -boot /opt/emqttd/releases/2.1.2/emqttd -mode embedded -boot_var ERTS_LIB_DIR /opt/emqttd/erts-8.1/../lib -mnesia dir "/opt/emqttd/data/mnesia/<removed_id>@172.17.0.6" -config /opt/emqttd/data/configs/app.2017.08.09.07.20.00.config -args_file /opt/emqttd/data/configs/vm.2017.08.09.07.20.00.args -vm_args /opt/emqttd/data/configs/vm.2017.08.09.07.20.00.args -- foreground 2017-08-09T07:20:03.957905985Z Root: /opt/emqttd 2017-08-09T07:20:04.842647277Z ['2017-08-09T07:20:04Z']:waiting emqttd 2017-08-09T07:20:06.315019617Z Node '[email protected]' not responding to pings. 2017-08-09T07:20:06.364415004Z 2017-08-09T07:20:06.364436531Z =INFO REPORT==== 9-Aug-2017::07:20:06 === 2017-08-09T07:20:06.364440175Z alarm_handler: {set,{system_memory_high_watermark,[]}} 2017-08-09T07:20:06.424014306Z starting emqttd on node '[email protected]' 2017-08-09T07:20:07.331717579Z ['2017-08-09T07:20:07Z']:waiting emqttd 2017-08-09T07:20:09.209180213Z ['2017-08-09T07:20:09Z']:waiting emqttd 2017-08-09T07:20:11.266640976Z ['2017-08-09T07:20:11Z']:waiting emqttd 2017-08-09T07:20:11.404889009Z ['2017-08-09T07:20:11Z']:timeout error 2017-08-09T09:41:33.129061929Z mqtt.listener.tcp.max_clients=1000000 2017-08-09T09:41:33.150939857Z node.max_ets_tables=2097152 2017-08-09T09:41:33.158646914Z node.process_limit=2097152 2017-08-09T09:41:33.167425159Z node.name=<removed_id>@172.17.0.8 2017-08-09T09:41:33.173971699Z mqtt.listener.http.max_clients=250000 2017-08-09T09:41:33.180952712Z mqtt.listener.ssl.max_clients=500000 2017-08-09T09:41:33.189514317Z mqtt.listener.tcp.acceptors=64 2017-08-09T09:41:33.208717553Z node.max_ports=1048576 2017-08-09T09:41:33.220267545Z mqtt.listener.http.acceptors=16 2017-08-09T09:41:33.228830010Z mqtt.listener.ssl.acceptors=32 2017-08-09T09:41:33.235712598Z log.console=console 2017-08-09T09:41:34.283814733Z Node '<removed_id>@172.17.0.8' not responding to pings. 2017-08-09T09:41:35.294268604Z ['2017-08-09T09:41:35Z']:waiting emqttd 2017-08-09T09:41:35.559936803Z Exec: /opt/emqttd/erts-8.1/bin/erlexec -noshell -noinput +Bd -boot /opt/emqttd/releases/2.1.2/emqttd -mode embedded -boot_var ERTS_LIB_DIR /opt/emqttd/erts-8.1/../lib -mnesia dir "/opt/emqttd/data/mnesia/<removed_id>@172.17.0.8" -config /opt/emqttd/data/configs/app.2017.08.09.09.41.34.config -args_file /opt/emqttd/data/configs/vm.2017.08.09.09.41.34.args -vm_args /opt/emqttd/data/configs/vm.2017.08.09.09.41.34.args -- foreground 2017-08-09T09:41:35.559963047Z Root: /opt/emqttd 2017-08-09T09:41:36.059586971Z Node '<removed_id>@172.17.0.8' not responding to pings. 2017-08-09T09:41:36.514041144Z starting emqttd on node '<removed_id>@172.17.0.8' 2017-08-09T09:41:36.638138975Z emqttd ctl is starting...[ok] 2017-08-09T09:41:36.638443350Z emqttd hook is starting...[ok] 2017-08-09T09:41:36.638764228Z emqttd router is starting...[ok] 2017-08-09T09:41:36.642051899Z emqttd pubsub is starting...[ok] 2017-08-09T09:41:36.642479183Z emqttd stats is starting...[ok] 2017-08-09T09:41:36.643020844Z emqttd metrics is starting...[ok] 2017-08-09T09:41:36.643723552Z emqttd pooler is starting...[ok] 2017-08-09T09:41:36.644169643Z emqttd trace is starting...[ok] 2017-08-09T09:41:36.645111268Z emqttd client manager is starting...[ok] 2017-08-09T09:41:36.646040380Z emqttd session manager is starting...[ok] 2017-08-09T09:41:36.646420283Z emqttd session supervisor is starting...[ok] 2017-08-09T09:41:36.646847570Z emqttd wsclient supervisor is starting...[ok] 2017-08-09T09:41:36.647368408Z emqttd broker is starting...[ok] 2017-08-09T09:41:36.647648299Z emqttd alarm is starting...[ok] 2017-08-09T09:41:36.647771379Z emqttd mod supervisor is starting...[ok] 2017-08-09T09:41:36.648003335Z emqttd bridge supervisor is starting...[ok] 2017-08-09T09:41:36.648319268Z emqttd access control is starting...[ok] 2017-08-09T09:41:36.648725912Z emqttd system monitor is starting...[ok] 2017-08-09T09:41:36.655021164Z Load emq_mod_presence module successfully. 2017-08-09T09:41:36.655146218Z Load emq_mod_subscription module successfully. 2017-08-09T09:41:36.686422603Z dashboard:http listen on 0.0.0.0:18083 with 2 acceptors. 2017-08-09T09:41:36.688935583Z mqtt:tcp listen on 0.0.0.0:1883 with 64 acceptors. 2017-08-09T09:41:36.690445904Z mqtt:ssl listen on 0.0.0.0:8883 with 32 acceptors. 2017-08-09T09:41:36.691772449Z mqtt:ws listen on 0.0.0.0:8083 with 16 acceptors. 2017-08-09T09:41:36.692911549Z mqtt:wss listen on 0.0.0.0:8084 with 4 acceptors. 2017-08-09T09:41:36.692922387Z emqttd 2.1.2 is running now


[for successful attempts]

mqtt.listener.http.max_clients=250000 mqtt.listener.ssl.max_clients=500000 mqtt.listener.tcp.acceptors=64 node.max_ports=1048576 mqtt.listener.http.acceptors=16 mqtt.listener.ssl.acceptors=32 log.console=console Node '<removed_id>@172.17.0.8' not responding to pings. ['2017-08-03T10:47:57Z']:waiting emqttd Exec: /opt/emqttd/erts-8.1/bin/erlexec -noshell -noinput +Bd -boot /opt/emqttd/releases/2.1.2/emqttd -mode embedded -boot_var ERTS_LIB_DIR /opt/emqttd/erts-8.1/../lib -mnesia dir "/opt/emqttd/data/mnesia/<removed_id>@172.17.0.8" -config /opt/emqttd/data/configs/app.2017.08.03.10.47.56.config -args_file /opt/emqttd/data/configs/vm.2017.08.03.10.47.56.args -vm_args /opt/emqttd/data/configs/vm.2017.08.03.10.47.56.args -- foreground Root: /opt/emqttd Node '<removed_id>@172.17.0.8' not responding to pings. starting emqttd on node '<removed_id>@172.17.0.8' emqttd ctl is starting...[ok] emqttd hook is starting...[ok] emqttd router is starting...[ok] emqttd pubsub is starting...[ok] emqttd stats is starting...[ok] emqttd metrics is starting...[ok] emqttd pooler is starting...[ok] emqttd trace is starting...[ok] emqttd client manager is starting...[ok] emqttd session manager is starting...[ok] emqttd session supervisor is starting...[ok] emqttd wsclient supervisor is starting...[ok] emqttd broker is starting...[ok] emqttd alarm is starting...[ok] emqttd mod supervisor is starting...[ok] emqttd bridge supervisor is starting...[ok] emqttd access control is starting...[ok] emqttd system monitor is starting...[ok] Load emq_mod_presence module successfully. Load emq_mod_subscription module successfully. dashboard:http listen on 0.0.0.0:18083 with 2 acceptors. mqtt:tcp listen on 0.0.0.0:1883 with 64 acceptors. mqtt:ssl listen on 0.0.0.0:8883 with 4 acceptors. mqtt:ws listen on 0.0.0.0:8083 with 16 acceptors. mqtt:wss listen on 0.0.0.0:8084 with 4 acceptors. emqttd 2.1.2 is running now ['2017-08-03T10:47:59Z']:waiting emqttd ['2017-08-03T10:47:59Z']:emqttd start

rejji avatar Aug 31 '17 15:08 rejji

FWIW, this seems to happen all the time when trying to deploy using Nomad. When I tried running the Docker container locally it worked great. It would, a couple times:

Node '[email protected]' not responding to pings.
['2018-01-16T19:17:20Z']:waiting emqttd
Node '[email protected]' not responding to pings.

but then it would recover, execute erlexec and everything was happy. As soon as I tried deploying with Nomad, this happened: emqttd.stdout.txt It keeps trying to redeploy, but never works. I tried explicitly stopping the service and redeploying it but got the same result. What's especially confusing is that it's trying to ping 169.254.0.X but the IP address it should be running is 172.17... I don't even know how it would be getting a 169.254.0.X address. I have no idea what about the way Nomad deploys Docker images would cause this to happen all the time but I thought I'd mention it in case it helps.

UPDATE: I set EMQ_HOST using interpolation so it's now pinging the 172.17.X.X address, but still getting the same error.

Could it be related to http://erlang.org/pipermail/erlang-questions/2016-November/090774.html ?

MadDataScience avatar Jan 16 '18 18:01 MadDataScience

Another update: someone suggested increasing the EMQ_WAIT_TIME. It looks like the comment was removed, so maybe the author already realized that wouldn't help, but I thought I'd respond anyway, since I had had the same thought. I set the EMQ_WAIT_TIME to 60 and all it did was increase the number of times I got that "not responding to pings" error before the whole thing started over.

MadDataScience avatar Jan 17 '18 17:01 MadDataScience

I get the same error, and my EMQ version is 2.3.3 (using docker image from http://emqtt.io/downloads). I just set some envs:

EMQ_CLUSTER__DISCOVERY=etcd
EMQ_CLUSTER__AUTOHEAL=on
EMQ_CLUSTER__AUTOCLEAN=1m
EMQ_CLUSTER__ETCD__SERVER = http:\/\/etcd:2379
EMQ_CLUSTER__ETCD__PREFIX=emqcl
EMQ_CLUSTER__ETCD__NODE_TTL=1m
EMQ_WAIT_TIME=30
EMQ_LOADED_PLUGINS=emq_recon,emq_modules,emq_retainer

And want i got was that:

1/23/2018 11:56:18 AMlistener.ssl.external.acceptors=32
1/23/2018 11:56:18 AMnode.max_ets_tables=2097152
1/23/2018 11:56:18 AMnode.process_limit=2097152
1/23/2018 11:56:18 AMlistener.ws.external.acceptors=16
1/23/2018 11:56:18 AMcluster.discovery=etcd
1/23/2018 11:56:18 AMcluster.discovery=etcd
1/23/2018 11:56:18 [email protected]
1/23/2018 11:56:18 AMlistener.tcp.external.max_clients=1000000
1/23/2018 11:56:18 AMcluster.etcd.prefix=emqcl
1/23/2018 11:56:18 AMcluster.etcd.prefix=emqcl
1/23/2018 11:56:18 AMcluster.autoclean=1m
1/23/2018 11:56:18 AMcluster.autoclean=1m
1/23/2018 11:56:18 AMcluster.autoheal=on
1/23/2018 11:56:18 AMcluster.autoheal=on
1/23/2018 11:56:18 AMlistener.ssl.external.max_clients=500000
1/23/2018 11:56:18 AMnode.max_ports=1048576
1/23/2018 11:56:18 AMcluster.etcd.server=http:\/\/etcd:2379
1/23/2018 11:56:18 AMcluster.etcd.server=http:\/\/etcd:2379
1/23/2018 11:56:18 AMlistener.tcp.external.acceptors=64
1/23/2018 11:56:18 AMlog.console=console
1/23/2018 11:56:18 AMlistener.ws.external.max_clients=250000
1/23/2018 11:56:18 AMcluster.etcd.node_ttl=1m
1/23/2018 11:56:18 AMcluster.etcd.node_ttl=1m
1/23/2018 11:56:18 AMEMQ_LOADED_PLUGINS=emq_recon,emq_modules,emq_retainer
1/23/2018 11:56:19 AMNode '[email protected]' not responding to pings.
1/23/2018 11:56:20 AM['2018-01-23T03:56:20Z']:waiting emqttd
1/23/2018 11:56:20 AMNode '[email protected]' not responding to pings.
1/23/2018 11:56:21 AM['2018-01-23T03:56:21Z']:waiting emqttd
1/23/2018 11:56:22 AMNode '[email protected]' not responding to pings.
1/23/2018 11:56:23 AM['2018-01-23T03:56:23Z']:waiting emqttd
1/23/2018 11:56:23 AMNode '[email protected]' not responding to pings.
1/23/2018 11:56:24 AM['2018-01-23T03:56:24Z']:waiting emqttd
1/23/2018 11:56:25 AMNode '[email protected]' not responding to pings.
1/23/2018 11:56:26 AM['2018-01-23T03:56:26Z']:waiting emqttd
1/23/2018 11:56:26 AMNode '[email protected]' not responding to pings.
1/23/2018 11:56:27 AM['2018-01-23T03:56:27Z']:waiting emqttd
1/23/2018 11:56:28 AMNode '[email protected]' not responding to pings.
1/23/2018 11:56:29 AM['2018-01-23T03:56:29Z']:waiting emqttd
1/23/2018 11:56:29 AMNode '[email protected]' not responding to pings.
1/23/2018 11:56:30 AM['2018-01-23T03:56:30Z']:waiting emqttd
1/23/2018 11:56:30 AMNode '[email protected]' not responding to pings.
1/23/2018 11:56:31 AM['2018-01-23T03:56:31Z']:waiting emqttd
1/23/2018 11:56:32 AMNode '[email protected]' not responding to pings.
1/23/2018 11:56:33 AM['2018-01-23T03:56:33Z']:waiting emqttd
1/23/2018 11:56:33 AMNode '[email protected]' not responding to pings.
1/23/2018 11:56:34 AM['2018-01-23T03:56:34Z']:waiting emqttd
1/23/2018 11:56:35 AMNode '[email protected]' not responding to pings.
1/23/2018 11:56:36 AM['2018-01-23T03:56:36Z']:waiting emqttd
1/23/2018 11:56:36 AMNode '[email protected]' not responding to pings.
1/23/2018 11:56:37 AM['2018-01-23T03:56:37Z']:waiting emqttd
1/23/2018 11:56:38 AMNode '[email protected]' not responding to pings.
1/23/2018 11:56:39 AM['2018-01-23T03:56:39Z']:waiting emqttd
1/23/2018 11:56:39 AMNode '[email protected]' not responding to pings.
1/23/2018 11:56:40 AM['2018-01-23T03:56:40Z']:waiting emqttd
1/23/2018 11:56:41 AMNode '[email protected]' not responding to pings.
1/23/2018 11:56:42 AM['2018-01-23T03:56:42Z']:waiting emqttd
1/23/2018 11:56:42 AMNode '[email protected]' not responding to pings.
1/23/2018 11:56:43 AM['2018-01-23T03:56:43Z']:waiting emqttd
1/23/2018 11:56:44 AMNode '[email protected]' not responding to pings.
1/23/2018 11:56:45 AM['2018-01-23T03:56:45Z']:waiting emqttd
1/23/2018 11:56:45 AMNode '[email protected]' not responding to pings.
1/23/2018 11:56:46 AM['2018-01-23T03:56:46Z']:waiting emqttd
1/23/2018 11:56:47 AMNode '[email protected]' not responding to pings.
1/23/2018 11:56:48 AM['2018-01-23T03:56:48Z']:waiting emqttd
1/23/2018 11:56:48 AMNode '[email protected]' not responding to pings.
1/23/2018 11:56:49 AM['2018-01-23T03:56:49Z']:waiting emqttd
1/23/2018 11:56:50 AMNode '[email protected]' not responding to pings.
1/23/2018 11:56:51 AM['2018-01-23T03:56:51Z']:waiting emqttd
1/23/2018 11:56:51 AMNode '[email protected]' not responding to pings.
1/23/2018 11:56:52 AM['2018-01-23T03:56:52Z']:waiting emqttd
1/23/2018 11:56:53 AMNode '[email protected]' not responding to pings.
1/23/2018 11:56:54 AM['2018-01-23T03:56:54Z']:waiting emqttd
1/23/2018 11:56:54 AMNode '[email protected]' not responding to pings.
1/23/2018 11:56:55 AM['2018-01-23T03:56:55Z']:waiting emqttd
1/23/2018 11:56:56 AMNode '[email protected]' not responding to pings.
1/23/2018 11:56:57 AM['2018-01-23T03:56:57Z']:waiting emqttd
1/23/2018 11:56:58 AMNode '[email protected]' not responding to pings.
1/23/2018 11:56:59 AM['2018-01-23T03:56:59Z']:waiting emqttd
1/23/2018 11:56:59 AMNode '[email protected]' not responding to pings.
1/23/2018 11:57:00 AM['2018-01-23T03:57:00Z']:waiting emqttd
1/23/2018 11:57:01 AMNode '[email protected]' not responding to pings.
1/23/2018 11:57:02 AM['2018-01-23T03:57:02Z']:waiting emqttd
1/23/2018 11:57:02 AMNode '[email protected]' not responding to pings.
1/23/2018 11:57:03 AM['2018-01-23T03:57:03Z']:waiting emqttd
1/23/2018 11:57:04 AMNode '[email protected]' not responding to pings.
1/23/2018 11:57:05 AM['2018-01-23T03:57:05Z']:waiting emqttd
1/23/2018 11:57:05 AM['2018-01-23T03:57:05Z']:timeout error

JacksonJia avatar Jan 23 '18 03:01 JacksonJia

I'm having the same issue. Did you guys figure out a fix?

thaspius avatar Feb 27 '18 12:02 thaspius

When I deployed to GCP using Kubernetes, it worked fine. But I never was able to deploy it with Nomad.

MadDataScience avatar Feb 27 '18 16:02 MadDataScience

I had this issue caused by setting of EMQ_HOST which overrides the default using IP address. I used EMQ_HOST=cluster5 and I had the same 'not responding to pings' issue. I sat the EMQ_WAIT_TIME to high number so I had time to run shell on container. Got:

/opt/emqttd # emqttd_ctl status Node emq@cluster5 not responding to pings.

=ERROR REPORT==== 3-Mar-2018::21:46:50 === ** System running to use fully qualified hostnames ** ** Hostname cluster5 is illegal ** /opt/emqttd #

So my naming 'cluster5' failed. I changed to 'stack.cluster5' and it works fine. A FQDN needs at least one dot in the name.

Not sure if same as reported, but this helped me.

tbstenbock avatar Mar 03 '18 22:03 tbstenbock

@tbstenbock The node name should be emq@<FQDN>, the short host name like 'cluster5' cannot work.

emqplus avatar Mar 04 '18 02:03 emqplus

I got the same issue when deploy to OSX & AWS. Sometime it work, sometime fail.

I try to debug

  1. to make EMQ_WAIT_TIME=150000000, run docker-compose up
  2. when i see log : '......not responding to pings.' . Access to container.
  3. cat etc/emq.conf, make sure config is fine.
  4. Run ./bin/emqttd start.
  5. It is work...

Question is : Why /opt/emqttd/bin/emqttd foreground & command in the start.sh not working ? I am sure that start.sh had execute when docker-compose up . And ENV variables has already set.

jhaoheng avatar Mar 04 '18 13:03 jhaoheng

Any update on this?

smoomrik avatar Mar 13 '18 13:03 smoomrik

I guess this should have been fixed in v2.3.11 which starts in daemon mode with a built-in wait. Before 2.3.11 it is fork-started in foreground mode --- it may fail if it boots up too slow.

New variable environment to opt for slow container is WAIT_FOR_ERLANG (defaults to 15)

Appreciated if some one can give the new version a try.

spring2maz avatar Sep 03 '18 18:09 spring2maz

hi @spring2maz i've tried your suggestion and it didn't work... 😢

ricardoccpaiva avatar Sep 24 '18 13:09 ricardoccpaiva

Hi @ricardoccpaiva which version do you use? I'd like to take a look at the logs. Could you try docker run -it image:version sh then inside the container, execute /opt/emqx/start.sh finally you should be able to inspect the logs in /opt/emqx/log dir.

spring2maz avatar Sep 24 '18 16:09 spring2maz

hi @spring2maz turns out that it works now, did the following change and i'm still on 2.3.9

screen shot 2018-09-28 at 13 50 00

ricardoccpaiva avatar Sep 28 '18 12:09 ricardoccpaiva

docker run --network host The env HOSTNAME in docker container has no mapper in /etc/hosts.

xuluqiang avatar Mar 14 '19 01:03 xuluqiang

@ricardoccpaiva That's not a good change, since you're just ignoring the problem by daemonising the server. Also, with that change, the comment on line 141 is now wrong.

haf avatar Mar 23 '19 15:03 haf

I solved this with the following for our k8s deployment:

  • make sure to use the k8s discovery
  • use 3.1-beta.x
  • for hosted k8s on GCP, use api server at http://kubernetes.default:8080
  • I used stateful sets, but that is probably not needed if there's no consensus mechanism in emqx
  • a shared cookie value across the nodes

haf avatar Mar 23 '19 15:03 haf