VerneMQ Cluster Bytes Dropped increasing
- VerneMQ Version: 1.11.0
- OS: OpenShift 4.3
- Cluster size/standalone: 1 discovery + 2 master nodes. OpenShift-based LB with TLS passthrough.
- VerneMQ configuration (vernemq.conf) set by environment variables:
Environment:
DOCKER_VERNEMQ_ACCEPT_EULA: yes
DOCKER_VERNEMQ_DISCOVERY_NODE: IP1.xxx.xxx.xxx
DOCKER_VERNEMQ_ALLOW_ANONYMOUS: off
DOCKER_VERNEMQ_ALLOW_MULTIPLE_SESSIONS: on
DOCKER_VERNEMQ_LISTENER.ssl.default: 0.0.0.0:8883
DOCKER_VERNEMQ_LISTENER.ssl.cafile: /etc/tls/cacerts.pem
DOCKER_VERNEMQ_LISTENER.ssl.keyfile: /etc/tls/key.pem
DOCKER_VERNEMQ_LISTENER.ssl.certfile: /etc/tls/cert.pem
DOCKER_VERNEMQ_PLUGINS.vmq_bridge: on
DOCKER_VERNEMQ_PLUGINS.vmq_passwd: off
DOCKER_VERNEMQ_PLUGINS.vmq_acl: off
DOCKER_VERNEMQ_PLUGINS.vmq_diversity: on
DOCKER_VERNEMQ_VMQ_DIVERSITY.auth_postgres.enabled: on
DOCKER_VERNEMQ_VMQ_DIVERSITY.postgres.host: psql-vernemq-authdb.svc.cluster.local
DOCKER_VERNEMQ_VMQ_DIVERSITY.postgres.port: 5432
DOCKER_VERNEMQ_VMQ_DIVERSITY.postgres.user: vernemq
DOCKER_VERNEMQ_VMQ_DIVERSITY.postgres.password: ***
DOCKER_VERNEMQ_VMQ_DIVERSITY.postgres.database: vernemq_db
DOCKER_VERNEMQ_VMQ_DIVERSITY.postgres.password_hash_method: crypt
DOCKER_VERNEMQ_VMQ_BRIDGE.ssl.br0.max_outgoing_buffered_messages: 1000
DOCKER_VERNEMQ_VMQ_BRIDGE.ssl.br0.cafile: /etc/tls-clients/mq-broker.pem
DOCKER_VERNEMQ_VMQ_BRIDGE.ssl.br0.insecure: on
DOCKER_VERNEMQ_VMQ_BRIDGE.ssl.br0: mq-broker.dev:443
DOCKER_VERNEMQ_VMQ_BRIDGE.ssl.br0.client_id: mqttbridge
DOCKER_VERNEMQ_VMQ_BRIDGE.ssl.br0.username: mqttbridge
DOCKER_VERNEMQ_VMQ_BRIDGE.ssl.br0.password: ***
DOCKER_VERNEMQ_VMQ_BRIDGE.ssl.br0.topic.1: $share/group/XXX out 2
DOCKER_VERNEMQ_VMQ_BRIDGE.ssl.br0.topic.2: $share/group/YYY out 2
DOCKER_VERNEMQ_VMQ_BRIDGE.ssl.br0.topic.3: $share/group/ZZZ out 2
DOCKER_VERNEMQ_VMQ_BRIDGE.ssl.br0.topic.4: $share/group/AAA/* out 2
DOCKER_VERNEMQ_VMQ_BRIDGE.ssl.br0.topic.5: $share/group/BBB/* out 2
CLUSTER_RESTART: xxxxxxx
DOCKER_VERNEMQ_OUTGOING_CLUSTERING_BUFFER_SIZE: 1500000
DOCKER_VERNEMQ_METADATA_PLUGIN: vmq_swc
DOCKER_VERNEMQ_LOG.console.level: debug
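For readability, here is a rough sketch of the vernemq.conf entries the most relevant of these variables should translate into, assuming the usual docker-vernemq entrypoint behaviour of stripping the DOCKER_VERNEMQ_ prefix and lowercasing the rest (values copied from the environment above, not verified against the running pods):

```
# TLS listener for MQTT clients
listener.ssl.default = 0.0.0.0:8883
listener.ssl.cafile = /etc/tls/cacerts.pem
listener.ssl.keyfile = /etc/tls/key.pem
listener.ssl.certfile = /etc/tls/cert.pem

# auth via vmq_diversity / Postgres, bridge br0 to the external broker
allow_anonymous = off
plugins.vmq_diversity = on
vmq_diversity.auth_postgres.enabled = on
plugins.vmq_bridge = on
vmq_bridge.ssl.br0 = mq-broker.dev:443
vmq_bridge.ssl.br0.max_outgoing_buffered_messages = 1000

# clustering / metadata
outgoing_clustering_buffer_size = 1500000
metadata_plugin = vmq_swc
log.console.level = debug
```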
Expected behaviour
- no messages are dropped between VerneMQ cluster nodes running in one OpenShift cluster
Actual behaviour
- we encounter data loss: messages are dropped during intra-cluster replication, so some of the subscribers do not receive them
- the cluster_bytes_dropped metric increases once every couple of days
- there is no indication of the reason for the message drops: no increased CPU or RAM usage, no fluctuation in new clients, no increased number of messages, etc.
Node legend for the logs below:
- Master 1: vernemq-masters-24-hszpp (IP1)
- Master 2: vernemq-masters-24-wk4hx (IP2)
- Discovery: vernemq-discovery-node-27-l4hnq (IP3)
- the following sequence of events can be observed in the logs on the Master 1 node:
December 5th 2020, 12:21:33.542 | 11:21:33.542 [warning] can't publish to remote node 'VerneMQ@IP2' due to 'msg_dropped' | vernemq-masters-24-hszpp
December 5th 2020, 12:21:33.541 | 11:21:33.538 [warning] can't publish to remote node 'VerneMQ@IP2' due to 'msg_dropped' | vernemq-masters-24-hszpp
December 5th 2020, 12:21:33.307 | 11:21:33.307 [warning] can't send 229687 bytes to 'VerneMQ@IP2' due to timeout, reconnect! | vernemq-masters-24-hszpp
- logs from Master 2 node before the timeout event:
December 5th 2020, 12:21:33.188 | 11:21:33.179 [debug] Replica meta7: AE exchange with 'VerneMQ@IP3', nothing to synchronize | vernemq-masters-24-wk4hx
December 5th 2020, 12:21:33.188 | 11:21:33.179 [debug] Replica meta7: AE exchange with 'VerneMQ@IP3' terminates with reason normal in state local_sync_repair | vernemq-masters-24-wk4hx
- logs from discovery node before timeout event:
December 5th 2020, 12:21:33.153 | 11:21:33.153 [debug] Replica meta3: AE exchange with 'VerneMQ@IP2', nothing to synchronize | vernemq-discovery-node-27-l4hnq
December 5th 2020, 12:21:33.153 | 11:21:33.153 [debug] Replica meta3: AE exchange with 'VerneMQ@IP2' terminates with reason normal in state local_sync_repair
@tomaszwolek thanks, currently I know of at least 1 other cluster on Docker/Kubernetes with the same question. I'm trying to collect more information. It happens at the level of internode MQTT data distribution. One node forwards the data over a dedicated TCP connection, and when the TCP send encounters a timeout, the originating node will do an immediate close-reconnect.
How does this typically resolve (if it resolves at all)? Any more observations you might have? What is the payload size of a typical MQTT message in your case?
@tomaszwolek forgot to add: the anti-entropy (AE) exchanges you are seeing are unrelated here. This is just logging the AE chatter.
Note to self (similar issues): https://github.com/vernemq/vernemq/issues/1468 and https://github.com/vernemq/vernemq/issues/944, especially https://github.com/vernemq/vernemq/issues/944#issuecomment-438630842, which seems to indicate that the socket can block.
@ioolkos The issue resolves after 1 second:
December 5th 2020, 12:21:34.538 | 11:21:34.538 [info] successfully connected to cluster node '[email protected]' again
There is no further information in the log, which may point to the TCP layer, as you said.
The MQTT message payload size is random; most of the messages are below 1k, but some of them are as big as 100k. As you can see from the config, I've tried to play with the cluster buffering and increased it to 1MB, but it didn't change anything.
I've added the AE exchanges because there is literally nothing else in the logs on the other nodes when the drops happen.
@tomaszwolek I've merged a PR that makes the cluster communication TCP buffers more configurable. In case you're interested, please read through the PR comment: https://github.com/vernemq/vernemq/pull/1692
@ioolkos thanks, I'm interested in testing this. Do you advise increasing the incoming buffer size on all nodes? Is there any metric with which we can monitor the incoming/outgoing buffer usage?
Jep, on all nodes.
Note that there's currently no recommendation on what values are "best", but the new configs should enable better experimentation. The consideration is that the RTT on the internode path is probably very low. Based on that, on the bandwidth we need for message delivery (currently over 1 TCP connection per direction), and on the busy_port errors we see, we can start experimenting with adapted values.
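To make that concrete with a rough, purely illustrative calculation (the RTT and throughput figures here are assumptions, not measurements from this cluster): with an intra-cluster RTT of about 0.5 ms and a target of 1 Gbit/s on the single internode TCP connection, the bandwidth-delay product is 1 Gbit/s × 0.0005 s ≈ 62.5 kB. Socket buffers and high watermarks comfortably above that (a few hundred kB) should keep the connection from stalling on the network alone, and occasional 100 kB payloads as mentioned above would argue for extra headroom on top.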
@ioolkos Which release is the configurable TCP buffer available in? The 1.11.0 release is from October 2020 and the PR here was merged in Dec 2020.
@kushwiz only in master yet, not in a tagged production release package.
With the suggested watermark values from the PR, I was able to get a 3-node cluster working without bytes being dropped. Thanks!
@kushwiz thanks for your feedback!
:point_right: Thank you for supporting VerneMQ: https://github.com/sponsors/vernemq
@ioolkos Well, it works most of the time, but there are still a few drops and I can't seem to get it to stop dropping.
[warning] can't send 1467 bytes to '[email protected]' due to timeout, reconnect!
If it can't send it, can it hold it in the buffer and send it after the reconnect?