emqx-bridge-mqtt

Bridge connection loops messages back forever

Open mspoehr opened this issue 4 years ago • 16 comments

I am using the emqx-bridge-mqtt plugin to bridge EMQX to an AWS IoT endpoint. Occasionally (seemingly at random, on emqx start) the connection starts spamming the same messages over and over via the bridge connection until the service is restarted again. This occurs on roughly 25% of emqx startups.

I am using emqx version 4.0.5, with this plugin configured to be loaded on startup (via /var/lib/emqx/loaded_plugins) on Ubuntu Linux 18.04.

Below is an excerpt from the log when this issue occurs.

2020-04-01 13:43:22.253 [warning] <<"someclientid">>@127.0.0.1:40844 [Session] Dropped msg due to mqueue is full: Message(Id=^@^E¢>7Û^ÝôB^@^@^F#Uñ, QoS=1, Topic=aws/some/topic/structure, From=bridge, Flags=[], Headers=)
...
2020-04-01 13:43:22.253 [error] [Bridge] Can't be found from the inflight:45091

Those messages can be seen repeatedly with different identifiers and topics.

The following is the emqx_bridge_mqtt.conf being used:

bridge.mqtt.aws.address = xxxxxxxxxxxxxx-ats.iot.us-west-2.amazonaws.com:8883
bridge.mqtt.aws.proto_ver = mqttv4
bridge.mqtt.aws.start_type = auto
bridge.mqtt.aws.bridge_mode = true
bridge.mqtt.aws.clientid = someremoteclientid
bridge.mqtt.aws.clean_start = true
bridge.mqtt.aws.forwards = cloud/#
bridge.mqtt.aws.forward_mountpoint = some/topic/structure
bridge.mqtt.aws.subscription.1.topic = some/topic/structure/cloud/#
bridge.mqtt.aws.subscription.1.qos = 1
bridge.mqtt.aws.receive_mountpoint = aws/
bridge.mqtt.aws.ssl = on
bridge.mqtt.aws.cacertfile = /path/to/AmazonRootCA1.pem
bridge.mqtt.aws.certfile = /path/to/id_rsa.crt
bridge.mqtt.aws.keyfile = /path/to/id_rsa.key
bridge.mqtt.aws.ciphers = ECDHE-ECDSA-AES256-GCM-SHA384,ECDHE-RSA-AES256-GCM-SHA384
bridge.mqtt.aws.keepalive = 60s
bridge.mqtt.aws.tls_versions = tlsv1.2

You may notice I am bridging both to and from cloud/# on the bridge connection. I would expect a single loopback of all bridged messages if any clients subscribe locally - and this does occur the 75% of the time where emqx is not spamming messages. Could this be causing the issue the other 25% of the time? Any config recommendations or is this a bug with emqx?

mspoehr avatar Apr 02 '20 20:04 mspoehr

We've changed our IoT rules to also accept messages on a topic structure separate from the one being subscribed to. This is the resulting config:

bridge.mqtt.aws.address = xxxxxxxxxxxxxx-ats.iot.us-west-2.amazonaws.com:8883
bridge.mqtt.aws.proto_ver = mqttv4
bridge.mqtt.aws.start_type = auto
bridge.mqtt.aws.bridge_mode = true
bridge.mqtt.aws.clientid = someremoteclientid
bridge.mqtt.aws.clean_start = true
bridge.mqtt.aws.forwards = to-aws/#
bridge.mqtt.aws.forward_mountpoint = some/topic/structure
bridge.mqtt.aws.subscription.1.topic = some/topic/structure/cloud/#
bridge.mqtt.aws.subscription.1.qos = 1
bridge.mqtt.aws.receive_mountpoint = aws/
bridge.mqtt.aws.ssl = on
bridge.mqtt.aws.cacertfile = /path/to/AmazonRootCA1.pem
bridge.mqtt.aws.certfile = /path/to/id_rsa.crt
bridge.mqtt.aws.keyfile = /path/to/id_rsa.key
bridge.mqtt.aws.ciphers = ECDHE-ECDSA-AES256-GCM-SHA384,ECDHE-RSA-AES256-GCM-SHA384
bridge.mqtt.aws.keepalive = 60s
bridge.mqtt.aws.tls_versions = tlsv1.2

With the config in the previous comment I expected a single loopback 100% of the time, but instead got infinite loopback some percentage of the time. With this new config, I don't expect any loopback, ever. I'm still seeing the same issue with infinite loopback. This tells me that the issue does not have anything to do with attempting to send and receive from the same topic structure, as sending to some/topic/structure/to-aws and subscribing to some/topic/structure/cloud should be completely disjoint.

I was able to restart emqx a (seemingly) random number of times to get the issue to go away.

Any thoughts on other config options that could be causing this?

mspoehr avatar Apr 13 '20 20:04 mspoehr

There is a problem with your configuration that causes the messages to be sent in a loop:

bridge.mqtt.aws.forwards = cloud/#
bridge.mqtt.aws.forward_mountpoint = some/topic/structure

bridge.mqtt.aws.subscription.1.topic = some/topic/structure/cloud/#
bridge.mqtt.aws.subscription.1.qos = 1

Messages bridged by emqx are sent to AWS IoT on topics under some/topic/structure/cloud/#.

You then also configured

bridge.mqtt.aws.subscription.1.topic = some/topic/structure/cloud/#
bridge.mqtt.aws.subscription.1.qos = 1

which subscribes the bridge to some/topic/structure/cloud/# on AWS IoT, so the messages will loop.
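
To make the overlap concrete, here is a minimal Python sketch of MQTT topic-filter matching applied to the two configs in this thread. It assumes the mountpoint is prepended as a plain string prefix and that it ends in a slash (the posted configs omit the slash, but the descriptions in the thread imply it). `topic_matches` and `apply_mountpoint` are illustrative helpers, not part of emqx.

```python
def topic_matches(topic_filter: str, topic: str) -> bool:
    """Return True if an MQTT topic filter (+ and # wildcards) matches a topic."""
    f_levels = topic_filter.split("/")
    t_levels = topic.split("/")
    for i, f in enumerate(f_levels):
        if f == "#":
            return True  # multi-level wildcard matches everything from here on
        if i >= len(t_levels):
            return False
        if f != "+" and f != t_levels[i]:
            return False
    return len(f_levels) == len(t_levels)


def apply_mountpoint(mountpoint: str, topic: str) -> str:
    # Assumption: the mountpoint is a plain string prefix.
    return mountpoint + topic


SUBSCRIPTION = "some/topic/structure/cloud/#"
MOUNTPOINT = "some/topic/structure/"  # trailing slash is an assumption

first = apply_mountpoint(MOUNTPOINT, "cloud/test")    # forwards = cloud/#
second = apply_mountpoint(MOUNTPOINT, "to-aws/test")  # forwards = to-aws/#
print(first, topic_matches(SUBSCRIPTION, first))
print(second, topic_matches(SUBSCRIPTION, second))
```

Under these assumptions only the cloud/# forwarding overlaps with the subscription; the to-aws/# forwarding does not.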

turtleDeng avatar Apr 16 '20 01:04 turtleDeng

Thanks for the response. Changing the configuration so that there isn't a loop, I still see this exact same issue. Ideally I'd be able to send/receive to the bridge on the same topic structure, but it isn't a deal breaker if this isn't possible.

My config now contains:

bridge.mqtt.aws.forwards = to-aws/#
bridge.mqtt.aws.forward_mountpoint = some/topic/structure
bridge.mqtt.aws.subscription.1.topic = some/topic/structure/cloud/#

Messages should be sent to AWS on some/topic/structure/to-aws, and received from the subscription some/topic/structure/cloud. With this new config, I still see the same issue.

I was able to find some more information while debugging as well:

  • I found that sending several messages in quick succession reliably reproduces this issue.
    • In bash, using mosquitto-clients: for i in $(seq 10); do mosquitto_pub -t to-aws/test -m "{ \"content\": \"$i\" }"; done
    • Sending < 5 messages quickly seems to never reproduce the issue.
    • Sending 5-10 messages quickly seems to only sometimes reproduce the issue.
    • Sending 10+ messages quickly almost always reproduces the issue (and sending 100+ reproduces 100% of the time)
  • The quality of service of published messages appears not to matter.
  • I did a test with mosquitto in place of AWS IoT, and emqx appeared to work just fine with mosquitto.
    • I tried with both a secure and insecure connection from emqx to mosquitto, with the secure connection trying to replicate as closely as possible how we connect to IoT.
    • Sending 1000's of messages when connected to mosquitto as fast as possible did not cause the looping issue.
    • Subscribing with the third client directly to the mosquitto broker correctly receives those 1000 messages, then no more.

Thus, the issue is not looping so much as sending too many messages quickly with AWS IoT causes some sort of bad state.
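
The bash loop above can also be scripted so the burst size is easy to vary. This is only a sketch: it assumes `mosquitto_pub` is on the PATH and publishes to the default local broker, and `burst_commands` and `run_burst` are hypothetical helper names.

```python
import json
import subprocess


def burst_commands(count: int, topic: str = "to-aws/test") -> list:
    """Build one mosquitto_pub argv per message, mirroring the bash loop."""
    return [
        ["mosquitto_pub", "-t", topic, "-m", json.dumps({"content": str(i)})]
        for i in range(1, count + 1)
    ]


def run_burst(count: int) -> None:
    # Sends the messages back to back; mosquitto_pub uses QoS 0 by default.
    for argv in burst_commands(count):
        subprocess.run(argv, check=True)
```

Calling `run_burst(10)` would send the same ten messages as the bash one-liner, and `run_burst(100)` corresponds to the case that reproduced the issue every time.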

mspoehr avatar Apr 17 '20 14:04 mspoehr

Maybe it is retransmission.

          qos1 +-------+                 qos2 +-------+                 qos3
Publisher ---> | Node1 | --Bridge Forward---> | Node2 | --Bridge Forward---> Subscriber
               +-------+                      +-------+
  • qos1: The quality of messages from Publisher to Node1
  • qos2: The quality of messages from Node1 to Node2 over the bridge. Its value is '1'. A message will be retransmitted when the ack packet is not received or times out.
  • qos3: The quality of messages from Node2 to Subscriber
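
As a rough illustration of the retransmission idea, here is a toy Python model of a QoS 1 inflight window. It is not emqx's implementation: the `Inflight` class, its methods, and the 20-second retry interval are all assumptions made for the sketch.

```python
class Inflight:
    """Toy QoS 1 inflight window: packet id -> (payload, time last sent)."""

    def __init__(self, retry_interval: float):
        self.retry_interval = retry_interval
        self.pending = {}

    def send(self, packet_id: int, payload: str, now: float) -> None:
        self.pending[packet_id] = (payload, now)

    def ack(self, packet_id: int) -> bool:
        # Returns False when the ack refers to an id that is not tracked.
        return self.pending.pop(packet_id, None) is not None

    def retry(self, now: float) -> list:
        """Resend every message whose ack has not arrived within retry_interval."""
        resent = []
        for packet_id, (payload, sent_at) in list(self.pending.items()):
            if now - sent_at >= self.retry_interval:
                self.pending[packet_id] = (payload, now)
                resent.append(packet_id)
        return resent


window = Inflight(retry_interval=20.0)
window.send(1, "a", now=0.0)
window.send(2, "b", now=0.0)
window.ack(1)
print(window.retry(now=20.0))  # [2]: only packet 2 is still unacknowledged
print(window.ack(45091))       # False: this id was never in the window
```

In this model a message keeps being resent for as long as its ack never arrives, which is the behavior being suggested as a possible cause.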

qingchuwudi avatar Apr 21 '20 08:04 qingchuwudi

I had initially thought the same. I'm not sure that we know definitively that qos2 is '1'. Since my latest config has the publish/subscribe topics completely disjoint, the '1' qos for subscribed topics should not affect which QoS published messages are sent out as.

In my bash example above, mosquitto_pub defaults to sending messages with QoS 0. Therefore, I would expect both qos1 and qos2 to be '0'.

I'm not sure what qos3 was during my testing. I would like to say that I tested with both 0 and 1, but I'm not 100% sure about that.

mspoehr avatar Apr 23 '20 17:04 mspoehr

You can refer to https://docs.emqx.io/broker/latest/en/configuration/configuration.html#zoneexternalupgradeqos
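
For context, a small Python sketch of what that option is about. It assumes the usual MQTT rule that a message is delivered at the lower of the publish QoS and the subscription QoS, and that enabling the upgrade option delivers at the higher of the two instead. `delivered_qos` is a hypothetical helper, not an emqx function.

```python
def delivered_qos(publish_qos: int, subscription_qos: int, upgrade_qos: bool = False) -> int:
    """QoS a subscriber receives a message at, under the assumptions above."""
    if upgrade_qos:
        return max(publish_qos, subscription_qos)
    return min(publish_qos, subscription_qos)


print(delivered_qos(0, 1))                    # 0
print(delivered_qos(0, 1, upgrade_qos=True))  # 1
```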

turtleDeng avatar Apr 29 '20 09:04 turtleDeng

I am also having the same problem. I have AWS IoT as the broker, and the emqx bridge is there so that devices using the MQTT-SN protocol can send their data through it. The same data comes back on each publish.

I also have MQTT-based devices sending data directly to AWS IoT, and that data should reach the MQTT-SN devices running behind the emqx bridge.

saumilsdk avatar May 08 '20 11:05 saumilsdk

@mspoehr or @turtleDeng can you please help resolve the looping when the bridge subscribes to the same topics it publishes to? I am connecting the bridge to an AWS IoT endpoint.

saumilsdk avatar May 11 '20 11:05 saumilsdk

You can refer to https://docs.emqx.io/broker/latest/en/configuration/configuration.html#zoneexternalupgradeqos

I really don't think this is a QoS issue. This issue occurs when using any combination of QoS values, even with all 0's, which should never cause this.

Can you please help in resolving looping in case bridge is subscribing same topics as publishing?

@saumilsdk I am not sure that this is possible with emqx in its current state. This issue seems like a bug in emqx to me. In my case, I was able to configure my publishing and bridge subscriptions to be completely disjoint, and I still received the same messages looped back forever.

If you're not experiencing the messages being looped back forever, but instead just receiving the same message you publish one time, I would actually expect this behavior.

mspoehr avatar May 12 '20 22:05 mspoehr

@mspoehr Hi, I would agree if I were just getting the same message twice, but here I am stuck with messages looping forever and end up restarting the server every time. I can find no way out of this issue. Any help will be appreciated. Here is my bridge config. I am using the EMQX-SN plugin as a gateway and EMQX-BRIDGE to bridge the gateway to the AWS IoT broker.

@qingchuwudi and @turtleDeng, it would be great if you could also look into this.

bridge.mqtt.emqx2.start_type = auto
bridge.mqtt.emqx2.address = a3itfXXXX.iot.us-east-1.amazonaws.com:8883
bridge.mqtt.emqx2.proto_ver = mqttv4
bridge.mqtt.emqx2.clientid = bridge_emqx2
bridge.mqtt.emqx2.clean_start = true
bridge.mqtt.emqx2.ssl = on
bridge.mqtt.emqx2.cacertfile = /etc/mqtt/certs/rootCA.pem
bridge.mqtt.emqx2.certfile = /etc/mqtt/certs/cert.crt
bridge.mqtt.emqx2.keyfile = /etc/mqtt/certs/private.key
bridge.mqtt.emqx2.ciphers = ECDHE-ECDSA-AES256-GCM-SHA384,ECDHE-RSA-AES256-GCM-SHA384
PSK-AES128-CBC-SHA,PSK-AES256-CBC-SHA,PSK-3DES-EDE-CBC-SHA,PSK-RC4-SHA
bridge.mqtt.emqx2.keepalive = 60s
bridge.mqtt.emqx2.tls_versions = tlsv1.2,tlsv1.1,tlsv1
bridge.mqtt.emqx2.forwards = #
bridge.mqtt.emqx2.subscription.1.topic = #
bridge.mqtt.emqx2.subscription.1.qos = 1
bridge.mqtt.emqx2.reconnect_interval = 30s
bridge.mqtt.emqx2.retry_interval = 20s
bridge.mqtt.emqx2.max_inflight_size = 32

saumilsdk avatar May 13 '20 04:05 saumilsdk

@saumilsdk I'm not sure if your use case will work with this, but you could try adding a receive_mountpoint just to see if it helps. In my case, I had:

bridge.mqtt.aws.receive_mountpoint = aws/

^ but this still didn't fix the issue for me. I could see in your case where emqx could loop back infinitely if you are bridging # in both directions with no prefixes on either side. Still, for a "bridge" plugin, it seems like this should be a supported use case. But it seems that it is not.

mspoehr avatar May 13 '20 15:05 mspoehr

@mspoehr I tried adding both mountpoints, but the looping still happens, and the topic prefix keeps getting added to the looped messages. As you know, I am not running the emqx broker, only emqx-sn and emqx-bridge. What options do we have to disable looping with these?

bridge.mqtt.emqx2.forward_mountpoint = tmp/forward/aws/
bridge.mqtt.emqx2.receive_mountpoint = tmp/receive/aws/
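
A small Python sketch of why the prefix would keep growing with this setup. It assumes both mountpoints are plain string prefixes, that forwards = # and subscription.1.topic = # are both in effect, and that the remote broker delivers the bridge's own publishes back to its subscription. The starting topic `sensors/temp` is made up for the example.

```python
FORWARD_MOUNTPOINT = "tmp/forward/aws/"
RECEIVE_MOUNTPOINT = "tmp/receive/aws/"


def one_round_trip(local_topic: str) -> str:
    """Local topic -> remote topic -> local topic again, with '#' in both directions."""
    remote_topic = FORWARD_MOUNTPOINT + local_topic  # forwarded to the remote broker
    return RECEIVE_MOUNTPOINT + remote_topic         # received back via the '#' subscription


topic = "sensors/temp"  # hypothetical local topic
for _ in range(3):
    topic = one_round_trip(topic)
    print(topic)
```

Each pass prepends both prefixes again, and because # matches every topic, the re-prefixed message still qualifies for forwarding.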

saumilsdk avatar May 15 '20 04:05 saumilsdk

Did you find a solution for this issue? I'm also facing the same issue with the bridge.

gbunel29 avatar May 27 '20 14:05 gbunel29

@gbunel29 I have moved from emqx to the paho mqtt-sn gateway, which doesn't have the loopback issue. @mspoehr FYI

saumilsdk avatar May 28 '20 05:05 saumilsdk

@mspoehr I tried adding both mountpoints, but the looping still happens, and the topic prefix keeps getting added to the looped messages. As you know, I am not running the emqx broker, only emqx-sn and emqx-bridge. What options do we have to disable looping with these?

bridge.mqtt.emqx2.forward_mountpoint = tmp/forward/aws/
bridge.mqtt.emqx2.receive_mountpoint = tmp/receive/aws/

It will loop when the publish topic is the same as the subscribe topic. I suggest you change your topics like this:

  • pub: /pub/etc/...
  • sub: /sub/etc/...

Adding a prefix or suffix may avoid this problem.

wwhai avatar Jun 05 '20 06:06 wwhai

This looping error is occurring again. Messages are looping forever when published on the same topic.

Trance-Paradox avatar Mar 09 '22 18:03 Trance-Paradox